FasterXML / woodstox

The gold standard Stax XML API implementation. Now at Github.
Apache License 2.0
225 stars 81 forks source link

WstxValidationException: Unknown reason (at end element </nl:nillableIntElement>) when validating a document with nillable elements #179

Open ppalaga opened 11 months ago

ppalaga commented 11 months ago

This issue occurs when CXF is validating an incoming SOAP message against W3CSchema and the message contains a an xs:dateTime or xs:int elements having xsi:nil="true":

2023-10-28 22:40:16,639 WARN  [org.apa.cxf.pha.PhaseInterceptorChain] (executor-thread-0) Interceptor for {http://nl.schiphol.asb.services/SPLDataService}SPLDataService#{http://nl.schiphol.asb.services/SPLDataService}AddSPLData has thrown exception, unwinding now: org.apache.cxf.interceptor.Fault: Unknown reason (at end element </spl:intElem>)
        at org.apache.cxf.staxutils.validation.Stax2ValidationUtils$2.reportProblem(Stax2ValidationUtils.java:147)
        at com.ctc.wstx.sw.BaseStreamWriter.reportProblem(BaseStreamWriter.java:1230)
        at com.ctc.wstx.msv.GenericMsvValidator.reportError(GenericMsvValidator.java:562)
        at com.ctc.wstx.msv.GenericMsvValidator.reportError(GenericMsvValidator.java:554)
        at com.ctc.wstx.msv.GenericMsvValidator.reportError(GenericMsvValidator.java:548)
        at com.ctc.wstx.msv.GenericMsvValidator.validateElementEnd(GenericMsvValidator.java:390)
        at com.ctc.wstx.sw.BaseNsStreamWriter.doWriteEndTag(BaseNsStreamWriter.java:713)
        at com.ctc.wstx.sw.BaseNsStreamWriter.writeEndElement(BaseNsStreamWriter.java:285)
        at org.apache.cxf.staxutils.StaxUtils.copy(StaxUtils.java:745)
        at org.apache.cxf.staxutils.StaxUtils.copy(StaxUtils.java:707)
        at org.apache.cxf.databinding.source.XMLStreamDataReader.validate(XMLStreamDataReader.java:244)
        at org.apache.cxf.databinding.source.XMLStreamDataReader.read(XMLStreamDataReader.java:115)
        at org.apache.cxf.databinding.source.XMLStreamDataReader.read(XMLStreamDataReader.java:83)
        at org.apache.cxf.databinding.source.XMLStreamDataReader.read(XMLStreamDataReader.java:67)
        at org.apache.cxf.wsdl.interceptors.DocLiteralInInterceptor.handleMessage(DocLiteralInInterceptor.java:192)
        at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
        at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
        at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
        at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:234)
        at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:208)
        at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:160)
        at io.quarkiverse.cxf.transport.CxfHandler.process(CxfHandler.java:288)
        at io.quarkiverse.cxf.transport.CxfHandler.handle(CxfHandler.java:225)
        at io.quarkiverse.cxf.transport.CxfHandler.handle(CxfHandler.java:46)
        at io.vertx.ext.web.impl.BlockingHandlerDecorator.lambda$handle$0(BlockingHandlerDecorator.java:48)
        at io.vertx.core.impl.ContextBase.lambda$executeBlocking$0(ContextBase.java:137)
        at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264)
        at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135)
        at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
        at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
        at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1478)
        at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
        at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:840)

CXF calls validateAgainst(schema) on a writer which gets the events copied from the reader.

A plain Woodstox reproducer is available in PR https://github.com/FasterXML/woodstox/pull/180 . It also makes sure that the failing XML documents are accepted by javax.xml.validation.

cowtowncoder commented 11 months ago

Failing test merged; interesting that validation works for string value but fails for other scalars (int, date-time).

And apparently only for writer-side. I wonder if typed write methods are missing some callbacks...

Oh wait. No, can't be since this is just read/write'ing content, not using typed API.

cowtowncoder commented 11 months ago

Hmmh. Not super easy to figure out. But the difference wrt xs:string is in GenericMsvValidator.validateElementAndAttributes():

int stringChecks = mCurrAcceptor.getStringCareLevel();

where xs:string gets Acceptor.STRING_IGNORE (i.e. anything goes) but others Acceptor.STRING_STRICT.

cowtowncoder commented 11 months ago

Ok, can't quite figure it out, but I assume sequence of validation calls in this case is incorrect for writer side; some call missing (or extra call).

At least there is reproduction if I have time to go back digging or (ideally) someone else has itch and does spelunking into code :) (would be happy to merge a PR for fix)

ppalaga commented 11 months ago

Thanks for having a look @cowtowncoder!

assume sequence of validation calls in this case is incorrect for writer side; some call missing (or extra call).

Something like that would also be my hypothesis: The code perhaps first checks whether the content is an int before it checks whether xsi:nil is true? It does not matter for strings because those can have zero length.

cowtowncoder commented 10 months ago

Thanks for having a look @cowtowncoder!

assume sequence of validation calls in this case is incorrect for writer side; some call missing (or extra call).

Something like that would also be my hypothesis: The code perhaps first checks whether the content is an int before it checks whether xsi:nil is true? It does not matter for strings because those can have zero length.

All actual validation state handling is done by MSV; Woodstox side just calls various methods to "feed" content.

As per my other note one observable difference is that xs:string can have exactly any content so MSV does not check anything.

ppalaga commented 9 months ago

I think I figured out, what is the problem.

I traced the method calls using AspectJ for both the reader and writer validation - here is the code: https://github.com/ppalaga/woodstox/commits/i179-trace/

To produce the traces you need to do the following:

mvn clean test -Dtest=TestW3CSchemaNillable179#testNillableIntReader > reader-trace.txt

mvn clean test -Dtest=TestW3CSchemaNillable179#testNillableInt > writer-trace.txt

I am attaching the traces:

In those the following parts are interesting:

In the reader scenario, reading the nil attribute returns true as expected:

    > com.ctc.wstx.msv.GenericMsvValidator.validateElementStart("nillableInt":java.lang.String, "http://server.hw.demo/nillable":java.lang.String, "nl":java.lang.String)
     ...
     > com.ctc.wstx.msv.AttributeProxy.getValue("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String)
      > com.ctc.wstx.sr.InputElementStack.getAttributeValue("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String)
       > com.ctc.wstx.sr.InputElementStack.findAttributeIndex("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String)
        > com.ctc.wstx.sr.AttributeCollector.findIndex("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String)
         > com.ctc.wstx.sr.Attribute.hasQName("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String)
         < com.ctc.wstx.sr.Attribute.hasQName("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String) : true:boolean
        < com.ctc.wstx.sr.AttributeCollector.findIndex("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String) : 0:int
       < com.ctc.wstx.sr.InputElementStack.findAttributeIndex("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String) : 0:int
       > com.ctc.wstx.sr.InputElementStack.getAttributeValue(0:int)
        > com.ctc.wstx.sr.InputElementStack.getAttrCollector()
        < com.ctc.wstx.sr.InputElementStack.getAttrCollector() : com.ctc.wstx.sr.AttributeCollector
        > com.ctc.wstx.sr.AttributeCollector.getValue(0:int)
         > com.ctc.wstx.util.TextBuilder.getAllValues()
         < com.ctc.wstx.util.TextBuilder.getAllValues() : "true":java.lang.String
         > com.ctc.wstx.sr.Attribute.getValue("true":java.lang.String)
         < com.ctc.wstx.sr.Attribute.getValue("true":java.lang.String) : "true":java.lang.String

but it is a bit different in the writer scenario:

    > com.ctc.wstx.msv.GenericMsvValidator.validateElementStart("nillableInt":java.lang.String, "http://server.hw.demo/nillable":java.lang.String, "nl":java.lang.String)
     ...
     > com.ctc.wstx.msv.AttributeProxy.getValue("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String)
      > com.ctc.wstx.sw.BaseStreamWriter.getAttributeValue("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String)
      < com.ctc.wstx.sw.BaseStreamWriter.getAttributeValue("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String) : null:java.lang.String
     < com.ctc.wstx.msv.AttributeProxy.getValue("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String) : null:java.lang.String

I was wondering why BaseStreamWriter.getAttributeValue("http://www.w3.org/2001/XMLSchema-instance":java.lang.String, "nil":java.lang.String) returns null. Here is why:

https://github.com/FasterXML/woodstox/blob/8029523187318aa3b98d8cc8e6db90e73df6c6bf/src/main/java/com/ctc/wstx/sw/BaseStreamWriter.java#L1280-L1304

It always returns null, because it is not implemented.

ppalaga commented 9 months ago

@cowtowncoder, could you please suggest some way forward?

cowtowncoder commented 9 months ago

Great detective work @ppalaga !

I wish I remembered the context here, but from what you include I think it is what it say: on writer-side, attribute info (wrt currently written element) is not accessible. Likely because it is not being retained. And if so, it'd need to be kept track of the case where there is validator to call (otherwise it is additional overhead with no benefit).

ppalaga commented 9 months ago

Thanks, sounds good. The main questions from my side are: (1) is it worth implementing (2) any chance that somebody from the project does that?

cowtowncoder commented 9 months ago

@ppalaga Right now I can't promise to work on this, but I always try to make time to help with PRs if someone else does. If that helps at all?

Fixing this would definitely be useful, for writer-side validation.

ppalaga commented 8 months ago

Thanks for the statement, @cowtowncoder.

Could you please give some hints about the design of writer validation? Especially, the following topics would be interesting:

ppalaga commented 8 months ago
  • Do we need Typed and Validating subclasses of BaseStreamWriter as they exist for BasicStreamReader?

I see there is TypedStreamWriter, but where under its subclasses Simple/Repairing should the new Validating writer be hanged?

cowtowncoder commented 8 months ago

No, should not need new sub-classes as typed writes and pluggable validation already exist on writer side.

Validating reader tests are in packages ending with .vstream (v for validating), split in 3 ones I think:

but there are also Validating Write tests in:

cowtowncoder commented 8 months ago

One confusing aspect of naming is that there are, I think, 2 hierarchies of -Writers:

Validation seems to be mostly (but maybe not completely) handled by StreamWriter implementations; actual validation is mValidator, and it is being called by stream writer based on write calls to essentially maintain and check validator state in parallel with actual token stream writes.

So: I do not think new classes are needed; most other validation functionality exists. But I do not remember what my thinking was with missing state keeping for attributes -- or, possibly just wiring access to state that might be kept. It is possible state is not kept, however, since whereas on StreamReader side there is natural need to access attribute values, on writer side there isn't (they are just to be output and XMLStreamWriter has no accessors).