FasterXML / woodstox

The gold standard Stax XML API implementation. Now at Github.
Apache License 2.0
220 stars 81 forks source link

Notations declared in external DTD subsets are reported as undefined #184

Open nektarios-kitsios opened 7 months ago

nektarios-kitsios commented 7 months ago

Validation fails with the error below when a NOTATION that is declared in an external DTD subset is referenced in an internal DTD subset:

om.ctc.wstx.exc.WstxValidationException:
1 referenced notation undefined: first one 'IMAGE'
 at [row,col {unknown-source}]: [1,109]
        at com.ctc.wstx.exc.WstxValidationException.create(WstxValidationException.java:50)
        at com.ctc.wstx.sr.StreamScanner.reportValidationProblem(StreamScanner.java:593)
        at com.ctc.wstx.sr.StreamScanner.reportValidationProblem(StreamScanner.java:601)
        at com.ctc.wstx.dtd.FullDTDReader._reportVCViolation(FullDTDReader.java:1989)
        at com.ctc.wstx.dtd.FullDTDReader._reportUndefinedNotationRefs(FullDTDReader.java:1967)
        at com.ctc.wstx.dtd.FullDTDReader.parseDTD(FullDTDReader.java:639)
        at com.ctc.wstx.dtd.FullDTDReader.readInternalSubset(FullDTDReader.java:427)
        at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:277)
        at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3466)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2089)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1180)
        at org.codehaus.stax.test.vstream.TestExternalSubset.testNotationReferenceInInternalSubset(TestExternalSubset.java:77)

The following example demonstrates this problem:

XML:

<!DOCTYPE root SYSTEM 'test.dtd' [ <!ENTITY gr2 SYSTEM "gr2" NDATA IMAGE> ]>
<root>&extEnt;</root> 

DTD:

<!ELEMENT root (#PCDATA)>
<!ENTITY extEnt 'just testing'>
<!NOTATION IMAGE       PUBLIC "-//AS//NOTATION image format//EN"
                     "http://www.test.com/xml/common/dtd/notation/image">

Note that Xerces and xmllint validate the above XML successfully. This is in accordance with the XML specification which does not impose any specific order in the declarations. The only requirement is the following:

If both the external and internal subsets are used, the internal subset must be considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.

The problem seems to be that woodstox reads the internal subset before the external one, and reports the error at the time the internal subset is read when the notation declaration has not been read yet.

For the record there used to be an old issue reporting this problem here: https://web.archive.org/web/20150507153747/http://jira.codehaus.org/browse/WSTX-264

nektarios-kitsios commented 7 months ago

Attached is a patch with a unit test that demonstrates this case and a potential fix: Issue184-_potential_fix_and_unit_test.patch.txt

cowtowncoder commented 7 months ago

Thank you for reporting this issue. Interesting that this was not covered by the extensive XMLTest suite (I think maybe https://www.w3.org/XML/Test/ but it has been years) that had ~2000 tests, many focusing on DTD aspects.

I hope to find time to at least get test integrated and then have a look at proposed solution.

nektarios-kitsios commented 7 months ago

Attached is a new version of the patch that fixes a problem with parameter entity overrides (demonstrated in a unit test). To solve this the external subset is read twice, but there may be better ways to fix this.

Issue184-_potential_fix_and_unit_test_v2.patch.txt

cowtowncoder commented 6 months ago

@nektarios-kitsios Apologies for slow follow up here. Would it be possible to split patch in two parts: one for unit tests, another for actual fix? I should be able to merge tests first (and move under failing until fixed), and could see how fix itself would work.

nektarios-kitsios commented 5 months ago

As requested, split the patch in two parts:

Issue184-_unit_tests.patch.txt Issue184-_potential_fix.patch.txt

cowtowncoder commented 5 months ago

Thank you!