FasterXML / aalto-xml

Ultra-high performance non-blocking XML processor (Stax API + extensions)
Apache License 2.0

DTD RootName only returned correctly if it consists of a multiple of 4 bytes #79

Open Huxlyx opened 1 year ago

Huxlyx commented 1 year ago

I encountered the following issue with version 1.3.1: when handling a DTD event and accessing the name of the DTD root element via AsyncXMLStreamReader, the root name is returned in full only when its length is a multiple of 4 bytes; otherwise, trailing characters are cut off.

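The issue can be reproduced with the following code:

```java
import java.nio.charset.StandardCharsets;

import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

import com.fasterxml.aalto.AsyncByteArrayFeeder;
import com.fasterxml.aalto.AsyncXMLStreamReader;
import com.fasterxml.aalto.stax.InputFactoryImpl;

public class Test {

    private static final String XML =
              "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n"
            + "<!DOCTYPE AnyRootName SYSTEM \"somedtd.dtd\">\r\n"
            + "<data>somedata</data>";

    public static void main(final String[] args) throws XMLStreamException {
        // Feed the whole document to the async (non-blocking) reader up front
        final AsyncXMLStreamReader<AsyncByteArrayFeeder> reader =
                new InputFactoryImpl().createAsyncFor(XML.getBytes(StandardCharsets.UTF_8));
        while (reader.hasNext() && !reader.isEndElement()) {
            if (reader.next() == XMLStreamConstants.DTD) {
                // Expected "AnyRootName", but prints "AnyRootN"
                System.out.println(reader.getDTDInfo().getDTDRootName());
            }
        }
    }
}
```

In this case, "AnyRootName" would be expected, but getDTDRootName() actually returns "AnyRootN", with the last 3 characters missing. "SomeRootName", by contrast, is returned in full.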

While experimenting further, I also found that if the name of the root element ends with a multi-byte character that crosses the last 4-byte boundary (e.g. by changing "AnyRootName" to "AnyRootNameä"), parsing fails completely:

    Exception in thread "main" com.fasterxml.aalto.WFCException: Unexpected end-of-input in name (parsing [257])
     at [row,col {unknown-source}]: [2,24]
        at com.fasterxml.aalto.in.XmlScanner.reportInputProblem(XmlScanner.java:1333)
        at com.fasterxml.aalto.in.XmlScanner.reportEofInName(XmlScanner.java:1430)
        at com.fasterxml.aalto.in.ByteBasedScanner.addUTFPName(ByteBasedScanner.java:371)
        at com.fasterxml.aalto.async.AsyncByteScanner.addPName(AsyncByteScanner.java:687)
        at com.fasterxml.aalto.async.AsyncByteScanner.findPName(AsyncByteScanner.java:678)
        at com.fasterxml.aalto.async.AsyncByteArrayScanner.parsePName(AsyncByteArrayScanner.java:3727)
        at com.fasterxml.aalto.async.AsyncByteScanner.handleDTD(AsyncByteScanner.java:1411)
        at com.fasterxml.aalto.async.AsyncByteScanner.handlePrologDeclStart(AsyncByteScanner.java:940)
        at com.fasterxml.aalto.async.AsyncByteScanner.nextFromProlog(AsyncByteScanner.java:877)
        at com.fasterxml.aalto.stax.StreamReaderImpl.next(StreamReaderImpl.java:790)
        at mypackage.Test.main(Test.java:29)

By contrast, "AnyRootNäme" works and returns the whole name.
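
The results for all four names line up with their UTF-8 byte lengths. Below is a minimal sketch for checking this. The class and helper names are made up for illustration and are not part of aalto-xml. It only measures the names and makes no claim about where in the scanner the truncation happens.

```java
import java.nio.charset.StandardCharsets;

public class RootNameByteLengths {

    // Number of bytes the name occupies when encoded as UTF-8.
    static int utf8Length(final String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the UTF-8 encoded name is an exact multiple of 4 bytes long.
    static boolean isMultipleOfFour(final String name) {
        return utf8Length(name) % 4 == 0;
    }

    public static void main(final String[] args) {
        // \u00e4 is the letter a-umlaut, which takes 2 bytes in UTF-8.
        final String[] names = {
            "AnyRootName",        // truncated to "AnyRootN"
            "SomeRootName",       // returned in full
            "AnyRootName\u00e4",  // fails with WFCException
            "AnyRootN\u00e4me"    // returned in full
        };
        for (final String name : names) {
            System.out.println(name + ": " + utf8Length(name)
                    + " bytes, multiple of 4: " + isMultipleOfFour(name));
        }
    }
}
```

The two names that are returned in full are both 12 bytes long. "AnyRootName" is 11 bytes and loses everything after the first 8, and "AnyRootNameä" is 13 bytes, so the 12-byte boundary falls in the middle of the two-byte "ä".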

meiMingle commented 1 year ago

Version 1.3.2 has the same problem.