I encountered the following issue with version 1.3.1.
When handling a DTD event, if I try to access the name of the DTD root element via AsyncXMLStreamReader, the root name is only returned in full when its UTF-8 encoding is a multiple of 4 bytes long. Otherwise, trailing characters are cut off.
The issue can be reproduced with the following code:
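```java
import java.nio.charset.StandardCharsets;

import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

import com.fasterxml.aalto.AsyncByteArrayFeeder;
import com.fasterxml.aalto.AsyncXMLStreamReader;
import com.fasterxml.aalto.stax.InputFactoryImpl;

public class Test {
    private static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n"
            + "<!DOCTYPE AnyRootName SYSTEM \"somedtd.dtd\">\r\n"
            + "<data>somedata</data>";

    public static void main(final String[] args) throws XMLStreamException {
        // Hand the whole document to the async (non-blocking) reader at once
        final AsyncXMLStreamReader<AsyncByteArrayFeeder> reader =
                new InputFactoryImpl().createAsyncFor(XML.getBytes(StandardCharsets.UTF_8));
        while (reader.hasNext() && !reader.isEndElement()) {
            if (reader.next() == XMLStreamConstants.DTD) {
                // Expected "AnyRootName", but 1.3.1 prints "AnyRootN"
                System.out.println(reader.getDTDInfo().getDTDRootName());
            }
        }
    }
}
```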
In this case "AnyRootName" (11 bytes) would be expected, but getDTDRootName() actually returns "AnyRootN", with the last 3 characters missing. "SomeRootName" (12 bytes) is returned in full.
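The byte counts suggest that only complete 4-byte groups of the UTF-8 encoded name survive. Aalto's byte-based scanners appear to accumulate name bytes in 32-bit ints ("quads"), so one plausible explanation, assumed here and not verified against the Aalto source, is that the DTD code path drops a trailing partial quad. The following sketch models only that hypothesis; `QuadTruncationModel` and `keepFullQuadsOnly` are hypothetical names, not Aalto API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of the reported symptom, NOT Aalto's actual code:
// keep only the complete 4-byte groups ("quads") of the UTF-8 encoded name.
public class QuadTruncationModel {

    static String keepFullQuadsOnly(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        int fullQuadBytes = (utf8.length / 4) * 4; // drop a trailing partial quad
        return new String(Arrays.copyOf(utf8, fullQuadBytes), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"AnyRootName", "SomeRootName"}) {
            int len = name.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(name + " (" + len + " bytes) -> " + keepFullQuadsOnly(name));
        }
    }
}
```

Running it prints `AnyRootName (11 bytes) -> AnyRootN` and `SomeRootName (12 bytes) -> SomeRootName`, matching the behavior reported above.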
While experimenting a bit, I also found that if the root element name ends with a multi-byte character that crosses the last 4-byte boundary (e.g. changing "AnyRootName" to "AnyRootNameä"), parsing breaks completely:
```
Exception in thread "main" com.fasterxml.aalto.WFCException: Unexpected end-of-input in name (parsing [257])
 at [row,col {unknown-source}]: [2,24]
	at com.fasterxml.aalto.in.XmlScanner.reportInputProblem(XmlScanner.java:1333)
	at com.fasterxml.aalto.in.XmlScanner.reportEofInName(XmlScanner.java:1430)
	at com.fasterxml.aalto.in.ByteBasedScanner.addUTFPName(ByteBasedScanner.java:371)
	at com.fasterxml.aalto.async.AsyncByteScanner.addPName(AsyncByteScanner.java:687)
	at com.fasterxml.aalto.async.AsyncByteScanner.findPName(AsyncByteScanner.java:678)
	at com.fasterxml.aalto.async.AsyncByteArrayScanner.parsePName(AsyncByteArrayScanner.java:3727)
	at com.fasterxml.aalto.async.AsyncByteScanner.handleDTD(AsyncByteScanner.java:1411)
	at com.fasterxml.aalto.async.AsyncByteScanner.handlePrologDeclStart(AsyncByteScanner.java:940)
	at com.fasterxml.aalto.async.AsyncByteScanner.nextFromProlog(AsyncByteScanner.java:877)
	at com.fasterxml.aalto.stax.StreamReaderImpl.next(StreamReaderImpl.java:790)
	at mypackage.Test.main(Test.java:29)
```
By contrast, "AnyRootNäme" works and the whole name is returned.
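Byte offsets make the difference between the two ä variants visible. In UTF-8, ä is the 2-byte sequence C3 A4. In "AnyRootNäme" (12 bytes) it occupies bytes 9-10, inside the third 4-byte group; in "AnyRootNameä" (13 bytes) it occupies bytes 12-13 and straddles the boundary after byte 12. Assuming, without having confirmed it in Aalto's code, that only complete 4-byte groups of the name are kept, the kept prefix of "AnyRootNameä" ends with a dangling lead byte C3, which a strict UTF-8 decoder rejects, loosely mirroring the "Unexpected end-of-input in name" error. `SplitCharModel` and `decodeFullQuadsStrict` are hypothetical names used only for this sketch:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model, NOT Aalto's code: keep only full 4-byte groups of the
// UTF-8 name, then decode strictly so a split multi-byte character is reported.
public class SplitCharModel {

    static String decodeFullQuadsStrict(String name) throws CharacterCodingException {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        byte[] kept = Arrays.copyOf(utf8, (utf8.length / 4) * 4);
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(kept))
                .toString();
    }

    public static void main(String[] args) {
        // \u00e4 is 'ä'; escapes avoid depending on the source file encoding
        for (String name : new String[] {"AnyRootN\u00e4me", "AnyRootName\u00e4"}) {
            int len = name.getBytes(StandardCharsets.UTF_8).length;
            try {
                System.out.println(name + " (" + len + " bytes) -> " + decodeFullQuadsStrict(name));
            } catch (CharacterCodingException e) {
                System.out.println(name + " (" + len + " bytes) -> error: " + e);
            }
        }
    }
}
```

For "AnyRootNäme" the sketch returns the full name, while for "AnyRootNameä" the decoder throws a `MalformedInputException` (a subclass of `CharacterCodingException`).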