FasterXML / aalto-xml

Ultra-high performance non-blocking XML processor (Stax API + extensions)
Apache License 2.0

Multi-byte characters are split in writeCData() if first byte sits right at the end of the buffer #86

Closed (tatsel closed this issue 5 months ago)

tatsel commented 5 months ago

This issue seems similar to the one fixed in https://github.com/FasterXML/aalto-xml/pull/75/commits, though the cause is a bit different:

I get a 'javax.xml.stream.XMLStreamException: Incomplete surrogate pair in content: first char 0xdfce, second 0x78' exception when I try to write CData in which a multi-byte character (here a surrogate pair) sits right at the boundary of the 512-sized internal buffer.
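
For reference, the values in that message can be checked with plain JDK calls. The sketch below is illustrative and not part of the original report: the class name SurrogatePairDemo is made up, and the 512-char chunk boundary is taken from the description above rather than verified against the library. It shows that 0xdfce is the low surrogate of the test character and 0x78 is the 'x' that follows it, which is consistent with the high surrogate having been dropped at the boundary.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: inspects the character used in the reproduction.
class SurrogatePairDemo {
    // Same shape as the reported test text: 511 'x', one supplementary character, 512 'x'.
    static String buildTestText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 511; i++) {
            sb.append('x');
        }
        sb.append("\uD835\uDFCE");
        for (int i = 0; i < 512; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = buildTestText();
        char high = text.charAt(511); // last char of an assumed first 512-char chunk
        char low = text.charAt(512);  // first char of the next chunk
        char next = text.charAt(513); // the 'x' reported as "second 0x78"

        System.out.printf("index 511: 0x%04x, high surrogate: %b%n",
                (int) high, Character.isHighSurrogate(high));
        System.out.printf("index 512: 0x%04x, low surrogate: %b%n",
                (int) low, Character.isLowSurrogate(low));
        System.out.printf("index 513: 0x%02x%n", (int) next);

        // The two UTF-16 units form a single code point that needs 4 bytes in UTF-8.
        int codePoint = Character.toCodePoint(high, low);
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        System.out.printf("code point: U+%X, UTF-8 length: %d%n", codePoint, utf8.length);
    }
}
```

Running it should report a high surrogate at index 511 and a low surrogate at index 512, so any writer that processes the text in 512-char chunks has to remember the first half of the pair between chunks.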

Example test to reproduce (copied from https://github.com/FasterXML/aalto-xml/blob/master/src/test/java/com/fasterxml/aalto/sax/TestSaxWriter.java#L10 and slightly adjusted for writeCData()):

        // 511 single-byte chars, so the surrogate pair starts at index 511,
        // the last position of a 512-char chunk
        StringBuilder testText = new StringBuilder();
        for (int i = 0; i < 511; i++)
            testText.append('x');
        testText.append("\uD835\uDFCE");
        for (int i = 0; i < 512; i++)
            testText.append('x');
        WriterConfig writerConfig = new WriterConfig();
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        Utf8XmlWriter writer = new Utf8XmlWriter(writerConfig, byteArrayOutputStream);
        writer.writeStartTagStart(writer.constructName("testelement"));
        // throws XMLStreamException: Incomplete surrogate pair in content
        writer.writeCData(testText.toString());
        writer.writeStartTagEnd();
        writer.writeEndTag(writer.constructName("testelement"));
        writer.close(false);

I think the reason is that ByteXmlWriter#writeCDataContents() lacks this piece of code which exists in writeCharacters():

        if (_surrogate != 0) {
            outputSurrogates(_surrogate, cbuf[offset]);
            // reset the temporary surrogate storage
            _surrogate = 0;
            ++offset;
            --len;
        }
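
To make the role of that check concrete, here is a minimal, self-contained sketch of the carry-over technique. It is not Aalto's ByteXmlWriter: the class ChunkedUtf8Sketch, the field pendingSurrogate, and the methods writeChunk and encode are hypothetical names chosen for illustration. The idea is that a high surrogate found at the very end of one chunk is remembered and then combined with the first char of the next chunk before normal processing resumes.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical, simplified chunked UTF-8 encoder; NOT the library's implementation.
class ChunkedUtf8Sketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    // High surrogate left over from the end of the previous chunk, or 0 if none.
    private int pendingSurrogate = 0;

    void writeChunk(char[] cbuf, int offset, int len) {
        // Same shape as the check quoted above: finish a pair split by the chunk boundary.
        if (pendingSurrogate != 0 && len > 0) {
            writeCodePoint(combine((char) pendingSurrogate, cbuf[offset]));
            pendingSurrogate = 0;
            ++offset;
            --len;
        }
        int end = offset + len;
        while (offset < end) {
            char c = cbuf[offset++];
            if (Character.isHighSurrogate(c)) {
                if (offset == end) {
                    // Pair continues in the next chunk: remember the first half.
                    pendingSurrogate = c;
                    return;
                }
                writeCodePoint(combine(c, cbuf[offset++]));
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("Unpaired low surrogate");
            } else {
                writeCodePoint(c);
            }
        }
    }

    private static int combine(char high, char low) {
        if (!Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("Incomplete surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    // Standard UTF-8 encoding of one code point (1 to 4 bytes).
    private void writeCodePoint(int cp) {
        if (cp < 0x80) {
            out.write(cp);
        } else if (cp < 0x800) {
            out.write(0xC0 | (cp >> 6));
            out.write(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out.write(0xE0 | (cp >> 12));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        } else {
            out.write(0xF0 | (cp >> 18));
            out.write(0x80 | ((cp >> 12) & 0x3F));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        }
    }

    // Feeds the text through writeChunk in pieces of at most chunkSize chars.
    static byte[] encode(String text, int chunkSize) {
        ChunkedUtf8Sketch writer = new ChunkedUtf8Sketch();
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i += chunkSize) {
            writer.writeChunk(chars, i, Math.min(chunkSize, chars.length - i));
        }
        if (writer.pendingSurrogate != 0) {
            throw new IllegalArgumentException("Dangling high surrogate at end of input");
        }
        return writer.out.toByteArray();
    }
}
```

With the leading check in place, encoding the test text in 512-char chunks should give the same bytes as String.getBytes(StandardCharsets.UTF_8). Without it, the second chunk would start with a lone low surrogate, which matches the "first char 0xdfce" symptom in the report.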

tatsel commented 5 months ago

I added the test and a potential fix in this pull request (https://github.com/FasterXML/aalto-xml/pull/87). Please check whether this is the correct way to address the issue.

cowtowncoder commented 5 months ago

Thank you for reporting the issue and for providing both a reproduction and a fix! I decided to check a couple of other cases as well, and a similar issue affected comments (#91) and processing instructions (#93) too, so I fixed those as well.

I can release 1.3.3 soon, but wanted to give you a chance to see if I missed anything with the fix before publishing.

tatsel commented 5 months ago

Thank you for the quick response! I'm glad I could help find this. I have no further comments and will wait for the release.