FasterXML / woodstox

The gold standard Stax XML API implementation. Now at GitHub.
Apache License 2.0

The character ">" is not escaped in attribute values #207

Closed. gkovacs20-cxn closed this issue 1 month ago.

gkovacs20-cxn commented 1 month ago

If the value of an attribute contains >, it is not escaped as &gt; but printed as-is, resulting in invalid XML. < is correctly escaped.

Tested with 6.6.2.

```java
import java.io.StringWriter;
import javax.xml.stream.*;
import com.ctc.wstx.stax.WstxOutputFactory;

public class Main {
    public static void main(String[] args) throws XMLStreamException {
        // JDK built-in writer (XMLOutputFactory.newDefaultFactory() requires Java 9+)
        test(XMLOutputFactory.newDefaultFactory()); // prints <elem attr="&lt;&gt;"></elem>
        // Woodstox writer
        test(new WstxOutputFactory());              // prints <elem attr="&lt;>"/>
    }

    // Writes a single element whose attribute value contains both '<' and '>'
    private static void test(XMLOutputFactory factory) throws XMLStreamException {
        StringWriter writer = new StringWriter();
        XMLStreamWriter xmlWriter = factory.createXMLStreamWriter(writer);
        xmlWriter.writeStartElement("elem");
        xmlWriter.writeAttribute("attr", "<>");
        xmlWriter.writeEndElement();
        xmlWriter.close();
        System.out.println(writer);
    }
}
```
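One way to check whether the Woodstox output is actually invalid is to parse both reported outputs back with the JDK's built-in DOM parser, which rejects input that is not well-formed. This is a minimal sketch using only javax.xml.parsers; the class name WellFormedCheck and the helper readAttr are hypothetical, and the input strings are copied from the outputs reported above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for checking the well-formedness of the reported outputs.
public class WellFormedCheck {

    // Parses xml and returns the value of the root element's "attr" attribute.
    // Throws SAXException if the input is not well-formed.
    static String readAttr(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr
        builder.setErrorHandler(new DefaultHandler());
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("attr");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAttr("<elem attr=\"&lt;&gt;\"></elem>")); // JDK writer output
        System.out.println(readAttr("<elem attr=\"&lt;>\"/>"));          // Woodstox output
        try {
            readAttr("<elem attr=\"<>\"/>"); // unescaped '<' in an attribute value
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Both of the first two calls print <>, meaning the parser accepts the literal > in the Woodstox output; only the unescaped < in the third case is rejected.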
cowtowncoder commented 1 month ago

Yes? ">" need not be escaped; this is not mandated by the XML specification, with a single exception (when it appears as part of "]]>" in text content). Some libraries just escape all instances; Woodstox does not. It's not a bug but a feature (minimizing escaping).

See, for example, https://stackoverflow.com/questions/1091945/what-characters-do-i-need-to-escape-in-xml-documents
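The rule described in this comment can be made concrete with two small helpers: in a double-quoted attribute value only <, & and the delimiting quote must be escaped (the XML 1.0 AttValue production), while in text content > only has to be escaped when it would complete the sequence "]]>". This is an illustrative sketch with hypothetical names (MinimalEscaping, escapeAttr, escapeText), not Woodstox's actual escaping code.

```java
// Hypothetical illustration of minimal XML escaping; not Woodstox's implementation.
public class MinimalEscaping {

    // Escapes a value to be written inside double quotes as an attribute value.
    // Only '<', '&' and '"' are escaped; '>' is left as-is.
    static String escapeAttr(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escapes element text content. '>' is escaped only when it would complete
    // "]]>", which is not allowed in character data.
    static String escapeText(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<') {
                sb.append("&lt;");
            } else if (c == '&') {
                sb.append("&amp;");
            } else if (c == '>' && i >= 2
                    && text.charAt(i - 1) == ']' && text.charAt(i - 2) == ']') {
                sb.append("&gt;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAttr("<>"));    // &lt;>
        System.out.println(escapeText("a > b")); // a > b
        System.out.println(escapeText("x]]>y")); // x]]&gt;y
    }
}
```

escapeAttr("<>") yields the same &lt;> that Woodstox writes in the report. A production escaper would also have to consider characters such as tabs and newlines in attribute values, which parsers normalize to spaces.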

gkovacs20-cxn commented 1 month ago

Can this be configured, or is this behavior fixed?

cowtowncoder commented 1 month ago

This can be configured, although it is not as easy as I'd hope. Let me see if I can find references...

cowtowncoder commented 1 month ago

OK, so

src/test/java/org/codehaus/stax/test/wstream/CharacterEscapingTest.java

has an example using the XMLOutputFactory2.P_TEXT_ESCAPER configuration property, for which you need to implement EscapingWriterFactory.
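The configuration route mentioned here works by giving the writer a factory that wraps its output in an escaping java.io.Writer. A minimal sketch of such a Writer, using only the JDK, could look like the following; the class name GtEscapingWriter is hypothetical, and unlike Woodstox's default attribute handling it also escapes >.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical Writer that escapes '<', '>', '&' and '"' in everything written
// through it. An EscapingWriterFactory implementation could wrap its target
// Writer in one of these (assumption; see the wiring notes below).
public class GtEscapingWriter extends FilterWriter {

    public GtEscapingWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(int c) throws IOException {
        switch (c) {
            case '<': out.write("&lt;"); break;
            case '>': out.write("&gt;"); break;
            case '&': out.write("&amp;"); break;
            case '"': out.write("&quot;"); break;
            default: out.write(c);
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(cbuf[i]); // route every char through the escaping write(int)
        }
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(str.charAt(i));
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter target = new StringWriter();
        Writer escaping = new GtEscapingWriter(target);
        escaping.write("<>");
        escaping.flush();
        System.out.println(target); // &lt;&gt;
    }
}
```

To use it with Woodstox, an implementation of org.codehaus.stax2.io.EscapingWriterFactory would return new GtEscapingWriter(w) from createEscapingWriterFor(Writer w, String enc); the OutputStream overload would first need an OutputStreamWriter for the given encoding. Since this issue concerns attribute values, XMLOutputFactory2 also defines P_ATTR_VALUE_ESCAPER, which appears to be the attribute-value counterpart of P_TEXT_ESCAPER. Treat this wiring as an assumption and check CharacterEscapingTest for how the factory is actually registered. The sketch also ignores characters the output encoding cannot represent.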

https://stackoverflow.com/questions/2783758/xmlstreamwriter-writecharacters-without-escaping

has some more info, and https://cowtowncoder.medium.com/configuring-woodstox-xml-parser-stax2-properties-c80ef5a32ef1 has the full set of configuration settings.

So I hope some of that is helpful.

gkovacs20-cxn commented 1 month ago

Thank you for the detailed answer!