FasterXML / woodstox

The gold standard Stax XML API implementation. Now at Github.
Apache License 2.0

The character ">" is not escaped in attribute values #207

Closed gkovacs20-cxn closed 1 month ago

gkovacs20-cxn commented 1 month ago

If the value of an attribute contains >, it is not escaped as &gt; but printed as-is, resulting in invalid XML. < is correctly escaped.

Tested with 6.6.2.

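import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

import com.ctc.wstx.stax.WstxOutputFactory;

public class Main {
    public static void main(String[] args) throws XMLStreamException {
        test(XMLOutputFactory.newDefaultFactory()); // prints <elem attr="&lt;&gt;"></elem>
        test(new WstxOutputFactory());              // prints <elem attr="&lt;>"/>
    }

    private static void test(XMLOutputFactory factory) throws XMLStreamException {
        StringWriter writer = new StringWriter();
        XMLStreamWriter xmlWriter = factory.createXMLStreamWriter(writer);
        xmlWriter.writeStartElement("elem");
        xmlWriter.writeAttribute("attr", "<>");
        xmlWriter.writeEndElement();
        xmlWriter.close();
        System.out.println(writer);
    }
}

cowtowncoder commented 1 month ago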

Yes? ">" need not be escaped: the XML specification does not mandate it, with a single exception (I think when it appears as part of "]]>" in text content). Some libraries just escape all instances; Woodstox does not. It's not a bug but a feature (minimizing escaping).
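One way to check this claim is to feed both forms to a parser and compare the attribute values. The sketch below is not from the thread: it uses the JDK's built-in StAX reader, and the class name `UnescapedGtCheck` and helper `readAttr` are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, only for illustration.
class UnescapedGtCheck {
    /** Parses the document and returns the "attr" attribute of the first element. */
    public static String readAttr(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newDefaultFactory()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                // skip anything that precedes the root element
            }
            return reader.getAttributeValue(null, "attr");
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Woodstox-style output: ">" left as-is inside the attribute value
        System.out.println(readAttr("<elem attr=\"&lt;>\"/>"));
        // Output with ">" escaped as well
        System.out.println(readAttr("<elem attr=\"&lt;&gt;\"/>"));
    }
}
```

Both documents parse without error and yield the same attribute value, which is what one would expect if the unescaped ">" is well-formed.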

See f.ex

gkovacs20-cxn commented 1 month ago

Can this be configured or is this a strict behavior?

cowtowncoder commented 1 month ago

This can be configured, although it is not as easy as I'd hope. Let me see if I can find references...

cowtowncoder commented 1 month ago

Ok so


has an example using XMLOutputFactory2.P_TEXT_ESCAPER configuration, where you need to implement EscapingWriterFactory.

has some more info; and has full set of configuration settings.

So I hope some of that is helpful.
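For illustration only, here is a minimal sketch of the kind of `Writer` an `EscapingWriterFactory` implementation could hand back, one that also escapes ">". It uses only the JDK, and the class name `GtEscapingWriter` is hypothetical. Wiring it into Woodstox is not shown. That would mean returning it from the factory's `createEscapingWriterFor` methods and registering the factory on the output factory, presumably under `XMLOutputFactory2.P_ATTR_VALUE_ESCAPER` for attribute values rather than `P_TEXT_ESCAPER`; that detail is an assumption to verify against the linked documentation.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical escaping writer: escapes &, <, > and " in everything written through it.
class GtEscapingWriter extends FilterWriter {
    GtEscapingWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(int c) throws IOException {
        // Entities are written to the wrapped writer directly, so they are not re-escaped.
        switch (c) {
            case '&': out.write("&amp;"); break;
            case '<': out.write("&lt;"); break;
            case '>': out.write("&gt;"); break;
            case '"': out.write("&quot;"); break;
            default: out.write(c);
        }
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(buf[i]);
        }
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(str.charAt(i));
        }
    }

    /** Convenience method for trying the writer out on a single string. */
    public static String escape(String raw) throws IOException {
        StringWriter sink = new StringWriter();
        try (Writer w = new GtEscapingWriter(sink)) {
            w.write(raw);
        }
        return sink.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(escape("<>"));
    }
}
```

Escaping character by character keeps the sketch short; a production escaper would more likely scan for runs of safe characters and write them in bulk.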

gkovacs20-cxn commented 1 month ago

Thank you for the detailed answer!