crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.36k stars 1.62k forks source link

Escaping special characters with XML::Builder broken broken in document fragment #9301

Open straight-shoota opened 4 years ago

straight-shoota commented 4 years ago

When using XML::Builder, the #text method is supposed to insert text content and make sure it's properly escaped to not allow injecting content that could be interpreted as tags. This generally works as expected, but there's a weird special case when text is added at the top scope of a document fragment, i.e. not wrapped in an element.

# This is ok:
XML.build_fragment {|b| b.element("p") { b.text("<foo>") } } # => "<p>&lt;foo&gt;</p>\n"
# This is NOT ok:
XML.build_fragment {|b| b.text("<foo>") } # => "<foo>\n"

The reason for this is in libxml2's xmlTextWriterWriteString method which for some reason seems to not escape the string if there's no entry on the writer stack:

https://github.com/GNOME/libxml2/blob/bfc0f674ccf5c3929ce22e29307a6ffae17428ad/xmlwriter.c#L1491-L1524

I have no idea what's the reason for this and I can't fidn any further documentation nor bug reports for this. But it's really hard to find such things for libxml2. Maybe I should open an issue at https://gitlab.gnome.org/GNOME/libxml2/-/issues but I can't imagine this hasn't come up before, so there must be some information, I just can't find it. Maybe someone here has an idea?

The behaviour of our exposed API is certainly unexpected and a potential security issue because text should be expected to always escape, no matter the builder state. We don't even expose a #raw method yet which could be used for explicitly escaped content (for example using HTML.escape).

naqvis commented 4 years ago

IMO Crystal API XML::Builder shouldn't be exposing text, attribute, attributes etc at Document level, as these are element level things and should only be made available to element (first part of your example above).

So this isn't a problem with the underlying library, but how we have encapsulated that into API.

straight-shoota commented 3 years ago

The XML::Builder interface is agnostic to this. There's no way to statically restrict the methods you can call on it. That depends on the current context, which is a runtime property.

What you describe is actually a different problem and not really related to this issue. This is about an entirely valid represenation of a DOM (a document fragment containing a text node) being improperly created. This is not a failure of the API (which is completely fine as per my example above) but the implementation is flawed.