XML is a bit of a poison pill - if you do it according to the spec, you end up with all kinds of security headaches. There are ways of mitigating many of these, but one measure that would cut out a huge chunk of the potential risk is simply ignoring embedded DOCTYPE declarations and forcing the DTD to be specified by the caller (if required).
The documentation for load_structure says that (emphasis mine):
The Options list controls the conversion process. Currently defined options are below. Other options are passed to sgml_parse/2.
...
dtd(?DTD)
Reference to a DTD object. If specified, the <!DOCTYPE ...> declaration is ignored and the document is parsed and validated against the provided DTD. If provided as a variable, the created DTD is returned. See section 3.5.
However, trying to load my test file with this query:
new_dtd(foo, DTD), load_structure('<path to file>', S, [dtd(DTD)]).
gives me this binding:
S = [element(foo, [], [lollollollollollollollollollol])]
which shows the DOCTYPE isn't being ignored.
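For anyone who wants to check this on their own installation, here is a minimal sketch of a self-contained reproduction. The sample document is a hypothetical stand-in, not the original test file, and the predicate names sample_document/1, parse_with_caller_dtd/1 and doctype_was_processed/1 are made up for illustration. Only new_dtd/2, load_structure/3 and open_string/2 are actual library predicates.

```prolog
:- use_module(library(sgml)).
:- use_module(library(lists)).

% Hypothetical stand-in for the test file: the internal DTD subset
% declares two entities and the body references the second one.
sample_document("<!DOCTYPE foo [\n  <!ENTITY lol \"lol\">\n  <!ENTITY lol2 \"&lol;&lol;\">\n]>\n<foo>&lol2;</foo>\n").

% parse_with_caller_dtd(-DOM)
% Parse the sample with a caller-supplied DTD for doctype foo, using
% the same options as the query above.
parse_with_caller_dtd(DOM) :-
    sample_document(Text),
    new_dtd(foo, DTD),
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM, [dtd(DTD)]),
        close(In)).

% doctype_was_processed(+DOM)
% True if the root element's character data contains the expanded
% entity text, i.e. the embedded declarations were used.
doctype_was_processed([element(foo, _, Content)]) :-
    member(Text, Content),
    atomic(Text),
    sub_atom(Text, _, _, _, lollol).
```

Running `parse_with_caller_dtd(DOM), doctype_was_processed(DOM).` should tell the two cases apart: the empty DTD passed in defines no `lol2` entity, so `lollol` can only appear in the DOM if the embedded DOCTYPE was processed.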
I'll provide some code to definitively turn off DOCTYPE processing, but default it to the existing behaviour for backward compatibility.