RDFLib / pyrdfa3

RDFa 1.1 distiller/parser library: can extract RDFa 1.1 (and RDFa 1.0, if properly set via a @version attribute) from (X)HTML, SVG, or XML in general. The module can be used to produce serialized versions of the extracted graph, or simply an RDFLib Graph.
http://www.w3.org/2012/pyRdfa/

how hard would it be to extend pyRdfa3 to lxml.etree? #29

Open doriantaylor opened 5 years ago

doriantaylor commented 5 years ago

Hey there,

Just tried to feed graph_from_DOM an already-parsed lxml.etree document and I tripped over the fact that it only speaks xml.dom.minidom. Since both these APIs give access to roughly the same information (at least as far as RDFa is concerned), I'd be okay with trying to make it handle both—unless it was too much of a snarl, or you didn't want it to for some reason.

Thoughts?

iherman commented 5 years ago

@doriantaylor to be honest, I have no idea. This code is fairly old; when I began its first version (must be way more than 10 years ago…), minidom was the tool for XML and, following the adage "if it ain't broke, don't fix it", I never really changed it. I cannot judge the difficulty.

One potential issue may be (but again it may not be…) whether there is a clear compatibility in the interface between the minidom used when parsing a pure XML content (say, an SVG file) and what is produced via the html5parser. I would be surprised if there was a difference, but this must be checked. Obviously, html5parser (which is an external dependency) plays an essential role.

I do not have any objection at all if you try. Mind you, this library is behind the RDFa distiller and parser service at W3C (which has a decent usage), so there has to be extra care in adopting any change…

doriantaylor commented 5 years ago

It looks like html5lib has an option to construct its output with lxml.etree; however, my reading of graph_from_DOM is that it sits farther down the pipeline than that. One might be able to get away with a small proxy class that does a partial implementation of the minidom interface on top of lxml.etree.
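To make the proxy idea concrete, here is a rough sketch of what it might look like. It is only an illustration: `ElementProxy` and `TextProxy` are hypothetical names, the set of minidom members shown is a guess at a useful subset rather than a list of what pyRdfa actually touches, and the demo uses the standard library's `xml.etree.ElementTree` on the assumption that lxml.etree elements behave the same way for the calls used here (`tag`, `attrib`, `get`, `text`, `tail`, iteration).

```python
import xml.etree.ElementTree as ET
from xml.dom import Node


def _split_clark(name):
    """Split an ElementTree '{namespace}local' name into (namespace, local)."""
    if name.startswith("{"):
        uri, _, local = name[1:].partition("}")
        return uri, local
    return None, name


class TextProxy:
    """Hypothetical stand-in for a minidom Text node."""
    nodeType = Node.TEXT_NODE

    def __init__(self, data, parent):
        self.data = self.nodeValue = data
        self.parentNode = parent
        self.childNodes = []


class ElementProxy:
    """Hypothetical partial minidom-style view of an ElementTree-API element."""
    nodeType = Node.ELEMENT_NODE

    def __init__(self, element, parent=None):
        self._element = element
        self.parentNode = parent
        self.namespaceURI, self.localName = _split_clark(element.tag)
        # Simplification: minidom would report the prefixed name here.
        self.tagName = self.nodeName = self.localName

    def hasAttribute(self, name):
        return name in self._element.attrib

    def getAttribute(self, name):
        # minidom returns "" for a missing attribute, not None.
        return self._element.get(name, "")

    @property
    def childNodes(self):
        # Rebuild minidom-style interleaved text and element children
        # from the ElementTree text/tail model.
        nodes = []
        if self._element.text:
            nodes.append(TextProxy(self._element.text, self))
        for child in self._element:
            if isinstance(child.tag, str):  # skip comments and PIs
                nodes.append(ElementProxy(child, self))
            if child.tail:
                nodes.append(TextProxy(child.tail, self))
        return nodes


root = ET.fromstring(
    '<div xmlns="http://www.w3.org/1999/xhtml" vocab="http://schema.org/">'
    '<span property="name">Alice</span> and more</div>'
)
proxy = ElementProxy(root)
children = proxy.childNodes
```

The awkward part is the text model: ElementTree hangs text off `text` and `tail`, while minidom exposes text as sibling nodes, so `childNodes` has to be reconstructed on each access. Namespaced attributes would also arrive in `{namespace}local` form and would need the same splitting as element names.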

I will take a look at what this entails. Maybe somebody has done it already?
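
In the meantime, a blunt workaround that needs no changes to the library is to serialize the already-parsed tree and re-parse it with minidom, at the cost of parsing the document twice. The sketch below assumes that approach is acceptable; `etree_to_minidom` is a hypothetical helper, and the demo uses the standard library's `xml.etree.ElementTree` so that it runs without lxml. For an lxml tree one would presumably pass `lxml.etree.tostring` as the `tostring` argument.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def etree_to_minidom(element, tostring=ET.tostring):
    """Serialize an ElementTree-API element and re-parse it with minidom."""
    return minidom.parseString(tostring(element))


svg = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<title property="http://purl.org/dc/terms/title">Logo</title></svg>'
)
dom = etree_to_minidom(svg)
title = dom.getElementsByTagNameNS("http://www.w3.org/2000/svg", "title")[0]
```

The result is an ordinary `xml.dom.minidom` document, which is the kind of object graph_from_DOM already understands. Namespace prefixes may be rewritten by the serializer, but the namespace URIs and attribute values survive the round trip.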