RDFLib / pyrdfa3

RDFa 1.1 distiller/parser library: can extract RDFa 1.1 (and RDFa 1.0, if properly set via a @version attribute) from (X)HTML, SVG, or XML in general. The module can be used to produce serialized versions of the extracted graph, or simply an RDFLib Graph.
http://www.w3.org/2012/pyRdfa/

scaffolding to work as a rdflib plugin #26

Closed gromgull closed 6 years ago

gromgull commented 6 years ago

RDFLib 5.0 has removed rdfa and microdata from the core repository, as they were drifting out of sync with the code here.

This PR adds the code required to function as an RDFLib parser back into this project.

The entry_points in setup.py ensure that, if both rdflib and pyrdfa3 are installed, users can call graph.parse(..., format='rdfa') as before.
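As a rough illustration of the mechanism described above, the sketch below shows what such an entry-point declaration and its use might look like. The group name "rdf.plugins.parser" is the one rdflib scans for parser plugins; the module path pyRdfa.rdflibparsers, the class name RDFaParser, and the helper parse_html_as_rdfa are assumptions made for this example, not taken from the PR.

```python
# Hypothetical excerpt of what setup.py could pass to setuptools.setup().
# "rdf.plugins.parser" is the entry-point group rdflib scans for parsers;
# the module path and class name on the right-hand side are assumptions.
ENTRY_POINTS = {
    "rdf.plugins.parser": [
        "rdfa = pyRdfa.rdflibparsers:RDFaParser",
    ],
}


def parse_html_as_rdfa(html):
    """Parse an HTML string into an rdflib Graph via the 'rdfa' plugin.

    Requires both rdflib and pyrdfa3 to be installed. rdflib is imported
    lazily so that this sketch can be loaded without it.
    """
    from rdflib import Graph

    graph = Graph()
    # format="rdfa" is resolved through the entry point registered above.
    graph.parse(data=html, format="rdfa")
    return graph
```

With both packages installed, calling parse_html_as_rdfa on an (X)HTML string should behave like the pre-5.0 built-in parser, since rdflib only looks up the format name and does not care which package provides it.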

Note: Currently basic tests don't work for me - I always get:

Exception: RDFa parsing Error! __init__() got an unexpected keyword argument 'encoding'

however, I also get this from https://www.w3.org/2012/pyRdfa/Validator.html so I assume it's unrelated to these changes.

If I am mistaken, let me know and I'll update!

gromgull commented 6 years ago

I believe this is the fix for the encoding issue:

modified   pyRdfa/__init__.py
@@ -614,7 +614,7 @@ class pyRdfa :
                    if self.charset :
                        # This means the HTTP header has provided a charset, or the
                        # file is a local file when we suppose it to be a utf-8
-                       dom = parser.parse(input, encoding=self.charset)
+                       dom = parser.parse(input, override_encoding=self.charset)
                    else :
                        # No charset set. The HTMLLib parser tries to sniff into the
                        # the file to find a meta header for the charset; if that
gromgull commented 6 years ago

via @olberger here: https://github.com/RDFLib/rdflib/issues/639
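For anyone who needs to run against both older and newer html5lib releases, a minimal sketch of a compatibility wrapper follows. It assumes, as the diff above suggests, that newer html5lib versions accept override_encoding where older ones accepted encoding; the helper name parse_with_charset is hypothetical and is not part of pyRdfa.

```python
def parse_with_charset(parser, stream, charset):
    """Call parser.parse() with whichever encoding keyword it accepts.

    Tries the newer 'override_encoding' keyword first and falls back to
    the older 'encoding' keyword if the parser rejects it with TypeError.
    Note: catching TypeError is deliberately simple here and could mask
    an unrelated TypeError raised inside the parser.
    """
    try:
        return parser.parse(stream, override_encoding=charset)
    except TypeError:
        return parser.parse(stream, encoding=charset)
```

The one-line change in the diff is simpler and is enough when only recent html5lib versions need to be supported. Either way, the charset override only makes sense for byte input, not for an already-decoded string.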

iherman commented 6 years ago

FWIW, I am fine with these changes (including https://github.com/RDFLib/pyrdfa3/pull/26#issuecomment-433654176). It cleans up a mess that we had (mostly I had...) many years ago.

Thanks.

gromgull commented 6 years ago

Then hit the merge button :D

This PR does not include the encoding fix - just commit that separately.

You should also make a PyPI release to make life as easy as possible for pyrdfa3 users. It can be done straight away; there is no need to wait for RDFLib 5.0.0 to actually go out.