gromgull closed 6 years ago
I believe the fix for the encoding issue is:
modified pyRdfa/__init__.py
@@ -614,7 +614,7 @@ class pyRdfa :
         if self.charset :
             # This means the HTTP header has provided a charset, or the
             # file is a local file when we suppose it to be a utf-8
-            dom = parser.parse(input, encoding=self.charset)
+            dom = parser.parse(input, override_encoding=self.charset)
         else :
             # No charset set. The HTMLLib parser tries to sniff into the
             # the file to find a meta header for the charset; if that
via @olberger here: https://github.com/RDFLib/rdflib/issues/639
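For anyone who needs to support both keyword names, here is a minimal sketch of a fallback. The two parser classes are hypothetical stand-ins that only mimic the old and new keyword; they are not the real HTML parser API, and `parse_with_charset` is a name made up for this illustration.

```python
class OldStyleParser:
    """Hypothetical stand-in: accepts the old `encoding` keyword."""

    def parse(self, stream, encoding=None):
        return ("old", encoding)


class NewStyleParser:
    """Hypothetical stand-in: accepts only `override_encoding`."""

    def parse(self, stream, override_encoding=None):
        return ("new", override_encoding)


def parse_with_charset(parser, stream, charset):
    """Try the newer keyword first, then fall back to the older one.

    Caveat: a TypeError raised for an unrelated reason inside parse()
    would also trigger the fallback, so this is only a sketch.
    """
    try:
        return parser.parse(stream, override_encoding=charset)
    except TypeError:
        return parser.parse(stream, encoding=charset)


def old_keyword_fails(parser):
    """Return True if passing `encoding=` raises TypeError, as in the report."""
    try:
        parser.parse(b"", encoding="utf-8")
    except TypeError:
        return True
    return False
```

If only one parser version needs to be supported, the one-line change in the diff above is simpler than a fallback like this.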
FWIW, I am fine with these changes (including https://github.com/RDFLib/pyrdfa3/pull/26#issuecomment-433654176). It cleans up a mess that we had (mostly I had...) many years ago.
Thanks.
Then hit the merge button :D
This PR does not include the encoding fix - just commit that separately.
You should also make a PyPI release to make life as easy as possible for pyrdfa3 users. It can be done straight away; there is no need to wait for RDFLib 5.0.0 to actually go out.
RDFLib 5.0 has removed rdfa and microdata from the core repository, as they were drifting out of sync with the code here.
This PR adds the code required to function as an RDFLib parser back into this project.
The entry_points in setup.py ensure that, if both rdflib and pyrdfa3 are installed, users can do
graph.parse( ..., format='rdfa')
as before.
Note: Currently the basic tests don't work for me - I always get:
Exception: RDFa parsing Error! __init__() got an unexpected keyword argument 'encoding'
However, I also get this from https://www.w3.org/2012/pyRdfa/Validator.html, so I assume it's unrelated to these changes.
If I am mistaken, let me know and I'll update!
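For readers unfamiliar with the entry_points mechanism mentioned in the description, a rough sketch of the mapping follows. `rdf.plugins.parser` is the entry-point group RDFLib scans for parser plugins. The module path and class name on the right-hand side are assumptions for illustration only, as is the `split_entry_point` helper; check setup.py in this PR for the actual values.

```python
# Sketch of an entry_points mapping as it could be passed to setuptools.setup().
# The target "pyRdfa.rdflibparsers:RDFaParser" is an assumed, illustrative path.
ENTRY_POINTS = {
    "rdf.plugins.parser": [
        "rdfa = pyRdfa.rdflibparsers:RDFaParser",
    ],
}


def split_entry_point(spec):
    """Split a 'name = module:attr' spec into (name, module, attr).

    Hypothetical helper, shown only to make the spec format explicit.
    """
    name, target = (part.strip() for part in spec.split("=", 1))
    module, attr = target.split(":", 1)
    return name, module, attr
```

The name on the left of `=` is what users pass as `format=...` to `graph.parse`, and the right-hand side tells the plugin loader which class to import.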