RDFLib / pymicrodata

This is a module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by the W3C Semantic Web Interest Group task force in March 2012. It can produce serialized versions of the extracted graph, or simply an RDFLib Graph object.
http://www.w3.org/2012/pyMicrodata/

Problems with getting RDF from XHTML5 served as application/xhtml+xml? #6

Open christianhujer opened 7 years ago

christianhujer commented 7 years ago

The tool seems to have problems extracting the schema.org RDFa data from the following page: http://nelkinda.com/blog/user-stories-are-not-always-user-stories/ The page is written in XHTML5, delivered as application/xhtml+xml, and encoded with gzip. pymicrodata seems unable to extract any information from it. I have successfully used the following tools with said page:

By the way, the tools from Microsoft also have problems with this page.

christianhujer commented 7 years ago

The following attachment is a zip archive with the XHTML page that isn't processed successfully.

sample.zip

iherman commented 7 years ago

@christianhujer: there were two problems. One was yours and the other was mine...

Thanks for the bug report!

christianhujer commented 7 years ago

@iherman Ahaha, thanks for clearing it up! I actually used the service at W3C, but when reporting the bug I must have confused the two libraries (RDF vs microdata). And I can confirm that the bug is now fixed. I have, however, found another small glitch, which I will report at pyrdfa3.
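
Since part of the confusion in this thread was RDFa versus microdata, here is a minimal sketch of a quick check for which annotation syntax an XHTML page actually carries before picking a library. It uses only the Python standard library and does not call pyMicrodata or pyRdfa. The function name `count_annotation_attrs`, the attribute sets, and the `SAMPLE` document are all hypothetical and made up for illustration; the sample is not the page from the report. The sketch assumes the body is well-formed XML without undeclared named entities (such as `&nbsp;`), which the standard-library parser would reject.

```python
import gzip
import xml.etree.ElementTree as ET

# Attribute names used as a rough heuristic (an assumption, not an
# exhaustive list from either specification).
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"vocab", "typeof", "property"}


def count_annotation_attrs(body: bytes) -> dict:
    """Count microdata-style and RDFa-style attributes in an XHTML body."""
    # A gzip stream starts with the magic bytes 0x1f 0x8b; decompress if so.
    if body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    root = ET.fromstring(body)
    counts = {"microdata": 0, "rdfa": 0}
    for element in root.iter():
        # Unprefixed attributes keep their plain names in ElementTree.
        for name in element.attrib:
            if name in MICRODATA_ATTRS:
                counts["microdata"] += 1
            elif name in RDFA_ATTRS:
                counts["rdfa"] += 1
    return counts


# Hypothetical sample: a tiny XHTML5 document annotated with RDFa only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" vocab="http://schema.org/">
  <body typeof="Article">
    <h1 property="headline">Example</h1>
  </body>
</html>"""

if __name__ == "__main__":
    print(count_annotation_attrs(gzip.compress(SAMPLE)))
```

A page that reports only RDFa-style attributes, like the sample here, would be a case for an RDFa extractor rather than a microdata one.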