Summary of changes
If source.content_type is "text/html" or "application/xhtml+xml", parse the document as HTML and extract the script elements with type="application/ld+json" as JSON-LD.
The default behavior is to extract only the first matching script element. These overrides are available:
To extract all script elements: pass the optional parameter extract_all_scripts=True to JsonLDParser.parse().
To extract a single script element with a specific id attribute value: append that id value as a fragment identifier to the IRI returned by source.getSystemId().
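The three selection rules above (first script by default, all scripts, or one script chosen by id) can be sketched with Python's standard-library html.parser. This is a hypothetical illustration, not the code in this PR: the names _JsonLdScriptCollector and extract_jsonld are made up, and merging several scripts into one array follows one reading of the JSON-LD 1.1 API HTML content algorithms rather than rdflib's implementation.

```python
import json
from html.parser import HTMLParser
from typing import Any, List, Optional, Tuple


class _JsonLdScriptCollector(HTMLParser):
    """Collects (id, text) pairs for every <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: List[Tuple[Optional[str], str]] = []
        self._in_jsonld = False
        self._current_id: Optional[str] = None
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Ignore media type parameters such as ";profile=..." when matching.
        type_attr = (attributes.get("type") or "").split(";")[0].strip().lower()
        if tag == "script" and type_attr == "application/ld+json":
            self._in_jsonld = True
            self._current_id = attributes.get("id")
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.scripts.append((self._current_id, "".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str, fragment_id: Optional[str] = None,
                   extract_all_scripts: bool = False) -> Any:
    """Hypothetical helper: select and parse JSON-LD script element(s)."""
    collector = _JsonLdScriptCollector()
    collector.feed(html)
    collector.close()
    scripts = collector.scripts

    if fragment_id is not None:
        # A fragment identifier selects exactly one script element by its id.
        matches = [text for script_id, text in scripts if script_id == fragment_id]
        if not matches:
            raise ValueError(f"no JSON-LD script element with id={fragment_id!r}")
        return json.loads(matches[0])

    if extract_all_scripts:
        # Merge every script into one array; top-level arrays are flattened in.
        result: List[Any] = []
        for _, text in scripts:
            parsed = json.loads(text)
            result.extend(parsed if isinstance(parsed, list) else [parsed])
        return result

    # Default: only the first matching script element.
    if not scripts:
        raise ValueError("no JSON-LD script element found")
    return json.loads(scripts[0][1])
```

For a page with scripts whose ids are "alice" and "bob", extract_jsonld(page) returns only the first document, extract_jsonld(page, fragment_id="bob") returns the "bob" document, and extract_jsonld(page, extract_all_scripts=True) returns both in one list.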
Detailed changes
rdflib.plugins.parsers.jsonld.JsonLDParser.parse
add docstring
change the parameter list from **kwargs to an explicit list of parameters
add optional extract_all_scripts parameter
get the fragment identifier from source.getSystemId()
add fragment_id and extract_all_scripts parameters to the call to source_to_json
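Getting the fragment identifier from the system id can be done with the standard library's urllib.parse.urldefrag. The helper below is a hypothetical sketch; split_system_id is not an rdflib function, and the PR's code may do this differently.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag


def split_system_id(system_id: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split a document IRI into (IRI without fragment, fragment id or None)."""
    if not system_id:
        return system_id, None
    url, fragment = urldefrag(system_id)
    # urldefrag returns "" when there is no fragment; normalise that to None
    # so callers can pass the value straight through as fragment_id.
    return url, fragment or None
```

For example, split_system_id("http://example.org/page.html#bob") gives ("http://example.org/page.html", "bob"), and the second element would be what the parser forwards to source_to_json as fragment_id.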
rdflib.plugins.shared.jsonld.util.source_to_json
add docstring
add optional fragment_id and extract_all_scripts parameters
change the return value to a tuple containing the extracted JSON document and the value of the HTML base element
if source.content_type is "text/html" or "application/xhtml+xml" then parse source as HTML and extract the appropriate script element(s) and the HTML base element
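The new return shape of source_to_json can be illustrated with a simplified stand-in that dispatches on the content type and returns (json_document, html_base). This is a hypothetical sketch: source_to_json_sketch and _BaseAndScriptParser are made-up names, the fragment_id and extract_all_scripts options are omitted for brevity (only the first JSON-LD script is read), and the raw href of the first base element is returned without resolving it against the document IRI.

```python
import json
from html.parser import HTMLParser
from typing import Any, List, Optional, Tuple

HTML_CONTENT_TYPES = {"text/html", "application/xhtml+xml"}


class _BaseAndScriptParser(HTMLParser):
    """Records the first <base href> and the text of the first JSON-LD script."""

    def __init__(self) -> None:
        super().__init__()
        self.base: Optional[str] = None
        self.first_script: Optional[str] = None
        self._capturing = False
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "base" and self.base is None and attributes.get("href"):
            self.base = attributes["href"]
        elif (tag == "script" and self.first_script is None
              and (attributes.get("type") or "").lower() == "application/ld+json"):
            self._capturing = True

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self.first_script = "".join(self._chunks)
            self._capturing = False


def source_to_json_sketch(text: str, content_type: str) -> Tuple[Any, Optional[str]]:
    """Return (json_document, html_base); html_base is None for plain JSON input."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in HTML_CONTENT_TYPES:
        return json.loads(text), None
    parser = _BaseAndScriptParser()
    parser.feed(text)
    parser.close()
    if parser.first_script is None:
        raise ValueError("no JSON-LD script element found")
    return json.loads(parser.first_script), parser.base
```

Callers then unpack two values, e.g. data, base = source_to_json_sketch(text, content_type), which is the source of the breaking change described below.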
test/jsonld/test_onedotone.py
if inputpath ends with ".html" (optionally followed by a fragment identifier), invoke runner.do_test_html
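The ".html" check needs to ignore a trailing fragment identifier. A minimal sketch of that predicate follows; is_html_test is a hypothetical name, the example paths are illustrative, and the actual test_onedotone.py code may be written differently.

```python
from urllib.parse import urldefrag


def is_html_test(inputpath: str) -> bool:
    """True if the test input is an HTML document, ignoring any #fragment."""
    path, _fragment = urldefrag(inputpath)
    return path.endswith(".html")
```

With this, both "html/e001-in.html" and "html/e002-in.html#second" would be routed to do_test_html, while a ".jsonld" input would not.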
test/jsonld/runner.py
add a new do_test_html function (Note: the HTML test cases from the JSON-LD Test Suite combine testing for JSON-LD extraction from HTML with testing for other algorithms, e.g. compact/flatten, which rdflib does not currently support. To test extraction only and ignore the compact/flatten algorithms, do_test_html performs a graph comparison using rdflib.compare.isomorphic, without serializing back to JSON.)
Breaking Changes
When rdflib.plugins.shared.jsonld.util.source_to_json extracts JSON-LD from HTML, it needs to return the value of the HTML base element in addition to the JSON. I took the simplest path and returned a tuple containing the JSON and the base.
I can think of other ways to return the base without breaking the current return value:
Return the JSON when processing a JSON document, and a tuple (json, base) when processing an HTML document.
Add an optional parameter that makes the function return the tuple (json, base) instead of only the JSON.
Continue returning only the JSON, but add an optional parameter that receives the value of the base.
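The trade-off between the tuple return value and the three alternatives can be compared with simplified stand-ins. These functions are hypothetical: they take the raw JSON text plus the base value that HTML extraction would have found, the is_html flag in option_1 stands in for the content-type check, and option_3 uses a list as the receiving parameter because Python has no out-parameters.

```python
import json
from typing import Any, List, Optional, Tuple, Union


def current_approach(text: str, html_base: Optional[str] = None) -> Tuple[Any, Optional[str]]:
    """The change in this PR: always return (json, base)."""
    return json.loads(text), html_base


def option_1(text: str, html_base: Optional[str] = None,
             is_html: bool = False) -> Union[Any, Tuple[Any, Optional[str]]]:
    """Option 1: plain JSON for JSON input, (json, base) only for HTML input."""
    data = json.loads(text)
    return (data, html_base) if is_html else data


def option_2(text: str, html_base: Optional[str] = None,
             return_base: bool = False) -> Union[Any, Tuple[Any, Optional[str]]]:
    """Option 2: callers opt in to the tuple; the default return is unchanged."""
    data = json.loads(text)
    return (data, html_base) if return_base else data


def option_3(text: str, html_base: Optional[str] = None,
             base_out: Optional[List[Optional[str]]] = None) -> Any:
    """Option 3: return only the JSON; an optional list receives the base."""
    if base_out is not None:
        base_out.append(html_base)
    return json.loads(text)
```

With current_approach every existing caller has to unpack two values, whereas option_2 and option_3 leave existing calls untouched. Option 1 keeps JSON callers working but gives the function a return type that depends on its input.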
Checklist
[x] Checked that there aren't other open pull requests for the same change.
[x] Checked that all tests and type checking pass.
If the change adds new features or changes the RDFLib public API:
[x] Created an issue to discuss the change and get in-principle agreement. #2692
[ ] Considered adding an example in ./examples.
If the change has a potential impact on users of this project:
[x] Added or updated tests that fail without the change.
[ ] Updated relevant documentation to avoid inaccuracies.
coverage: 91.036% (+0.006%) from 91.03%
when pulling 53b353fbbf5147b9d2b6654532fbcc553b6881c7 on wallberg:issue-2692-embedded-jsonld
into 0ecc40009ae397c2798c0c08a2d751a1a9d2f8a7 on RDFLib:main.
Draft implementation of issue #2692.
See also https://w3c.github.io/json-ld-syntax/#embedding-json-ld-in-html-documents and https://www.w3.org/TR/json-ld11-api/#html-content-algorithms .