Open alpha-beta-soup opened 4 years ago
After reviewing the source, the problem doesn't seem to be that pyld ignores Link headers, but that it calls response.json() first, which raises an exception right before the Link header would be inspected, so it never gets that far. This can possibly be avoided by first checking whether the Content-Type is some kind of JSON (since https://schema.org will respond with HTML). The error suggests that for whatever reason, at the point of the exception, response is None.
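A minimal sketch of that Content-Type check, assuming it would run before any JSON parsing is attempted. The helper name is_json_content_type is hypothetical and is not part of pyld:

```python
def is_json_content_type(content_type):
    """Return True if a Content-Type header value names a JSON media type.

    Hypothetical helper for illustration; not part of pyld.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Accept application/json and any "+json" suffix type such as application/ld+json.
    return media_type == "application/json" or media_type.endswith("+json")
```

With requests, the value to pass in would presumably be response.headers.get('Content-Type'), and response.json() would only be called when the helper returns True.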
pyld.jsonld.JsonLdError: ('Dereferencing a URL did not result in a valid JSON-LD object. Possible causes are an inaccessible URL perhaps due to a same-origin policy (ensure the server uses CORS if you are using client-side JavaScript), too many redirects, a non-JSON response, or more than one HTTP Link Header was provided for a remote context.',)
Why require JSON responses if the Link of type alternate is intended to point to the alternate representation? https://html.spec.whatwg.org/multipage/links.html#rel-alternate
If the alternate keyword is used with the type attribute, it indicates that the referenced document is a reformulation of the current document in the specified format.
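To make the alternate lookup concrete, here is a rough sketch of pulling a JSON-LD alternate out of a raw Link header value using only the standard library. The function names are hypothetical, and the parsing is simplified (it assumes no commas inside quoted parameter values), so treat it as an illustration rather than a complete Link header parser:

```python
import re

# One link-value: "<target>" followed by its ";"-separated parameters.
# Simplification: parameters are assumed not to contain commas.
_LINK_RE = re.compile(r'<([^>]*)>([^,]*)')
# One parameter: ; name="quoted value" or ; name=bare-value
_PARAM_RE = re.compile(r';\s*(\w+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Parse a Link header value into a list of dicts with 'url' plus parameters."""
    links = []
    for target, params in _LINK_RE.findall(value or ""):
        link = {"url": target}
        for name, quoted, bare in _PARAM_RE.findall(params):
            link[name.lower()] = quoted or bare
        links.append(link)
    return links


def find_jsonld_alternate(value):
    """Return the target of the first rel=alternate, type=application/ld+json link, or None."""
    for link in parse_link_header(value):
        rels = link.get("rel", "").lower().split()  # rel may hold several space-separated values
        if "alternate" in rels and link.get("type", "").lower() == "application/ld+json":
            return link["url"]
    return None
```

The returned target may be a relative reference, so a caller would still need to resolve it against the URL that was originally requested.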
The Link handling code is in the document loaders right below where that json() call happens. Quite possible that code hadn't been properly tested before. If someone has time to refactor that code to handle Link header in the proper order, that would be great.
@davidlehn I'm trying to learn how the tests are put together, to get a clear failing case before trying to fix the issue. If you can help with that, I'm willing to try and fix it.
I have the existing test suite running (although I get five failures). To that I've added a manifest.json in the root, and two test cases at the root as well.
manifest.json:
{
  "@context": ["context.jsonld", {"@base": "manifest"}],
  "@id": "",
  "@type": "mf:Manifest",
  "name": "JSON-LD Test Suite",
  "description": "This manifest loads some tests for resolving https://github.com/digitalbazaar/pyld/issues/128",
  "sequence": [
    "sample.jsonld",
    "sample2.jsonld"
  ]
}
sample.jsonld
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "@id": "http://localhost:5000/collections/obs",
  "url": "http://localhost:5000/collections/obs"
}
sample2.jsonld
{
  "@context": "https://schema.org/docs/jsonldcontext.jsonld",
  "@type": "Dataset",
  "@id": "http://localhost:5000/collections/obs",
  "url": "http://localhost:5000/collections/obs"
}
I run the tests in a virtual environment as: python tests/runtests.py ./manifest.jsonld, but the test suite skips them:
/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.24.1) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
PyLD Tests
Use -h or --help to view options.
JSON-LD Test Suite: http://localhost:5000/collections/obs: None ... skipped "Test type of ['Dataset']"
JSON-LD Test Suite: http://localhost:5000/collections/obs: None ... skipped "Test type of ['Dataset']"
----------------------------------------------------------------------
Ran 2 tests in 0.000s
OK (skipped=2)
How can I test these?
The Link handling code is in the document loaders right below where that json() call happens. Quite possible that code hadn't been properly tested before. If someone has time to refactor that code to handle Link header in the proper order, that would be great.
Wouldn't a possible fix be as follows:
After performing the initial request which returns a response with alternate link headers:
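# response.links is requests' parsed Link header, keyed by rel (empty if the header is absent)
alternate = response.links.get('alternate')
if alternate and alternate.get('type') == 'application/ld+json':
    # the target may be relative, so resolve it against response.url (needs: from urllib.parse import urljoin)
    response = requests.get(urljoin(response.url, alternate['url']), headers=headers, **kwargs)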
Let's say I have this extremely minimal bit of JSON-LD to be expanded with pyld:
If I substitute "https://schema.org" with "https://schema.org/docs/jsonldcontext.jsonld", with the code otherwise unchanged, it will correctly print (as I expected):
However, that then seems to mess up other parsers, including the Google Structured Data Testing Tool.
The root issue seems to be with pyld's remote fetching of contexts, in that "https://schema.org/" no longer responds with an application/ld+json content-type, instead opting to use a Link header with rel=alternate and type=application/ld+json. It seems that pyld needs to be updated to handle that case.
If you do curl https://schema.org/ -H "Accept: application/ld+json" you will still get back an HTML response.
Perhaps the cleanest way to implement this would be to check if a non-JSON-LD response is received, and if so, to look for an appropriate Link header and then make a request there.
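A rough sketch of that flow, not pyld's actual loader. To keep it independent of any HTTP library, it assumes a caller-supplied fetch(url) function (hypothetical) that returns a (content_type, links, body) tuple, where links has the same shape as requests' response.links (a dict keyed by rel). The names load_document and is_json are also made up for illustration:

```python
import json
from urllib.parse import urljoin

JSONLD_TYPE = "application/ld+json"


def is_json(content_type):
    """True for application/json and any "+json" media type."""
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")


def load_document(url, fetch):
    """Fetch url; if the response is not JSON, follow its JSON-LD alternate link once.

    fetch is a hypothetical callable: fetch(url) -> (content_type, links, body).
    Returns (final_url, parsed_document).
    """
    content_type, links, body = fetch(url)
    if not is_json(content_type):
        # Inspect the Link header *before* trying to parse the body as JSON.
        alternate = links.get("alternate")
        if not alternate or alternate.get("type") != JSONLD_TYPE:
            raise ValueError("no JSON-LD representation found for " + url)
        # The link target may be relative, so resolve it against the original URL.
        url = urljoin(url, alternate["url"])
        content_type, links, body = fetch(url)
        if not is_json(content_type):
            raise ValueError("alternate link did not return JSON: " + url)
    return url, json.loads(body)
```

With requests, fetch could presumably be a small wrapper that returns (r.headers.get('Content-Type'), r.links, r.text) for r = requests.get(url, headers=headers). Only a single alternate hop is followed here, which keeps the sketch from looping if a server points an alternate link back at an HTML page.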