ISAITB / shacl-validator

Web and command-line application for the validation of RDF data.
https://joinup.ec.europa.eu/collection/interoperability-test-bed-repository/solution/rdf-validator
European Union Public License 1.2

[feature request] Validator does not implement content negotiation #12

Closed: Markus92 closed this issue 3 weeks ago

Markus92 commented 1 month ago

When validating SHACL with an input URI, content negotiation does not seem to work. Based on some Wireshark sniffing, every HTTP(S) request appears to carry the following header:

Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2

Our metadata server therefore assumes the request comes from a browser and returns a regular HTML page instead of machine-readable metadata.

The desired behavior would be for this header to be, for example, Accept: text/turtle when the Turtle syntax is specified, or Accept: application/ld+json for JSON-LD. Even better would be, when the default "Based on file extension" option is left selected, to list all supported syntaxes in the header and have the validator parse the response's Content-Type header.
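A minimal Python sketch of the requested request-side behavior, assuming a fixed set of supported RDF syntaxes. The syntax keys (`turtle`, `json-ld`, ...), the helpers `build_accept_header` and `make_request`, and the exact list of media types are hypothetical illustrations, not the validator's actual API or configuration.

```python
import urllib.request

# Hypothetical mapping from the user's syntax choice to an RDF media type.
# The keys and the list of supported types are assumptions for illustration.
RDF_MEDIA_TYPES = {
    "turtle": "text/turtle",
    "json-ld": "application/ld+json",
    "rdf-xml": "application/rdf+xml",
    "n-triples": "application/n-triples",
}


def build_accept_header(selected_syntax=None):
    """Return the Accept header value to send when fetching RDF content.

    If a syntax was selected, ask only for its media type; otherwise
    (the "Based on file extension" default) list every supported type.
    """
    if selected_syntax is not None:
        return RDF_MEDIA_TYPES[selected_syntax]
    return ", ".join(RDF_MEDIA_TYPES.values())


def make_request(url, selected_syntax=None):
    """Build (but do not send) a GET request carrying that Accept header."""
    return urllib.request.Request(
        url, headers={"Accept": build_accept_header(selected_syntax)}
    )
```

For example, `build_accept_header("turtle")` returns `"text/turtle"`, while `build_accept_header()` returns `"text/turtle, application/ld+json, application/rdf+xml, application/n-triples"`. Passing the request to `urllib.request.urlopen` would then send the negotiated header instead of a browser-like default.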

costas80 commented 1 month ago

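You're right @Markus92. When a remote URI is specified, the validator currently uses neither the Accept header (in the request) nor the Content-Type header of the response (from the remote server). Your proposal would be a nice improvement whereby (recapping as you suggested):

- the Accept header is set to the selected content type or, if none is selected, to all supported RDF content types;
- the content type of the retrieved content is determined from the response's Content-Type header (if present).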

Besides applying this to the content to validate, we can also extend the logic to user-provided SHACL shapes (if a given validator instance supports them).

We'll work on this update ASAP. It might take a bit longer than usual due to the summer holidays, but I'll ping you here as soon as the update is published.

costas80 commented 3 weeks ago

Hi @Markus92. The validator (latest Docker image and managed service) has now been updated to correctly perform content negotiation as summarised earlier. In brief, the Accept header is set to the selected content type or, if none is selected, to all supported RDF content types. The content type of the retrieved content is then determined from the response's Content-Type header (if present). The fix applies both to the content to validate and to user-provided shapes (if applicable).
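The response side of this fix can be sketched in Python as follows. This is a minimal illustration, not the validator's implementation: the `resolve_content_type` helper, the set of supported types, and the fallback to the URL's file extension when no usable Content-Type is returned are all assumptions.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical set of RDF media types the validator can parse.
SUPPORTED_TYPES = {
    "text/turtle",
    "application/ld+json",
    "application/rdf+xml",
    "application/n-triples",
}

# Hypothetical fallback mapping, mirroring a "Based on file extension" default.
EXTENSION_TYPES = {
    ".ttl": "text/turtle",
    ".jsonld": "application/ld+json",
    ".rdf": "application/rdf+xml",
    ".nt": "application/n-triples",
}


def resolve_content_type(url, content_type_header=None):
    """Decide which RDF syntax to parse a downloaded document with."""
    if content_type_header:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        media_type = content_type_header.split(";", 1)[0].strip().lower()
        if media_type in SUPPORTED_TYPES:
            return media_type
    # No usable Content-Type: fall back to the extension of the URL path.
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    return EXTENSION_TYPES.get(extension)
```

With this ordering, a response labelled `text/turtle; charset=utf-8` is parsed as Turtle regardless of the URL, while a `.ttl` shapes file served without a Content-Type still resolves to `text/turtle`. The same function could be reused for user-provided shapes.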

Once you've had the chance to check on your end, would you please confirm so that we can close this issue? Thanks!

Markus92 commented 3 weeks ago

Hi @costas80, I checked the latest version against a few endpoints that I know support content negotiation or declare content types, and it works flawlessly!

Endpoints I tested:

- ORCID: https://orcid.org/0000-0002-0604-1204 (seems to negotiate Turtle).
- FAIR Data Point: https://fdp.healthdata.nl (supports Turtle, JSON-LD and XML; judging by the validated content output, it serves the XML, which I guess is the first type in the list).
- Molgenis EMX2: https://emx2.dev.molgenis.org/api/fdp (doesn't negotiate, but declares its output as text/turtle).
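The FAIR Data Point result is consistent with a simplified model of server-side negotiation: the server picks the available type with the highest q-value, and when several types tie (all defaulting to q=1), the one listed first in the Accept header wins. This is an assumption about how such servers may behave, not a description of the FAIR Data Point's code; the hypothetical helpers `parse_accept` and `choose_representation` below also ignore wildcards such as `*/*`.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, keeping header order."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        entries.append((media_type, q))
    return entries


def choose_representation(accept_header, available):
    """Pick the available type with the highest q; ties go to the earlier entry.

    Wildcards (e.g. "*/*") are deliberately not handled in this sketch.
    """
    best = None
    for media_type, q in parse_accept(accept_header):
        # Strict ">" keeps the first-listed type when q-values tie.
        if q > 0 and media_type in available and (best is None or q > best[1]):
            best = (media_type, q)
    return best[0] if best else None
```

Under this model, `choose_representation("application/rdf+xml, text/turtle", ["text/turtle", "application/rdf+xml"])` returns `"application/rdf+xml"`, whereas lowering the XML weight (`"application/rdf+xml;q=0.5, text/turtle"`) makes Turtle the preferred representation.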

Thanks a lot for picking this up, it's a nice QOL upgrade for our users.