RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

owl:imports Content Negotiation #98

Closed tobiasschweizer closed 3 years ago

tobiasschweizer commented 3 years ago

Hi there

I've been using pyshacl for a few days and I like it!

I noticed the --imports flag and wanted to try it right away after having read about it in #18.

So I tried the following:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://datashapes.org/sh/tests/core/complex/personexample.test>
  rdf:type owl:Ontology ;
  rdfs:label "Test of personexample" ;
  owl:imports <http://datashapes.org/schema> ;
.

schema:PersonShape
    a sh:NodeShape ;
    sh:targetClass schema:Person ;
    sh:property [
        sh:path schema:gender ;
        sh:in ( "female" "male" ) ;
    ] .

Which gives me the error:

Bad syntax (expected '.' or '}' or ']' at end of statement) at ^ in:
"b'<!DOCTYPE html>\n\n \n '^b'<META http-equiv="Content-Type" content="text/html; charset='..."
  File "/usr/local/lib/python3.9/site-packages/rdflib/graph.py", line 1256, in parse
    raise ParserError(
rdflib.exceptions.ParserError: Could not guess RDF format for <rdflib.parser.InputSource object at 0x10f9f1730> from file extension so tried Turtle but failed.You can explicitly specify format using the format argument.

I figured that the source of the problem is that it got HTML instead of Turtle.

So I tried:

...
  owl:imports <http://datashapes.org/schema.ttl> ;  # note the .ttl extension
...

And got:

rdflib.plugins.parsers.notation3.BadSyntax: at line 4 of <>:
Bad syntax (expected '.' or '}' or ']' at end of statement) at ^ in:
"...b'ttp://topbraid.org/tosh">\n \n '^b'\n \n <sh:prefix'..."
    raise ParserError(
rdflib.exceptions.ParserError: Could not guess RDF format for <rdflib.parser.InputSource object at 0x110035cd0> from file extension so tried Turtle but failed.You can explicitly specify format using the format argument.

This time it received XML instead of Turtle, from https://www.topbraid.org/tosh: http://datashapes.org/schema.ttl imports http://datashapes.org/dash, which in turn imports https://www.topbraid.org/tosh. Manually, I can easily work out that it should be https://www.topbraid.org/tosh.ttl, but I have no control over this from pyshacl.

So my question is: Is there some mechanism for content negotiation (HTTP accept header) or some convention regarding the file extension pyshacl could use when processing owl:imports to avoid the problem described above?

ashleysommer commented 3 years ago

Hi @tobiasschweizer, thanks for reporting this issue. I thought I'd tested this in the past and it worked fine, so there must have been a regression at some point. I'll look into it.

ashleysommer commented 3 years ago

I've checked the code, and it does send an Accept header with the request, like this:

headers = {'Accept': 'text/turtle, application/rdf+xml, application/ld+json, application/n-triples, text/plain'}

It then checks the Content-Type header of the response to determine how to parse the received file (because it could be in any of the requested formats). Note that HTML is not in the Accept list, so HTML should never be returned.
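For readers who want to see the negotiation steps spelled out, here is a minimal stdlib-only sketch. It is not PySHACL's actual code: the helper names `build_request` and `format_from_content_type` and the media-type table are assumptions made for illustration. Only the Accept value is taken from the comment above.

```python
import urllib.request

# Accept header value quoted in the comment above.
ACCEPT = ("text/turtle, application/rdf+xml, application/ld+json, "
          "application/n-triples, text/plain")

# Assumed mapping from response media type to an rdflib parser name.
# text/plain is left out on purpose because it does not identify a syntax.
MEDIA_TYPE_TO_FORMAT = {
    "text/turtle": "turtle",
    "application/rdf+xml": "xml",
    "application/ld+json": "json-ld",
    "application/n-triples": "nt",
}


def build_request(url):
    """Build (but do not send) a request that asks for RDF serializations."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT})


def format_from_content_type(content_type):
    """Return a parser name for a Content-Type header, or None if unknown."""
    if not content_type:
        return None
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return MEDIA_TYPE_TO_FORMAT.get(media_type)
```

A `None` result (for example for `text/html`) signals that the server ignored the Accept header, which is the situation reported in this issue. With rdflib, a non-`None` result could be passed as the `format` argument of `Graph.parse`.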

Looks like this is a bug with the server that responds to datashapes.org and topbraid.org. I'll file a bug against their issue tracker.

https://github.com/w3c/data-shapes/issues/141

HolgerKnublauch commented 3 years ago

That's weird. We have moved the datashapes.org server recently and there were hiccups due to a missing .htaccess file. Yet it should work now, and does work for http://datashapes.org/dash but not http://datashapes.org/schema although I see no difference in how these are represented on our server.

My .htaccess file is

AddType  text/turtle             .ttl
AddType  application/rdf+xml     .rdf
AddType  application/ld+json     .jsonld
AddType  application/n-triples   .nt
AddType  application/owl+xml     .owl
AddType  application/trig        .trig
AddType  application/n-quads     .nq
AddType  application/rdf+thrift  .trdf
Options +MultiViews

and the top-level directory contains the files

yet the content negotiation only works for dash, not schema. Does anyone know what could be causing this?

ashleysommer commented 3 years ago

Whoops, this closed itself automatically, so I've reopened it. @tobiasschweizer, I've pushed two changes in the new version of PySHACL that work around this issue:

1) I've added datashapes.org/schema to the list of built-in RDF graphs. There is now a copy within PySHACL, just like with shacl.ttl, so importing it doesn't issue an HTTP request. This saves latency and bandwidth when loading common ontologies.
2) I've improved detection of a file's content type when it is loaded and no other information is given, particularly for detecting XML content. This will work around the tosh XML issue.

I've also added a test for this (test_098.py) that verifies that this is working now. Can you please test PySHACL v0.17.1 and report back?
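The content-type detection mentioned in point 2 can be pictured as a small sniffing step on the first bytes of the response. The sketch below is only an illustration under that assumption, not the code that shipped in PySHACL. The function name `sniff_rdf_format` and its rules are hypothetical.

```python
def sniff_rdf_format(data: bytes) -> str:
    """Guess an rdflib parser name from the start of a document.

    Hypothetical fallback for when neither the Content-Type header nor the
    file extension identifies the serialization.
    """
    # Skip a UTF-8 byte order mark and leading whitespace, then compare
    # case-insensitively against a few well-known document openings.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    head = data.lstrip()[:256].lower()

    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        # The first failure in this issue: an HTML page instead of RDF.
        raise ValueError("received HTML, not an RDF serialization")
    if head.startswith(b"<?xml") or head.startswith(b"<rdf:rdf"):
        # The second failure in this issue: RDF/XML served without a type.
        return "xml"
    if head.startswith(b"{"):
        # Only "{" is checked, since "[" can also open a Turtle blank node.
        return "json-ld"
    # Fall back to Turtle, matching the default named in the error message.
    return "turtle"
```

Checking for HTML first gives a clearer error than letting a Turtle parser fail on `<!DOCTYPE html>`.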

tobiasschweizer commented 3 years ago

@ashleysommer Thanks a lot for looking into this!

I tried again with my current version pyshacl 0.17.0.post2 and it worked again when I used owl:imports <http://datashapes.org/schema.ttl>. I figure this is because of the changes described above.

Then I upgraded to pyshacl 0.17.1 which also works fine now with owl:imports <http://datashapes.org/schema>. 👍

ashleysommer commented 3 years ago

Thanks for the confirmation, @tobiasschweizer. I'll leave this issue open for now, until the problem with datashapes.org content negotiation is sorted out.

tobiasschweizer commented 3 years ago

@ashleysommer That's fine. Please feel free to ping me whenever I can help by trying something out.

I might have some general questions regarding best practices about re-using published SHACL shapes or publishing my own shapes. Is there a mailing list or a chat channel you use for discussion?

ashleysommer commented 3 years ago

Thanks @tobiasschweizer. I'm not the best person to answer general questions about SHACL best practices or publishing shapes files; my involvement in and knowledge of the SHACL community is limited to the code in PySHACL.

There is an active mailing list (https://lists.w3.org/Archives/Public/public-shacl/) and the SHACL community group; they should be able to steer you in the right direction.

HolgerKnublauch commented 3 years ago

I believe we have fixed the underlying schema redirect, but I understand that Ashley has decoupled his tests now.

(This is one of the breakage points of semantic web technology in general... a nice idea, but too easy to break, so in reality any app needs to keep its own copy.)

ashleysommer commented 3 years ago

Thanks @HolgerKnublauch