dbcls / LinkedData-Agora

WikiPathways: how to implement "Links to Other Dataset" #156

Closed egonw closed 3 years ago

egonw commented 4 years ago

With a lot of important functionality restored after the cyberattack (and now in a more stable way), we can start improving our score. One aspect I'm looking at is Linked Data. The WikiPathways RDF never supported linked data for anything other than pathways, so most IRIs in the data set do not resolve. However, we do link out to other resources.

So, I am exploring how to meet the YummyData void:Linkset expectations. Yesterday's report still says "N/A":

image

And we do specify a void:Linkset:

<http://data.wikipathways.org/20200810/linkset/wikidata>
        a                    void:Linkset ;
        dcterms:title        "WPRDF to Wikidata Linkset" ;
        void:linkPredicate   wp:bdbWikidata ;
        void:objectsTarget   <https://www.wikidata.org/entity/Q2013> ;
        void:subjectsTarget  <http://data.wikipathways.org/20200810/rdf/> .

But the above YummyData page does not provide a log message, which makes debugging a bit hard. Can you please provide me more info on how it tests for linksets? Thanks!

yayamamo commented 3 years ago

Sorry for not responding for a while. I've checked the log and found that our crawler reported "Inconsistent content type: response = , parser = text/turtle", as seen at the bottom of the log below.

https://yummydata.org/endpoint/18/log/void?date=2020-08-13

So, I suspected that there's something missing in the response header, and I've issued the query as follows:

curl -svL -H "Accept: text/turtle, text/n3, application/n-triples, application/n-quads, application/rdf+xml, application/rdf+json, application/ld+json, application/trig, application/trix" "http://sparql.wikipathways.org/.well-known/void"

This command can be copied by clicking the "Copy CURL Query" icon at the top of the log page. The response turned out to have no Content-Type header, which may be the cause. Could you check this?

*   Trying 85.214.42.229...
* TCP_NODELAY set
* Connected to sparql.wikipathways.org (85.214.42.229) port 80 (#0)
> GET /.well-known/void HTTP/1.1
> Host: sparql.wikipathways.org
> User-Agent: curl/7.64.1
> Accept: text/turtle, text/n3, application/n-triples, application/n-quads, application/rdf+xml, application/rdf+json, application/ld+json, application/trig, application/trix
> 
< HTTP/1.1 200 OK
< Date: Tue, 20 Oct 2020 09:43:59 GMT
< Server: Apache/2.4.46 (Unix)
< Last-Modified: Wed, 09 Sep 2020 22:51:16 GMT
< ETag: "172b-5aee94a04e100"
< Accept-Ranges: bytes
< Content-Length: 5931
< 
{ [2692 bytes data]
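The check described above can be sketched in Python. This is a minimal illustration, not the actual YummyData crawler code: the function names `media_type`, `check_content_type`, and `fetch_headers` are hypothetical, and the message format simply mirrors the log line quoted earlier.

```python
from typing import Mapping, Optional
import urllib.request

# Shortened Accept header for illustration; the real query lists more types.
ACCEPT = "text/turtle, application/rdf+xml"


def media_type(headers: Mapping[str, str]) -> str:
    """Return the bare media type from the headers, or "" if none is sent."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8".
            return value.split(";", 1)[0].strip().lower()
    return ""


def check_content_type(headers: Mapping[str, str], parser_type: str) -> Optional[str]:
    """Return a log-style message if the response type differs from the parser type."""
    response_type = media_type(headers)
    if response_type != parser_type:
        return (
            f"Inconsistent content type: response = {response_type}, "
            f"parser = {parser_type}"
        )
    return None


def fetch_headers(url: str) -> dict:
    """Fetch the response headers for a VoID URL (performs a network request)."""
    request = urllib.request.Request(url, headers={"Accept": ACCEPT})
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())


# Headers as seen in the curl output above: no Content-Type at all.
observed = {
    "Date": "Tue, 20 Oct 2020 09:43:59 GMT",
    "Server": "Apache/2.4.46 (Unix)",
    "Content-Length": "5931",
}
print(check_content_type(observed, "text/turtle"))
```

Against a live endpoint, `check_content_type(fetch_headers(url), "text/turtle")` would run the same comparison on a real response.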

yayamamo commented 3 years ago

Our crawler currently checks only void:target properties, and we recognize that it also needs to extract void:objectsTarget values. This will be fixed soon. Thank you for posting this issue.
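As a rough sketch of what such an extraction could look like (an assumption for illustration, not the crawler's actual code), the Python below collects the targets of each void:Linkset from triples given as plain tuples. The helper name `linkset_targets` is hypothetical, and including void:subjectsTarget alongside void:target and void:objectsTarget is a choice made here, not something stated in this thread.

```python
from typing import Dict, Iterable, Set, Tuple

VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Predicates that can name the datasets a linkset connects.
TARGET_PREDICATES = {
    VOID + "target",
    VOID + "subjectsTarget",
    VOID + "objectsTarget",
}

Triple = Tuple[str, str, str]


def linkset_targets(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Map each void:Linkset IRI to the set of datasets it declares as targets."""
    triples = list(triples)
    # First pass: find every subject typed as void:Linkset.
    linksets = {s for s, p, o in triples if p == RDF_TYPE and o == VOID + "Linkset"}
    targets: Dict[str, Set[str]] = {ls: set() for ls in linksets}
    # Second pass: collect targets declared through any of the three predicates.
    for s, p, o in triples:
        if s in linksets and p in TARGET_PREDICATES:
            targets[s].add(o)
    return targets


# The linkset from the top of this issue, written out as tuples.
LINKSET = "http://data.wikipathways.org/20200810/linkset/wikidata"
example = [
    (LINKSET, RDF_TYPE, VOID + "Linkset"),
    (LINKSET, VOID + "objectsTarget", "https://www.wikidata.org/entity/Q2013"),
    (LINKSET, VOID + "subjectsTarget", "http://data.wikipathways.org/20200810/rdf/"),
]
result = linkset_targets(example)
print(sorted(result[LINKSET]))
```

A check that looks only at void:target would find no targets for this linkset, which is consistent with the "N/A" result reported above.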

yayamamo commented 3 years ago

@egonw Now, you can confirm that Links to Other Dataset is set.

https://yummydata.org/endpoint/18?date=2020-11-04

egonw commented 3 years ago

oh, super cool! thanks!!!

yayamamo commented 3 years ago

Thank you for opening this issue. We'd like to keep improving our service, so please let me know if you find anything else wrong.