bio2rdf / bio2rdf-rest-talend

A RESTful interface to the Bio2RDF network of data.
MIT License

Need to describe all Bio2RDF entities #7

Open micheldumontier opened 10 years ago

micheldumontier commented 10 years ago

Currently there is no support for resolving arbitrary entities that appear in Bio2RDF datasets. Consider the statistics page for Affymetrix probesets: http://download.bio2rdf.org/release/3/html/affymetrix.html

The first type http://bio2rdf.org/wormbase_vocabulary:Resource does not resolve even though WormBase is part of the Bio2RDF distribution.

The second type http://bio2rdf.org/uniprot_vocabulary:Resource does not resolve.

The web application should send a federated query to the SPARQL endpoints to gather the triples that describe these terms.

@VEmonet

vemonet commented 10 years ago

bio2rdf.org now supports every dataset that is part of Bio2RDF release 2 or 3.

I am currently working on making bio2rdf.org resolve any arbitrary entity.

fbelleau commented 10 years ago

@micheldumontier @VEmonet

I have deployed a first version of the Talend implementation of the queryAll service, based on the release 3 statistics.

http://queryall.rest.bio2rdf.org/

The project, still in beta, is here:

https://github.com/fbelleau/bio2rdf-queryall

Michel, please inform Vincent of the needed modifications; it will then be possible to integrate the resolution of the 400 namespace URIs into the main Bio2RDF REST service.

vemonet commented 10 years ago

For UniProt, you were trying to resolve http://bio2rdf.org/uniprot_vocabulary:Resource, but it seems you are using the UniProt URI http://purl.uniprot.org/core/Resource:

http://uniprot.bio2rdf.org/describe/?url=http%3A%2F%2Fpurl.uniprot.org%2Fcore%2FResource&sid=99

I have begun to implement a "QueryAll"-like service on beta.bio2rdf.org. For example, try resolving some URIs:

curl -i -H 'Accept: application/rdf+xml' http://beta.bio2rdf.org/genbank:BC149752
curl -i -H 'Accept: application/rdf+xml' http://beta.bio2rdf.org/ec:2.7.1.1

There might be problems with some namespaces; feel free to point them out!

Note that when HTML is requested through content negotiation, the service returns only the Virtuoso fct describe page for the entity (if it exists).
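To make the content-negotiation behaviour concrete, here is a minimal sketch of how a resolver could pick a response format from the Accept header. It is only an illustration under assumptions: the class name AcceptNegotiator and the list of supported media types are hypothetical and not taken from the actual service, and wildcards such as */* are deliberately ignored.

```java
import java.util.List;
import java.util.Locale;

final class AcceptNegotiator {

    // Hypothetical list of media types the service can produce.
    static final List<String> SUPPORTED = List.of(
        "application/rdf+xml", "text/turtle", "application/n-triples", "text/html");

    /**
     * Returns the supported media type with the highest q-value in the
     * Accept header, or null when nothing matches (a server would then
     * typically answer 406). On equal q-values the first one listed wins.
     */
    static String choose(String acceptHeader) {
        String best = null;
        double bestQ = 0.0;
        for (String part : acceptHeader.split(",")) {
            String[] fields = part.trim().split(";");
            String type = fields[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // default quality when no q parameter is given
            for (int i = 1; i < fields.length; i++) {
                String param = fields[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat a malformed q-value as "not acceptable"
                    }
                }
            }
            if (SUPPORTED.contains(type) && q > bestQ) {
                best = type;
                bestQ = q;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(choose("application/rdf+xml"));
        System.out.println(choose("text/html;q=0.5, text/turtle"));
    }
}
```

With this logic, the curl calls above (Accept: application/rdf+xml) would select RDF/XML, while a browser-style header that ranks text/html highest would select the HTML branch.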

micheldumontier commented 10 years ago

Vincent, the redirection to the UniProt endpoint is on the Bio2RDF side (not mine). Still, requesting

curl -i -H 'Accept: application/rdf+xml' http://bio2rdf.org/wormbase_vocabulary:Resource

does not federate over the lsr endpoint, where there is an additional "owl:Class" type assertion.

micheldumontier commented 9 years ago

What's the status on this item?

vemonet commented 9 years ago

Update on the REST service: to resolve a Bio2RDF URI, the service now sends a federated query to every Bio2RDF triplestore to describe that URI. This is deployed on bio2rdf.org.

However, the service no longer redirects to the Virtuoso page when HTML is requested.
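The fan-out described here could be sketched roughly as follows: build one DESCRIBE query for the URI and turn it into a GET request URL per endpoint, using the standard "query" parameter of the SPARQL protocol. This is only an assumption about the shape of the approach, not the Talend implementation; the class name DescribeFanOut and the endpoint URLs in main are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class DescribeFanOut {

    /** Builds a SPARQL DESCRIBE query for a single absolute URI. */
    static String describeQuery(String uri) {
        // Angle brackets or spaces would break the <...> IRI syntax, so reject them.
        if (uri.contains("<") || uri.contains(">") || uri.contains(" ")) {
            throw new IllegalArgumentException("Not a usable IRI: " + uri);
        }
        return "DESCRIBE <" + uri + ">";
    }

    /** Returns one GET request URL per endpoint, all carrying the same query. */
    static List<String> requestUrls(List<String> endpoints, String uri) {
        String encoded = URLEncoder.encode(describeQuery(uri), StandardCharsets.UTF_8);
        List<String> urls = new ArrayList<>();
        for (String endpoint : endpoints) {
            urls.add(endpoint + "?query=" + encoded);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical endpoint list; a real service would load its own registry.
        List<String> endpoints = List.of(
            "http://hgnc.bio2rdf.org/sparql",
            "http://uniprot.bio2rdf.org/sparql");
        for (String url : requestUrls(endpoints, "http://bio2rdf.org/hgnc:4945")) {
            System.out.println(url);
        }
    }
}
```

Each URL would then be fetched with an RDF Accept header and the returned triples merged into one model before serialization.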

vemonet commented 9 years ago

http://answers.semanticweb.com/questions/11990/what-html-templating-systems-exist-for-rdf

Ideas for the HTML rendering, which is still to be done.

vemonet commented 9 years ago

http://bio2rdf.org/hgnc:4945

Found a bug due to:

com.hp.hpl.jena.shared.BadURIException: Only well-formed absolute URIrefs can be included in RDF/XML output: <http://> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A component that is required by the scheme is missing.

This error happens only when Jena tries to write the RDF in the RDF/XML format. There is no problem with the N-Triples or Turtle formats. I think Jena is stricter about URI well-formedness when it writes RDF/XML.

vemonet commented 9 years ago

The Jena error is triggered by this triple:

<http://bio2rdf.org/hgnc:4945> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://hgnc.bio2rdf.org> .

vemonet commented 9 years ago

Found out where the problem comes from. According to Jena, http://hgnc.bio2rdf.org (no trailing slash) is not a valid URI, whereas http://hgnc.bio2rdf.org/ (with a trailing slash) is.
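One possible explanation for why the reported bad URI is <http://> rather than the full type URI, not confirmed anywhere in this thread: when writing RDF/XML, the writer may split predicate and type URIs into a namespace plus an XML local name, and then validate the namespace part. The sketch below is a simplified stand-in for such a split, written under that assumption. It is not Jena's code, and the name-character rules are a rough approximation of XML NCName.

```java
final class QNameSplit {

    // Rough approximation of XML NCName characters (an assumption, not the full XML rules).
    static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.';
    }

    static boolean isNameStart(char c) {
        return Character.isLetter(c) || c == '_';
    }

    /** Index where the local name starts; uri.length() when no local name exists. */
    static int splitPoint(String uri) {
        int i = uri.length();
        // Walk back over the longest trailing run of name characters.
        while (i > 0 && isNameChar(uri.charAt(i - 1))) {
            i--;
        }
        // A local name must begin with a name-start character.
        while (i < uri.length() && !isNameStart(uri.charAt(i))) {
            i++;
        }
        return i;
    }

    static String namespace(String uri) {
        return uri.substring(0, splitPoint(uri));
    }

    static String localName(String uri) {
        return uri.substring(splitPoint(uri));
    }

    public static void main(String[] args) {
        for (String uri : new String[] {"http://hgnc.bio2rdf.org", "http://hgnc.bio2rdf.org/"}) {
            System.out.println(uri + " -> namespace=" + namespace(uri)
                + " localName=" + localName(uri));
        }
    }
}
```

Under these simplified rules, the URI without the trailing slash splits into the namespace "http://" and the local name "hgnc.bio2rdf.org", which would match the <http://> in the exception message. With the trailing slash no local name can be formed, so the whole URI stays intact as the namespace.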

vemonet commented 9 years ago

It comes from the UniProt endpoint. Note that I was using the latest version of Jena (http://mvnrepository.com/artifact/org.apache.jena/apache-jena-libs/2.11.2). Here is the code I was using to convert the triple:

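import java.io.*;                    // Reader, StringReader, Writer, StringWriter
import com.hp.hpl.jena.rdf.model.*;  // Model, ModelFactory (Jena 2.x package names)

// The statements below belong inside a method body, e.g. main().
String rdfInput = "<http://bio2rdf.org/hgnc:4945> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://hgnc.bio2rdf.org/> .";
Reader reader = new StringReader(rdfInput);
Writer writer = new StringWriter();
Model model = ModelFactory.createDefaultModel();
// Parse the N-Triples input; it contains only absolute URIs, so the base argument is not needed for resolution.
model.read(reader, "default", "N-TRIPLE");
// Serialize as RDF/XML; this is the step that threw BadURIException for <http://hgnc.bio2rdf.org> without the trailing slash.
model.write(writer, "RDF/XML");
System.out.println(writer.toString());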