geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Anonymous blank nodes cause problems in Query engines with R2RML mappings #137

Closed mileswhen closed 1 year ago

mileswhen commented 3 years ago

Hello,

I hit a small issue and would appreciate any help! After converting to GO-CAM, the anonymous blank nodes (written with square brackets in Turtle) cause errors in query engines that use R2RML mappings. For example:

[ a       <http://www.w3.org/2002/07/owl#Axiom> ;
  <http://www.w3.org/2000/01/rdf-schema#comment>
          "Entity Regulation Rule 3. The relation 'STATs bind gp130 phosphotyrosines' 'null' 'Tyrosine phosphorylation of STAT1, STAT3 by IL6 receptor' was inferred because:\n reaction1 has an output that is the enabler of reaction 2." ;
  <http://geneontology.org/lego/evidence>
          <http://model.geneontology.org/ev_w_id_R-HSA-1112565_RO_0002629_R-HSA-1112602_Reactome_R-HSA-1059683> ;
  <http://purl.org/dc/elements/1.1/contributor>
          "https://reactome.org/content/detail/R-HSA-1059683" ;
  <http://purl.org/dc/elements/1.1/date>
          "2021-06-28" ;
  <http://purl.org/pav/providedBy>
          "https://reactome.org" ;
  <http://www.w3.org/2002/07/owl#annotatedProperty>
          <http://purl.obolibrary.org/obo/RO_0002629> ;
  <http://www.w3.org/2002/07/owl#annotatedSource>
          <http://model.geneontology.org/R-HSA-1112565> ;
  <http://www.w3.org/2002/07/owl#annotatedTarget>
          <http://model.geneontology.org/R-HSA-1112602>
] .

I'm using the Ontop query engine in the SANSA framework, but have replicated the issue in Sparqlify (also R2RML). However, I have no issues with Apache Jena ARQ (no R2RML). An example of an error I encountered is:

it.unibz.inf.ontop.exception.InvalidMappingSourceQueriesException: Error: Cannot find relation
"ontop_sansa_db"."PUBLIC"."rdd1020132241_httpp3ap2fp2fpurlcorgp2fdcp2felementsp2f1c1p2fcontributor_xmlschemap23string_lang" (available choices:
["ontop_sansa_db"."PUBLIC"."rdd1020132241_httpp3ap2fp2fgeneontologycorgp2flegop2fevidencesbn",

"ontop_sansa_db"."PUBLIC"."rdd1020132241_httpp3ap2fp2fgeneontologycorgp2flegop2fmodelstate_xmlschemap23string_lang", 

"ontop_sansa_db"."PUBLIC"."rdd1020132241_httpp3ap2fp2fpurlcobolibrarycorgp2fobop2fbfo_0000050",

"ontop_sansa_db"."PUBLIC"."rdd1020132241_httpp3ap2fp2fpurlcobolibrarycorgp2fobop2fbfo_0000066",

"ontop_sansa_db"."PUBLIC"."rdd1020132241_httpp3ap2fp2fpurlcobolibrarycorgp2fobop2fro_0002212",

"ontop_sansa_db"."PUBLIC"."rdd1020132241_httpp3ap2fp2fpurlcobolibrarycorgp2fobop2fro_0002213",

"ontop_sansa_db"."PUBLIC"."rdd1020132241_httpp3ap2fp2fpurlcobolibrarycorgp2fobop2fro_0002233", 

"ontop_sansa_db"."PUBLIC"."rdd1020132241_httpp3ap2fp2fpurlcorgp2fdcp2felementsp2f1c1p2fcontributor_xmlschemap23string_langsbn",
...
Problem location: source query of triplesMap
[id: mapping--981465955
target atoms: triple(s,p,o) with s/RDF(TmpToSTRING("s"),IRI), p/<http://purl.org/dc/elements/1.1/contributor>, o/RDF(TmpToSTRING("o"),rdfs:Literal)
source query: SELECT * FROM "rdd1020132241_httpp3ap2fp2fpurlcorgp2fdcp2felementsp2f1c1p2fcontributor_xmlschemap23string_lang"]
  at it.unibz.inf.ontop.spec.mapping.pp.impl.SQLPPMappingConverterImpl.getRAExpression(SQLPPMappingConverterImpl.java:144)
  at it.unibz.inf.ontop.spec.mapping.pp.impl.SQLPPMappingConverterImpl.convert(SQLPPMappingConverterImpl.java:60)
  at it.unibz.inf.ontop.spec.mapping.impl.SQLMappingExtractor.convert(SQLMappingExtractor.java:235)
  at it.unibz.inf.ontop.spec.mapping.impl.SQLMappingExtractor.convert(SQLMappingExtractor.java:206)
...

My quick fix was to reassign each blank node a unique IRI, after which I was able to run queries. For example:

<http://www.w3.org/2002/07/owl/axioms#0> a       <http://www.w3.org/2002/07/owl#Axiom> ;
  <http://www.w3.org/2000/01/rdf-schema#comment>
          "Entity Regulation Rule 3. The relation 'STATs bind gp130 phosphotyrosines' 'null' 'Tyrosine phosphorylation of STAT1, STAT3 by IL6 receptor' was inferred because:\n reaction1 has an output that is the enabler of reaction 2." ;
  <http://geneontology.org/lego/evidence>
          <http://model.geneontology.org/ev_w_id_R-HSA-1112565_RO_0002629_R-HSA-1112602_Reactome_R-HSA-1059683> ;
  <http://purl.org/dc/elements/1.1/contributor>
          "https://reactome.org/content/detail/R-HSA-1059683" ;
  <http://purl.org/dc/elements/1.1/date>
          "2021-06-28" ;
  <http://purl.org/pav/providedBy>
          "https://reactome.org" ;
  <http://www.w3.org/2002/07/owl#annotatedProperty>
          <http://purl.obolibrary.org/obo/RO_0002629> ;
  <http://www.w3.org/2002/07/owl#annotatedSource>
          <http://model.geneontology.org/R-HSA-1112565> ;
  <http://www.w3.org/2002/07/owl#annotatedTarget>
          <http://model.geneontology.org/R-HSA-1112602> .

balhoff commented 3 years ago

@mileswhen these blank nodes are prescribed by the standard OWL-to-RDF mapping. Would it be possible for you to add a SPARQL CONSTRUCT step to transform them before using the files? I'm not very familiar with R2RML, but I'm surprised there wouldn't be a way to handle blank nodes like this.

We could also explore making available some non-OWL RDF transformations if we can find something that would be generally useful.

mileswhen commented 3 years ago

@balhoff thanks for the reply and suggestion! I implemented the quick fix in Python to transform the files directly, but it should also be possible to perform a CONSTRUCT query in Jena ARQ to get the same result.
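For anyone who wants to try the same kind of fix, here is a minimal sketch of the idea. It is not the exact script I used. It assumes the GO-CAM file has first been re-serialized as N-Triples, where blank nodes carry explicit `_:label` identifiers (the `[ ... ]` form in Turtle has no label to rewrite). The `BASE` namespace and both function names are made up for illustration.

```python
import re

# Hypothetical namespace for the minted IRIs; replace with one you control.
BASE = "http://example.org/genid/"

# One N-Triples statement: subject, predicate, object, terminating dot.
TRIPLE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.*?)\s*\.\s*$')
# A term that consists entirely of a blank node label, e.g. _:b0
BNODE = re.compile(r'^_:([A-Za-z0-9_][A-Za-z0-9_.-]*)$')


def skolemize_term(term, base=BASE):
    """Return an IRI for a blank node term; leave IRIs and literals untouched."""
    match = BNODE.match(term)
    return f"<{base}{match.group(1)}>" if match else term


def skolemize_ntriples(text, base=BASE):
    """Rewrite blank nodes in subject and object position of N-Triples text."""
    out = []
    for line in text.splitlines():
        match = TRIPLE.match(line)
        if match is None:
            out.append(line)  # comments, blank lines, anything unrecognized
            continue
        s, p, o = match.groups()
        out.append(f"{skolemize_term(s, base)} {p} {skolemize_term(o, base)} .")
    return "\n".join(out)
```

Calling `skolemize_ntriples(open("model.nt").read())` (the file name is a placeholder) returns the rewritten text. Only whole terms are replaced, so a string literal that happens to contain `_:b0` stays as it is. Blank node labels are only unique within one file, so when converting several models it is probably safer to put a per-file component into `base`, otherwise unrelated nodes from different files could end up with the same IRI.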

I'm not sure how important it is to fix this. However, anonymous blank nodes may be an issue for any framework that uses R2RML (relational data to RDF), as described here.