BEL RDF schema - GitHub issues

neoflex commented 8 years ago

Hi,

I generated the BEL RDF schema by running bel rdfschema. I am a bit surprised by some statements in the produced ontology. Let me explain:

1) We have:

hasObject subPropertyOf hasChild
hasChild domain Term

Now we add a BEL statement s1 such as:

s1 a Statement
s1 hasObject anyTerm

From the model we can infer:

s1 hasChild anyTerm
s1 a Term

Is it the expected behaviour that any Statement instance is also a Term by inference as soon as this statement has a hasObject or hasSubject property?
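To make the inference chain concrete, here is a minimal sketch in plain Python (no RDF library). It assumes only the two standard RDFS entailment rules involved, rdfs7 for subPropertyOf and rdfs2 for domain. Triples are plain tuples, and the names are the ones used in the example above.

```python
# Minimal RDFS forward chaining over plain tuples.
# Only two entailment rules are implemented:
#   rdfs7: (p subPropertyOf q), (s p o)  =>  (s q o)
#   rdfs2: (p domain C),        (s p o)  =>  (s a C)

def infer(triples):
    """Return the closure of `triples` under rdfs7 and rdfs2."""
    closed = set(triples)
    while True:
        new = set()
        for p, rel, q in closed:
            if rel == "subPropertyOf":
                new |= {(s, q, o) for s, pred, o in closed if pred == p}
            elif rel == "domain":
                new |= {(s, "a", q) for s, pred, o in closed if pred == p}
        if new <= closed:  # nothing new was derived: fixed point reached
            return closed
        closed |= new

schema = {
    ("hasObject", "subPropertyOf", "hasChild"),
    ("hasChild", "domain", "Term"),
}
data = {
    ("s1", "a", "Statement"),
    ("s1", "hasObject", "anyTerm"),
}

closure = infer(schema | data)
print(("s1", "hasChild", "anyTerm") in closure)  # True
print(("s1", "a", "Term") in closure)            # True
```

Note that nothing is inferred about anyTerm itself: the domain constraint only types the subject of hasChild, which here is the Statement.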

2) When I convert BEL statements to RDF I get, for instance, the following statement: a_CHEBI_3'_5'-cyclic_AMP_Increases_gtp_p_HGNC_RAP1A hasRelationship Increases

While in the model we have:

hasRelationship range Relationship
Increases subClassOf CausalRelationship

We can thus infer:

Increases a Class
Increases a Relationship

Is it intended that Increases is defined as a Class but also used as an individual (an instance of Relationship)?
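The double typing can be reproduced with a small plain-Python sketch. It assumes the RDFS rules rdfs2 (domain) and rdfs3 (range), plus the RDFS axiomatic triples that give rdfs:subClassOf a domain and range of rdfs:Class. The resource name stmt1 is a placeholder for the converted statement, and the helper keeps one domain and one range per property for simplicity.

```python
# rdfs2: (p domain C), (s p o) => (s a C)
# rdfs3: (p range C),  (s p o) => (o a C)

def types_of(node, triples):
    """Collect the classes `node` belongs to, directly or via rdfs2/rdfs3."""
    domains = {s: o for s, p, o in triples if p == "domain"}
    ranges = {s: o for s, p, o in triples if p == "range"}
    found = {o for s, p, o in triples if s == node and p == "a"}
    for s, p, o in triples:
        if s == node and p in domains:
            found.add(domains[p])
        if o == node and p in ranges:
            found.add(ranges[p])
    return found

triples = {
    # RDFS axiomatic triples for rdfs:subClassOf
    ("subClassOf", "domain", "Class"),
    ("subClassOf", "range", "Class"),
    # Schema statements quoted above
    ("hasRelationship", "range", "Relationship"),
    ("Increases", "subClassOf", "CausalRelationship"),
    # Converted BEL statement (`stmt1` is a placeholder name)
    ("stmt1", "hasRelationship", "Increases"),
}

print(sorted(types_of("Increases", triples)))  # ['Class', 'Relationship']
```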

abargnesi commented 8 years ago

Great questions Valentin, thank you.

Is it the expected behaviour that any Statement instance is also a Term by inference as soon as this statement has a hasObject or hasSubject property?

No, the inference consequences of _hasSubject rdfs:subPropertyOf hasChild_ and _hasObject rdfs:subPropertyOf hasChild_ were not intended. It is an error in the schema IMO.

I believe the intention was to link a Statement resource to all child Term resources without needing to traverse both hasSubject and hasObject. For example, a query to find all Statement resources referencing a ComplexAbundance:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix bel: <http://www.openbel.org/bel/>
prefix belv: <http://www.openbel.org/vocabulary/>

select ?statement where {
  ?statement rdf:type belv:Statement .
  ?statement belv:hasChild ?child .
  ?child rdf:type belv:ComplexAbundance .
}

Additionally, a Statement can have only a subject Term, meaning the term was observed to occur in some biological context. See the example of a protein complex. Given the example's statement:

complex(p(HGNC:CCND1), p(HGNC:CDK4))

The RDF resource's class would be simultaneously ComplexAbundance, Term, and Statement:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix bel: <http://www.openbel.org/bel/> .
@prefix belv: <http://www.openbel.org/vocabulary/> .

bel:complex_p_HGNC_CCND1_p_HGNC_CDK4 rdf:type belv:ComplexAbundance, belv:Term, belv:Statement ;
  rdfs:label "complex(p(HGNC:CCND1),p(HGNC:CDK4))" ;
  belv:hasSubject bel:complex_p_HGNC_CCND1_p_HGNC_CDK4 ;
  belv:hasChild bel:p_HGNC_CCND1 .

2) When I convert BEL statements to RDF I get, for instance, the following statement: a_CHEBI_3'_5'-cyclic_AMP_Increases_gtp_p_HGNC_RAP1A hasRelationship Increases

Great point. This is again not the expectation. We should be separating instances of relationships (e.g. increases, directlyIncreases, etc.) from relationship classes (e.g. Relationship, CausalRelationship, etc.). For example, for increases the schema could be:

@prefix belv: <http://www.openbel.org/vocabulary/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

belv:Relationship rdf:type rdfs:Class .
belv:CausalRelationship rdfs:subClassOf belv:Relationship .
belv:increases rdf:type belv:CausalRelationship .

Does this schema make sense?

neoflex commented 8 years ago

Thanks for this detailed answer. For the first point, I understand the need to query simply on hasChild to cover both hasSubject and hasObject. I guess we could solve the inference issue by changing the domain of the hasChild property. One option is to set no domain for this property; another is to set it to a new class that has both Term and Statement as subclasses. Either solution would prevent the inference s1 hasSubject aTerm => s1 a Term that we currently have. What do you think?
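A rough plain-Python comparison of the current schema with the two options, assuming only rdfs7 (subPropertyOf, expanded one level, which is enough here) and rdfs2 (domain). The class name BelNode is a made-up placeholder for the new superclass, not something that exists in the BEL vocabulary.

```python
def entailed_types(subject, schema, data):
    """Types of `subject` after applying rdfs7 (one level) and then rdfs2."""
    sub_props = {(s, o) for s, p, o in schema if p == "subPropertyOf"}
    domains = {(s, o) for s, p, o in schema if p == "domain"}
    # Properties used on `subject`, plus their direct super-properties.
    used = {p for s, p, o in data if s == subject}
    used |= {sup for sub, sup in sub_props if sub in used}
    types = {o for s, p, o in data if s == subject and p == "a"}
    types |= {cls for prop, cls in domains if prop in used}
    return types

base = {
    ("hasSubject", "subPropertyOf", "hasChild"),
    ("hasObject", "subPropertyOf", "hasChild"),
}
current = base | {("hasChild", "domain", "Term")}
no_domain = base                                         # option 1: no domain
shared_parent = base | {("hasChild", "domain", "BelNode")}  # option 2: hypothetical superclass

data = {("s1", "a", "Statement"), ("s1", "hasSubject", "aTerm")}

print(sorted(entailed_types("s1", current, data)))        # ['Statement', 'Term']
print(sorted(entailed_types("s1", no_domain, data)))      # ['Statement']
print(sorted(entailed_types("s1", shared_parent, data)))  # ['BelNode', 'Statement']
```

With option 2, s1 is still typed by inference, but only as the shared superclass, which is harmless as long as Statement is declared a subclass of it.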

For the second point, the solution seems more complicated to me. If we represent "Increases" as an instance of CausalRelationship, we would have to do the same for "DirectlyIncreases", for instance. But then how do we represent the fact that directlyIncreases implies increases? One solution could be to use SKOS: represent each relationship as a SKOS concept, and the hierarchy of concepts with skos:broader and skos:narrower. I am not a big fan of this approach though because, as far as I know, most triple stores and inference engines won't be able to use this hierarchy.

Another idea would be to change the representation of statements more drastically. If we use RDFS reification syntax, the BEL statement a(CHEBI:...) increases a(CHEBI:....) would be represented by the following triples:

t1 increases t2
s1 a rdf:Statement
s1 rdf:subject t1
s1 rdf:object t2
s1 rdf:predicate increases

In this case, BEL relationships would be represented as RDF properties, with subPropertyOf links representing the hierarchy. With this representation, t1 increases t2 => t1 positiveRelationship t2. We could also add rdf:subject subPropertyOf bel:hasChild and rdf:object subPropertyOf bel:hasChild. We would still need to think about how to represent statements made with only a subject, though.
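As a sketch of how the property hierarchy would carry the implications, here is a plain-Python version of rdfs7 applied transitively. The hierarchy below is an assumption for illustration (directlyIncreases under increases, increases under positiveRelationship); the actual BEL hierarchy may differ.

```python
def super_properties(prop, sub_property_of):
    """All properties implied by `prop`, following subPropertyOf transitively."""
    seen, stack = set(), [prop]
    while stack:
        current = stack.pop()
        for parent in sub_property_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def entail(triple, sub_property_of):
    """Triples entailed by `triple` through the property hierarchy (rdfs7)."""
    s, p, o = triple
    return {(s, q, o) for q in super_properties(p, sub_property_of)}

# Assumed hierarchy: property -> list of direct super-properties.
hierarchy = {
    "directlyIncreases": ["increases"],
    "increases": ["positiveRelationship"],
}

print(sorted(entail(("t1", "directlyIncreases", "t2"), hierarchy)))
# [('t1', 'increases', 't2'), ('t1', 'positiveRelationship', 't2')]
```

Since rdfs7 is part of standard RDFS entailment, an RDFS-aware store could derive these triples on its own, which is the advantage over the SKOS approach.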

A last proposal would be to use named graphs instead of reification. That would largely reduce the number of triples needed to represent BEL statements, but it would force people to use compatible storage solutions and libraries.
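To give a rough idea of the size difference, here is a plain-Python count for a single BEL statement. It assumes the reified form keeps the asserted triple plus the four reification triples listed earlier, and that the named-graph form stores one quad whose fourth element is the graph name. The identifiers s1 and g1 are placeholders.

```python
def reified(stmt_id, s, p, o):
    """Asserted triple plus the four RDF reification triples."""
    return [
        (s, p, o),
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]

def named_graph(graph_id, s, p, o):
    """A single quad: the triple placed in a graph named `graph_id`."""
    return [(s, p, o, graph_id)]

as_reified = reified("s1", "t1", "increases", "t2")
as_quad = named_graph("g1", "t1", "increases", "t2")

print(len(as_reified), len(as_quad))  # 5 1
```

In both cases, metadata about the statement would attach to the identifier (s1 or g1). The difference is that quads need a quad-aware store and parser.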

OpenBEL / bel.rb

BEL RDF schema #113