geneontology / go-shapes

Schema for Gene Ontology Causal Activity Models defined using RDF Shapes

Update go-cam-shapes.shex #158

Closed vanaukenk closed 5 years ago

vanaukenk commented 5 years ago

Adding has_part relation to protein-containing complex to capture complex members.

goodb commented 5 years ago

@vanaukenk The test is failing because one of the has_parts on the complex is not an information biomacromolecule: S-adenosyl-L-methionine (http://purl.obolibrary.org/obo/CHEBI_15414).

Do you want to restrict complex parts to exclude compounds like that? If not, we can change the range to ChemicalEntity and it should work.

FYI, if you do, then many complexes in Reactome would not fit this structure. Nothing would break on that project because I've already moved definitions of complexes based on their parts out of the models and into an ontology, but I wanted to make sure you are aware. For what it's worth, I see a lot of value in maintaining the model/ontology separation at that level.

vanaukenk commented 5 years ago

Thanks, @goodb. I think this would be a good issue to discuss on the GO-CAM specs call tomorrow. For manual curation, we have largely been thinking of complexes in terms of gene products, but maybe we do want to be more liberal here. I'm not sure. Also, I was trying to view this model in Noctua but couldn't find it by searching with the title or model ID. Is it viewable somewhere?

goodb commented 5 years ago

@vanaukenk To my knowledge these models are not published in either of the Noctua model collections that you could view online. Even if they were created in Noctua, they might not be in sync with what is here in the shapes project, as the test model files are saved here separately. I resort to looking at them as text or in Protege (but remove the go_lego import statement if you go for Protege). If you really want to see them, I can launch a local server with them and screenshare with you...

vanaukenk commented 5 years ago

Thanks @goodb

After discussion with @thomaspd and @ukemi this morning, we propose that protein-containing complexes will contain information biomacromolecules but will NOT contain chemicals such as S-adenosyl-L-methionine (http://purl.obolibrary.org/obo/CHEBI_15414).

This means that this model should now be in the list of models that should fail, rather than pass.
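To make the proposed rule concrete, here is a minimal Python sketch of the check it implies. It is not the ShEx validation itself: the `Entity` and `Complex` classes, the type labels, and the `example.org` IRIs are all hypothetical stand-ins, and only the CHEBI_15414 IRI comes from this thread.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical type labels; the real schema expresses these as ShEx shapes.
INFORMATION_BIOMACROMOLECULE = "InformationBiomacromolecule"
CHEMICAL_ENTITY = "ChemicalEntity"


@dataclass
class Entity:
    iri: str
    kind: str  # one of the hypothetical labels above


@dataclass
class Complex:
    iri: str
    has_part: List[Entity] = field(default_factory=list)


def invalid_parts(complex_: Complex) -> List[Entity]:
    """Return the parts that are not information biomacromolecules."""
    return [p for p in complex_.has_part
            if p.kind != INFORMATION_BIOMACROMOLECULE]


def conforms(complex_: Complex) -> bool:
    """A complex conforms only if every has_part is an information biomacromolecule."""
    return not invalid_parts(complex_)


# S-adenosyl-L-methionine, the small molecule discussed in this thread.
sam = Entity("http://purl.obolibrary.org/obo/CHEBI_15414", CHEMICAL_ENTITY)
# Placeholder IRIs, invented for illustration only.
protein = Entity("http://example.org/protein-1", INFORMATION_BIOMACROMOLECULE)

mixed = Complex("http://example.org/complex-1", [protein, sam])
proteins_only = Complex("http://example.org/complex-2", [protein])
```

Under this sketch, `conforms(mixed)` is false and `conforms(proteins_only)` is true, which mirrors moving the model with the small-molecule part into the "should fail" list.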

ukemi commented 5 years ago

So this means that when we import complexes from other resources, from a GO perspective we will only care about the gene products in the complex. This is another reason, I think, for separating entities from the models. Some resources like Reactome will have complexes that contain things other than gene products, but in the GO-CAM world we only care about what the gene products in those complexes are doing. We don't want non-gene products contributing to molecular functions.

goodb commented 5 years ago

@vanaukenk I'm going to close this PR. It's taken care of by https://github.com/geneontology/go-shapes/pull/159 .

Where are you documenting the reasoning underlying decisions like this? Maybe in the Google doc, I guess? I wonder if this shouldn't be placed directly in the schema as comments associated with the constraints as they are added. For example, add something like this:

has_part: @ * // rdfs:comment "In the GO-CAM modeling paradigm, we intentionally only record protein parts of complexes. We do not allow other entities, e.g. small molecules, because...";

Putting them there in context keeps everything in one place and will help us compute human readable explanations for validation failures. Note that we could use other properties aside from rdfs:comment if we want to be more specific about the notes.
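As a rough illustration of how comments stored next to a constraint could feed human-readable failure messages, here is a small Python sketch. It does not use any ShEx library; the `Constraint` class, `explain_failures` function, type labels, and the `example.org` IRI are hypothetical, and the `comment` field only plays the role that the rdfs:comment annotation would play in the schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Constraint:
    predicate: str
    allowed_kinds: FrozenSet[str]
    comment: str  # stands in for the rdfs:comment annotation on the constraint


def explain_failures(constraint: Constraint,
                     values: List[Tuple[str, str]]) -> List[str]:
    """Build one message per (iri, kind) value that violates the constraint."""
    return [
        f"{iri} ({kind}) is not allowed as a value of {constraint.predicate}. "
        f"{constraint.comment}"
        for iri, kind in values
        if kind not in constraint.allowed_kinds
    ]


# Hypothetical constraint; the comment text is abbreviated from the example above.
HAS_PART = Constraint(
    predicate="has_part",
    allowed_kinds=frozenset({"InformationBiomacromolecule"}),
    comment="In the GO-CAM modeling paradigm, we intentionally only record "
            "protein parts of complexes.",
)

messages = explain_failures(HAS_PART, [
    ("http://example.org/protein-1", "InformationBiomacromolecule"),  # placeholder IRI
    ("http://purl.obolibrary.org/obo/CHEBI_15414", "ChemicalEntity"),
])
```

Because the explanation travels with the constraint, the message for the failing part can quote the curators' reasoning directly instead of pointing to a separate document.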

vanaukenk commented 5 years ago

@goodb - I think originally we were proposing that the Google doc would house the human readable documentation, but I like the idea of also having comments associated with constraints in the ShEx. Let's confirm on the call today.