Open ukemi opened 5 years ago
Note: the order of the part_ofs is based on the ontologies used: GO part of cell part of anatomy. Example: occurs_in(EMAPA:35651),occurs_in(CL:0002064),occurs_in(GO:1990794)
On the 2019-08-29 call we talked about treating commas like pipes. For this example:
occurs_in(EMAPA:17597),occurs_in(CL:0000589),occurs_in(CL:0000601)
We would create two nested assertions:
primary_term-occurs_in->CL:0000589->part_of->EMAPA:17597
primary_term-occurs_in->CL:0000601->part_of->EMAPA:17597
So the rule would be to split on same ontology (like CL here). @ukemi does this sound right? Do I have the right set of relations in the translation?
@dustine32, this is correct. It should clean up a lot of cases where we used the incorrect delimiter in the annotation extensions.
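For reference, the rule agreed on above could be sketched roughly like this. This is only an illustration, not the actual import code: the function name `split_same_ontology` and the `NESTING_ORDER` list are made up here, the nesting order (GO inside CL inside anatomy) is taken from the note at the top of this issue, and every relation in the extension is assumed to be occurs_in.

```python
# Assumed nesting order, innermost ontology first (GO part of cell part of anatomy).
NESTING_ORDER = ["GO", "CL", "EMAPA"]


def split_same_ontology(extension):
    """Turn one comma-delimited extension into nested assertion strings.

    Hypothetical helper: relations are ignored and assumed to be occurs_in.
    """
    groups = {}
    for part in extension.split(","):
        _relation, _, rest = part.strip().partition("(")
        term = rest.rstrip(")")
        groups.setdefault(term.split(":")[0], []).append(term)

    unknown = set(groups) - set(NESTING_ORDER)
    if unknown:
        raise ValueError(f"no nesting order defined for: {sorted(unknown)}")

    repeated = [p for p, terms in groups.items() if len(terms) > 1]
    if len(repeated) > 1:
        raise ValueError("more than one ontology is repeated; rule not defined here")

    ordered = [p for p in NESTING_ORDER if p in groups]
    # One variant per term of the repeated ontology, or a single variant if none repeat.
    variants = groups[repeated[0]] if repeated else [None]
    assertions = []
    for variant in variants:
        chain = [variant if p in repeated else groups[p][0] for p in ordered]
        assertions.append("primary_term-occurs_in->" + "->part_of->".join(chain))
    return assertions


for line in split_same_ontology(
    "occurs_in(EMAPA:17597),occurs_in(CL:0000589),occurs_in(CL:0000601)"
):
    print(line)
```

With the example extension this prints the two nested assertions listed above, one per CL term. Extensions that repeat more than one ontology are rejected rather than guessed at.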
Using the first annotation example above, I created this beautiful model:
The actual original GPAD line:
MGI MGI:1336882 acts_upstream_of_or_within GO:0070625 MGI:MGI:3801544|PMID:18535671 ECO:0000315 MGI:MGI:3056083 20160406 MGI occurs_in(EMAPA:35651),occurs_in(CL:0002064),occurs_in(GO:1990794)|occurs_in(EMAPA:35651),occurs_in(CL:0002064),occurs_in(GO:0045178)
@ukemi Does this look right? I still need to tackle the "same-ontology-comma-split" issue from my comment above.
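As a side note on how the extension column of that GPAD line breaks down, here is a minimal parsing sketch. It assumes the usual reading of the column (pipes separate independent extensions, commas join the units inside one extension); `parse_extension_field` and `EXTENSION_RE` are illustrative names, not part of the importer.

```python
import re

# One extension unit looks like relation(TERM_ID), e.g. occurs_in(CL:0002064).
EXTENSION_RE = re.compile(r"^(\w+)\(([^()]+)\)$")


def parse_extension_field(field):
    """Parse 'rel(ID),rel(ID)|rel(ID)' into a list of lists of (relation, term)."""
    parsed = []
    for alternative in field.split("|"):  # pipes: independent extensions
        conjunction = []
        for unit in alternative.split(","):  # commas: units within one extension
            match = EXTENSION_RE.match(unit.strip())
            if match is None:
                raise ValueError(f"unparseable extension unit: {unit!r}")
            conjunction.append((match.group(1), match.group(2)))
        parsed.append(conjunction)
    return parsed


field = (
    "occurs_in(EMAPA:35651),occurs_in(CL:0002064),occurs_in(GO:1990794)"
    "|occurs_in(EMAPA:35651),occurs_in(CL:0002064),occurs_in(GO:0045178)"
)
for extension in parse_extension_field(field):
    print(extension)
```

For the line above this yields two extensions of three units each, which differ only in the GO term.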
And on the "same-ontology-comma-split" issue, I have the code now doing this:
From this annotation:
MGI MGI:1915585 acts_upstream_of_or_within GO:0090102 MGI:MGI:5615303|PMID:25605782 ECO:0000315 MGI:MGI:3817268 20151229 MGI occurs_in(EMAPA:17597),occurs_in(CL:0000589),occurs_in(CL:0000601)
@ukemi Does this model look right as well?
It'll be interesting if more than one ontology is repeated within the same extension, e.g. occurs_in(EMAPA:17597),occurs_in(EMAPA:35247),occurs_in(CL:0000589),occurs_in(CL:0000601), though I have yet to find examples of this.
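If that case ever does show up, one possible (and so far undecided) reading is to emit every combination of the repeated terms. The sketch below does that with `itertools.product`; the function name `expand_combinations` and the `NESTING_ORDER` list are assumptions made for illustration only.

```python
from itertools import product

# Assumed nesting order, innermost ontology first.
NESTING_ORDER = ["GO", "CL", "EMAPA"]


def expand_combinations(term_ids):
    """Return one chain (innermost term first) per combination of same-ontology terms."""
    groups = {prefix: [] for prefix in NESTING_ORDER}
    for term in term_ids:
        # Raises KeyError for a prefix that has no assumed nesting position.
        groups[term.split(":")[0]].append(term)
    present = [groups[p] for p in NESTING_ORDER if groups[p]]
    return [list(combo) for combo in product(*present)]


chains = expand_combinations(
    ["EMAPA:17597", "EMAPA:35247", "CL:0000589", "CL:0000601"]
)
for chain in chains:
    print("->part_of->".join(chain))
```

Note that this pairs every cell type with every anatomy term, which may assert combinations the curator never intended, so it would need sign-off before being used.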
I found some example annotations of what looks like intended location nesting but using part_of instead of occurs_in:
MGI MGI:1336882 part_of GO:0031201 MGI:MGI:3801544|PMID:18535671 ECO:0000314 20160406 MGI part_of(EMAPA:35651),part_of(CL:0002064),part_of(GO:0042589)
The resulting translated assertion is currently looking like a starfish (or a 3-tentacled octopus):
@ukemi @vanaukenk Should these annotations be fixed in the upstream GPAD, or should I translate these part_ofs just like the occurs_ins?
For the component example above, we want to say that the GP -> part of CC1 -> part of CC2 -> part of CL ->part of EMAPA
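A rough sketch of that chain as triples, using the component annotation above. This is illustrative only: `component_chain` is a hypothetical helper, the ordering table is an assumption based on the GO, then CL, then anatomy nesting described in this issue, and the first relation is left as a parameter so it can be swapped if a different relation is preferred for the GP-to-CC link.

```python
def component_chain(gene_product, primary_cc, extension_terms, first_relation="part_of"):
    """Build (subject, relation, object) triples for a nested CC annotation."""
    # Assumed nesting positions: GO component, then cell type, then anatomy.
    order = {"GO": 0, "CL": 1, "EMAPA": 2}
    nested = sorted(extension_terms, key=lambda t: order[t.split(":")[0]])
    nodes = [gene_product, primary_cc] + nested
    relations = [first_relation] + ["part_of"] * (len(nodes) - 2)
    return list(zip(nodes, relations, nodes[1:]))


triples = component_chain(
    "MGI:MGI:1336882",
    "GO:0031201",
    ["EMAPA:35651", "CL:0002064", "GO:0042589"],
)
for subject, relation, obj in triples:
    print(f"{subject} -{relation}-> {obj}")
```

This gives a single linear chain (GP, CC1, CC2, CL, EMAPA) rather than the starfish shape, where each extension term hangs directly off the primary term.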
Thanks @vanaukenk! This makes a ton of sense to me now given that its primary term is a CC. After looking at this with @tmushayahama, should that first part_of (GP -> part of CC1) be a located_in?
@vanaukenk In fact, even for the simple, no-extensions GP-part_of->CC GPAD lines, should we be translating that part_of qualifier to located_in?
Good catch @dustine32 !
Looking at the example above, I think we actually don't have a complete representation for this in the ShEx.
Currently, we have:
@vanaukenk Yeah that ShEx makes sense to me. I'll look at some of the protein complex annotations to see how I'm currently translating these and then update the code accordingly. Thanks!
@dustine32
I was just looking at the protein-containing complex part of the ShEx again which says that the relation between a protein-containing complex and a GO CC is 'located in'.
@vanaukenk Sorry, I'm slowly catching up to you as I just now realized SNARE complex is a descendant of GO:0032991. You're right, so I'll get to have more fun plugging this logic in.
No worries @dustine32 The ShEx is keeping us on our toes!
@vanaukenk @ukemi From the annotation that I mentioned in https://github.com/geneontology/go-shapes/issues/23:
MGI MGI:1336882 part_of GO:0042588 MGI:MGI:3801544|PMID:18535671 ECO:0000314 20160406 MGI part_of(EMAPA:35651),part_of(CL:0002064),part_of(GO:1990794)|part_of(EMAPA:35651),part_of(CL:0002064),part_of(GO:0045178)
This involves the CC-part_of->CC relation, which I don't currently see in the ShEx spec:
<CellularComponent> @<GoCamEntity> AND EXTRA a {
  a ( @<CellularComponentClass> OR @<NegatedCellularComponentClass> ) {1};
  part_of: @<AnatomicalEntity> {0,1};
  adjacent_to: @<AnatomicalEntity> *;
  overlaps: @<AnatomicalEntity> *;
} // rdfs:comment "a cellular component"
And here's what I'm now doing for protein complex annotations:
MGI MGI:1336882 part_of GO:0031201 MGI:MGI:3801544|PMID:18535671 ECO:0000314 20160406 MGI part_of(EMAPA:35651),part_of(CL:0002064),part_of(GO:0042589)
@vanaukenk @ukemi I would be fine if you guys wanted to break the protein complex nesting into a new ticket.
Hmm. My only concern with this is that in the ontology we use part of for complexes and other components. 1745 protein-containing complexes are part_of some cellular_component.
Maybe off the central topic, but it seems odd to recapitulate ontology relations here in the model. e.g. "pancreatic acinar cell" is part of "pancreatic acinus" in the cell ontology.
It is odd. In my dream world, these would be shown either as a toggle or as persistent objects.
But that said, I did this all the time (when I actually had time to make models) just so I see it in the model.
In more complicated models, we will want to be able to attach functions to different cells that all might be part of the same anatomical structure.
It is odd. In my dream world, these would be shown either as a toggle or as persistent objects.
This looks like the start of an argument for better incorporation of ontology visualization into Noctua. I would imagine that many curators end up having one window with the ontologies they are working with loaded into Protege, OLS, etc. while they work on their models. Having the ability to reveal the class structures they are using in the context of the instance models would be very powerful, especially in showing the impacts of inferences.
Hmm. My only concern with this is that in the ontology we use part of for complexes and other components. 1745 protein-containing complexes are part_of some cellular_component.
@ukemi Would a solution be to translate like SNARE complex -has_part-> MGI:MGI:1336882? I see that the ShEx allows both this and ProteinContainingComplex -has_part-> InformationBiomacromolecule.
Since I think the Noctua form (@tmushayahama) recognizes "complex-to-GP" links by the has_part relation, should we follow only one convention with complexes for consistency, even though the two statements are equivalent(?)?
Just noticing this now, but wanted to double-check. When we incorporate the anatomical structures in the imports, they are still EMAPA terms if that is what they were in the original annotation, right?
@ukemi Yep, the EMAPA terms stay the same in the import. I don't xref to UBERON or anything. I only attempt to follow xrefs for the GOREL extension relations to RO/BFO.
Here is the sub-model for MGI:MGI:1336882 for your perusal.
When annotation extensions are comma-delimited and come from GO_CC, CL, and an anatomy ontology, the model should indicate GO_CC<>part_of<>CL<>part_of<>anatomy.
MGI | MGI:1336882 | acts_upstream_of_or_within | GO:0070625 | MGI:MGI:3801544|PMID:18535671 | ECO:0000315 | MGI:MGI:3056083 | 20160406 | MGI | occurs_in(EMAPA:35651),occurs_in(CL:0002064),occurs_in(GO:1990794)
See the bottom of model: http://noctua.geneontology.org/editor/graph/gomodel:5c4605cc00000315