geneontology / noctua

Graph-based modeling environment for biology, including prototype editor and services
http://noctua.geneontology.org/
BSD 3-Clause "New" or "Revised" License

Add SO to Noctua #561

Open pfey03 opened 6 years ago

pfey03 commented 6 years ago

Some of the P2GO annotations I wanted to model have SO extensions, such as SO:0100014 n-terminal region (of the has_input H2Bv3 Ddis)

I noticed I cannot add these in Noctua. Also for model gomodel:5ae3b0f600000395

Thanks, Petra

kltm commented 6 years ago

@pfey03 Can you confirm which field and interface (Graph or Form) that you were using?

kltm commented 6 years ago

@cmungall This should be just as simple as adding to the NEO imports, right?

pfey03 commented 6 years ago

I tried it in the graph editor, but I don't think the form would have SO either. The form didn't even accept the protein entity (H2Bv3) that I want to add the SO terms to.

cmungall commented 6 years ago

This should be just as simple as adding to the NEO imports, right?

Yes, easy to add SO, but we need to first be sure that we have the usage of the ontology documented. AFAIK we don't have any curator documentation for use of SO here, @vanaukenk

The use of the SO biological_region hierarchy to describe sites on proteins etc. seems reasonable, but I want to check that we are all doing things the same way.

vanaukenk commented 6 years ago

@pfey03 - For this example, can you give us the full annotation you'd like to make, e.g. what the MF term is and the relations and extensions?
That will help us understand better what you're trying to state with this annotation. Thx!

pfey03 commented 6 years ago

@vanaukenk Oh wow, yes this issue! Here is the annotation line from GPAD: UniProtKB Q54XI2 enables GO:1990404 PMID:28252050 ECO:0000314 20180529 dictyBase occurs_at(SO:0100014),occurs_at(SO:0001454),has_input(UniProtKB:Q54LP8) goEvidence=IDA

Thanks!
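For readers unfamiliar with the extension syntax in the GPAD line above, the following is a minimal sketch of how that column could be split into relation/target pairs. It is not part of Noctua or P2GO; the function name `parse_extensions` is hypothetical, and the sketch assumes a single comma-separated set of `relation(target)` expressions, as in the line quoted above.

```python
import re

# Matches one relation(target) expression, e.g. occurs_at(SO:0100014).
EXTENSION_RE = re.compile(r"\s*([A-Za-z_]+)\(([^()]+)\)\s*")


def parse_extensions(field):
    """Split a comma-separated extension field into (relation, target) pairs.

    Hypothetical helper for illustration only.
    """
    pairs = []
    for part in field.split(","):
        match = EXTENSION_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognised extension expression: {part!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


# Extension column copied from the GPAD line in the comment above.
extensions = parse_extensions(
    "occurs_at(SO:0100014),occurs_at(SO:0001454),has_input(UniProtKB:Q54LP8)"
)

# The SO targets are the ones that cannot currently be entered in Noctua.
so_targets = [target for _, target in extensions if target.startswith("SO:")]
```

Here `so_targets` collects the two SO terms, which are the part of the annotation this issue is about.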

pgaudet commented 6 years ago

GREEKC also wants to use SO as 'has input' for transcription factors.

cmungall commented 6 years ago

There are some potential issues here. We have been treating SO classes as molecular entities. @mikebada and @msinclair2 have been developing https://github.com/The-Sequence-Ontology/MSO, the idea being that when this is released, SO will be "re-declared" as consisting of information entities. This will render all such annotations invalid, since inputs/outputs must be physical. This is not an idle philosophical issue: the domain/range constraints in RO will cause these to show up as invalid.

See also: https://github.com/The-Sequence-Ontology/MSO/issues/5
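As a rough illustration of the concern above, here is a toy sketch of a range check. It is not how RO or any reasoner actually implements domain/range constraints; the category names, the `so_is_information` switch, and both function names are assumptions made up for this example, and the `has_input(SO:...)` pair is a hypothetical annotation.

```python
PHYSICAL = "physical"
INFORMATION = "information"

# Relations whose targets must be physical, per the comment above.
REQUIRES_PHYSICAL_TARGET = {"has_input", "has_output"}


def category_of(curie, so_is_information):
    """Return a toy category for a CURIE based only on its prefix."""
    prefix = curie.split(":", 1)[0]
    if prefix == "SO":
        return INFORMATION if so_is_information else PHYSICAL
    # Assumption: every other prefix used here (e.g. UniProtKB) is physical.
    return PHYSICAL


def invalid_extensions(pairs, so_is_information):
    """List the (relation, target) pairs that would violate the toy range check."""
    return [
        (relation, target)
        for relation, target in pairs
        if relation in REQUIRES_PHYSICAL_TARGET
        and category_of(target, so_is_information) != PHYSICAL
    ]


# Hypothetical annotation that uses an SO term as an input.
pairs = [("has_input", "SO:0100014"), ("has_input", "UniProtKB:Q54LP8")]

before_redeclaration = invalid_extensions(pairs, so_is_information=False)
after_redeclaration = invalid_extensions(pairs, so_is_information=True)
```

With SO treated as physical nothing is flagged; once SO terms are treated as information entities, the SO-valued input is the one that shows up as invalid.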

mikebada commented 6 years ago

There's been a lot of conflation for a while: the SO sequence entities in the last major SO journal article were described as GDCs (generically dependent continuants), though it was also said that the qualities don't make sense for GDCs. Additionally, the current SO has some classes that really don't make sense as molecular entities (e.g., read, contig), but many/most of the natural-language definitions of the classes of the current SO are for independent continuants (ICs). So, GDC/IC deconflation has been a primary motivation for this refactoring work.

In the refactored versions, the sequence entities of the MSO and SO are almost entirely parallel, with the exception that the relatively small number of classes that don't really make sense as molecular entities will only exist in the SO. Additionally, the SO classes are directly defined in terms of (generically dependent on) their corresponding MSO classes, e.g., SO:gene generically_dependent_on some MSO:gene. However, corresponding SO and MSO classes have the same ID, just different namespaces, e.g., for gene, SO:0000704 and MSO:0000704. So, for those resources that need to refer to molecular entities, hopefully much of the migration can be taken care of by simply replacing the SO namespace with MSO.
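The namespace swap described above could look roughly like the sketch below. This is an illustration under stated assumptions, not an official migration tool: the function name `so_to_mso` and the `so_only_ids` parameter are hypothetical, and the caller would have to supply the real list of classes that exist only in SO.

```python
def so_to_mso(curie, so_only_ids=frozenset()):
    """Rewrite an SO CURIE to the MSO namespace, keeping the numeric ID.

    so_only_ids: local IDs of classes assumed to exist only in SO
    (hypothetical parameter; the real list would come from the ontologies).
    """
    prefix, sep, local_id = curie.partition(":")
    if prefix != "SO" or not sep:
        return curie  # leave non-SO identifiers untouched
    if local_id in so_only_ids:
        raise ValueError(f"{curie} has no MSO counterpart")
    return f"MSO:{local_id}"


# The gene example from the comment above.
gene_mso = so_to_mso("SO:0000704")

# Identifiers from other namespaces pass through unchanged.
protein = so_to_mso("UniProtKB:Q54LP8")
```

Raising an error for SO-only classes, rather than silently passing them through, keeps terms that have no molecular-entity counterpart from slipping into an annotation unnoticed.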

msinclair2 commented 6 years ago

@pgaudet I made a presentation directly to GREEKC in Hinxton this April describing the MSO and SO and that MSO should be used for molecular interactions. So I believe they should be already aware of the issue and the solution with MSO.

vanaukenk commented 6 years ago

@pfey - thanks for the example.

In this case, it looks like you are trying to capture the general region of the protein that is modified (e.g. ADP-ribosylated) by capturing that the modification occurs_at an amino acid in the N-terminal region of the substrate (H2Bv3). Is that correct?

occurs_at is not currently in RO and so is not available to use in Noctua. In models where similar annotations have been created, curators have used PRO ids to capture inputs and outputs of enzymatic activities where they wanted to specifically indicate what the unmodified and modified inputs and outputs were. See for example: http://noctua.geneontology.org/editor/graph/gomodel:581e072c00000473

@thomaspd @cmungall @pgaudet I don't think we want to use occurs_at in this way in GO-CAM models. We should review other cases of occurs_at to see how that information would be modeled in Noctua. Paul, do you already have any examples for this?

vanaukenk commented 2 years ago

Leaving this ticket open, as this is still a need for some curators.

pfey03 commented 2 years ago

Also, I could easily use has_input. These annotations are quite old, from when occurs_at was still possible in P2GO. I would not use occurs_at anymore for these.