geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

NTR: cellular anatomical entity #17696

Closed. balhoff closed this issue 5 years ago.

balhoff commented 5 years ago

For the GO-CAM schema, we would like to group cellular components under CARO anatomical entity, so that we can state that a molecular function instance occurs in 0 or 1 anatomical entity (being a cellular component, cell, or gross anatomical structure). I opened a PR #17695 to this end, but it introduces a problem in that it makes protein-containing complex a subclass of anatomical entity. CARO anatomical entity states it has "granularity above the level of a protein complex". Complexes are a bit of a problem as anatomical entities anyway, because (I think) we don't want to have GO-CAMs stating that functions occur in complexes.

There is currently no term that groups other cellular components to the exclusion of protein-containing complex. Could we add a term like 'cellular anatomical entity' as a child of 'cellular_component', with all cellular components as children with the exception of 'protein-containing complex'? Then in go-bridge.owl we would make 'cellular anatomical entity' a child of CARO 'anatomical entity'.
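A minimal sketch of the proposed grouping follows. It assumes the ontology is modelled as a plain Python dict from term label to its set of direct is_a parents. The child terms in the example are illustrative, and the label `CARO:anatomical entity` is only a placeholder for the CARO class that go-bridge.owl would point to. Real edits would be made in the GO source and bridge files, not with code like this.

```python
from typing import Dict, Set

# Hypothetical toy model: term label -> set of direct is_a parent labels.
IsA = Dict[str, Set[str]]

ROOT = "cellular_component"
CAE = "cellular anatomical entity"
COMPLEX = "protein-containing complex"
CARO_AE = "CARO:anatomical entity"  # placeholder for the CARO class


def ancestors(is_a: IsA, term: str) -> Set[str]:
    """Return all transitive is_a ancestors of `term`."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def add_grouping(is_a: IsA) -> IsA:
    """Add CAE under ROOT and move every other direct child of ROOT,
    except protein-containing complex, underneath it."""
    result = {term: set(parents) for term, parents in is_a.items()}
    result[CAE] = {ROOT}
    for term, parents in result.items():
        if term not in (CAE, COMPLEX) and ROOT in parents:
            parents.discard(ROOT)
            parents.add(CAE)
    return result


def add_bridge(is_a: IsA) -> IsA:
    """Mimic the go-bridge.owl axiom: CAE is_a CARO anatomical entity."""
    result = {term: set(parents) for term, parents in is_a.items()}
    result.setdefault(CAE, set()).add(CARO_AE)
    return result


if __name__ == "__main__":
    go = {
        ROOT: set(),
        "organelle": {ROOT},
        "membrane": {ROOT},
        COMPLEX: {ROOT},
    }
    bridged = add_bridge(add_grouping(go))
    for term in ("organelle", COMPLEX):
        print(term, "->", CARO_AE in ancestors(bridged, term))
```

With these toy inputs, organelle reaches the CARO class while protein-containing complex does not, which is the outcome the proposal is after.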

pgaudet commented 5 years ago

Interesting. I had understood "granularity above the level of a protein complex" to exclude protein complex (as in >, not >=).

Pascale

cmungall commented 5 years ago

This exactly parallels a conversation we are having in OBO Core; I will link to it here when we have a ticket.

UPDATE: https://github.com/OBOFoundry/Experimental-OBO-Core/issues/41

ukemi commented 5 years ago

It seems this will be needed if we want to successfully coordinate with CARO and intend to use their definition of anatomical entity.

balhoff commented 5 years ago

@ukemi or @vanaukenk would you mind creating a pull request for this? I am not set up for minting term IDs.

vanaukenk commented 5 years ago

Will do.

ukemi commented 5 years ago

https://github.com/geneontology/go-ontology/pull/17704

ukemi commented 5 years ago

Looks like we created the term and merged, but we still haven't moved the 'xyz part' terms to be direct children of cellular component. We need to do this and then assert that everything under those terms that is not a protein-containing complex is a cellular anatomical entity. Until then, some protein-containing complexes are asserted or inferred to be cellular anatomical entities. For example, look at 6-phosphofructokinase complex.
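The problem can be reproduced on the same kind of toy is_a map. This is a sketch that uses labels only, and placing 6-phosphofructokinase complex under a 'cytoplasmic part' term is purely illustrative. `conflicts` lists terms that end up under both 'cellular anatomical entity' and 'protein-containing complex'. The hypothetical `fix_part_terms` applies the plan in this comment: reparent every 'xyz part' term (recognized here simply by a ' part' suffix, which is an assumption) directly under cellular_component, then assert 'cellular anatomical entity' only on children that are not complexes.

```python
from typing import Dict, Set

IsA = Dict[str, Set[str]]  # term label -> direct is_a parents (toy model)

ROOT = "cellular_component"
CAE = "cellular anatomical entity"
COMPLEX = "protein-containing complex"


def ancestors(is_a: IsA, term: str) -> Set[str]:
    """Return all transitive is_a ancestors of `term`."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def conflicts(is_a: IsA) -> Set[str]:
    """Terms inferred to be both a CAE and a protein-containing complex."""
    return {t for t in is_a if {CAE, COMPLEX} <= ancestors(is_a, t)}


def fix_part_terms(is_a: IsA) -> IsA:
    """Hypothetical cleanup: move 'xyz part' terms directly under ROOT,
    then add CAE as a parent of their children that are not complexes."""
    result = {term: set(parents) for term, parents in is_a.items()}
    part_terms = {t for t in result if t.endswith(" part")}
    for term in part_terms:
        result[term] = {ROOT}
    for term, parents in result.items():
        if term in part_terms or not parents & part_terms:
            continue
        if COMPLEX not in ancestors(result, term):
            parents.add(CAE)
    return result


if __name__ == "__main__":
    go = {
        ROOT: set(),
        CAE: {ROOT},
        COMPLEX: {ROOT},
        "cytoplasmic part": {CAE},
        "cytosol": {"cytoplasmic part"},
        "6-phosphofructokinase complex": {"cytoplasmic part", COMPLEX},
    }
    print("before:", conflicts(go))
    print("after:", conflicts(fix_part_terms(go)))
```

Before the cleanup the toy complex is flagged as a conflict; afterwards it sits only under protein-containing complex, while cytosol keeps its 'cellular anatomical entity' parent.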

ukemi commented 5 years ago

Should we unmerge #17704 or just work from it as a starting point?

pgaudet commented 5 years ago

Ah! Perhaps we should undo the changes, then.

@balhoff Would you be doing the script to move the part terms under the new term? Any idea how long that would take to do?

Thanks, Pascale

ukemi commented 5 years ago

Or would it be easier to leave this in place as a first step? This didn't break anything, but now there are a lot of protein complexes that are misclassified as cellular anatomical entities.

balhoff commented 5 years ago

Would you be doing the script to move the part terms under the new term? Any idea how long that would take to do?

I can do this. I don't think it would take very long. I'm wondering about the class hierarchy of these terms, though: why is it asserted? I see that cell projection part is asserted as a subclass of cell part, yet cell projection has no 'part_of some cell' axiom, which would have allowed that subclass relation to be inferred instead. Is this a holdover from an earlier modeling style?
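The inference balhoff describes can be written out in description-logic notation. This is a sketch that assumes each 'xyz part' term carries an equivalence axiom of the form cellular_component and part_of some xyz (CC abbreviates cellular_component), that part_of is transitive, and that cell projection had the missing axiom CellProjection ⊑ ∃part_of.Cell:

```latex
\begin{aligned}
\text{CellProjectionPart} &\equiv \text{CC} \sqcap \exists\,\text{part\_of}.\text{CellProjection} \\
&\sqsubseteq \text{CC} \sqcap \exists\,\text{part\_of}.(\exists\,\text{part\_of}.\text{Cell})
  && \text{(CellProjection} \sqsubseteq \exists\,\text{part\_of}.\text{Cell)} \\
&\sqsubseteq \text{CC} \sqcap \exists\,\text{part\_of}.\text{Cell}
  && \text{(part\_of is transitive)} \\
&\equiv \text{CellPart}
\end{aligned}
```

Under those assumptions a reasoner would derive cell projection part is_a cell part, so the asserted link would be redundant.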

cmungall commented 5 years ago

Assume it's a holdover.

I could potentially do this using hacky tools like obo-sed.pl but I think you could do it in a more principled way Jim.

ukemi commented 5 years ago

Good to see these comments. I don't think un-merging is a good idea, since we coined a new identifier. I think this is a holdover from the old style, where we created a new is_a path by simply making an 'xyz part' term for the part_of parent of a term that was not subsumed by it. Let's discuss on Monday's call.

ukemi commented 5 years ago

Plan is that @balhoff will write a script to clean this all up and obsolete the 'xyz part' terms at the same time rather than do it incrementally. @pgaudet can you open an annotation ticket for re-annotation of direct annotations to part terms and send the obsoletion notice? If the annotations are not changed, they will be replaced with the xyz term, which will be the object of the replaced_by tag. @balhoff, can you also fill in any details to track this work in this ticket?

ukemi commented 5 years ago

As I'm looking over some annotations, I'm wondering what we are going to do with terms (and their annotations) like 'neuron part'. In GO-CAM these could be represented as 'cellular anatomical structure' part_of some neuron? Up for discussion.

pgaudet commented 5 years ago

Can you not annotate to 'occurs_in' neuron as we proposed?

ukemi commented 5 years ago

It's a component annotation. Components don't occur.

pgaudet commented 5 years ago

located_in?

pgaudet commented 5 years ago

In GO-CAM these could be represented as 'cellular anatomical structure' part_of some neuron?

I thought we could annotate location to a cell without a CC annotation.

ukemi commented 5 years ago

But it would point directly to a cell type. This wouldn't be a CC annotation.

pgaudet commented 5 years ago

As I understand the ShEx this is allowed.

ukemi commented 5 years ago

But before that, what do we do with these terms in the ontology?

pgaudet commented 5 years ago

Obsolete?

balhoff commented 5 years ago

@ukemi are you saying that for 'neuron part', I should not output a replaced_by pointing to 'neuron'? I.e. only do this if the value will be a GO term.

ukemi commented 5 years ago

Yeah, don't you think it would be kind of weird to point to a term from an import?

ukemi commented 5 years ago

Actually, thinking about the strategy you are employing, it would make sense for you to point to neuron for the auto-classification of the children.

ukemi commented 5 years ago

They would all be subsumed by either cellular anatomical structure or protein-containing complex, right?

balhoff commented 5 years ago

Yeah, don't you think it would be kind of weird to point to a term from an import?

If we're only thinking about GO annotations, then yes. But for general uses of the ontology, I think it makes sense. However, along those lines, I'm not sure replaced_by is correct here. It works in this case for migration of GO CC annotations, but these terms aren't really semantic replacements. In lots of other situations you wouldn't want to replace cell part with cell.

Should I output these as consider instead, and include the external terms? In any case, all part_of relations will be output, pointing to whatever term.
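The choice being weighed here can be expressed as a small, hypothetical sketch: emit an OBO-style obsoletion stanza for each 'xyz part' term, using replaced_by only when the target is a GO term and the caller has not opted for consider, and consider otherwise (for example when the target is an imported class such as neuron). `PartTerm`, `obsoletion_stanza` and the placeholder IDs are invented for illustration; `is_obsolete`, `replaced_by` and `consider` are standard OBO tags.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartTerm:
    term_id: str    # ID of the 'xyz part' term being obsoleted (placeholder)
    label: str      # e.g. "neuron part"
    target_id: str  # ID of the 'xyz' term it pointed to via part_of


def obsoletion_stanza(term: PartTerm, prefer_consider: bool = False) -> List[str]:
    """Build OBO lines obsoleting `term` and pointing users to its target."""
    lines = [
        "[Term]",
        f"id: {term.term_id}",
        f"name: obsolete {term.label}",
        "is_obsolete: true",
    ]
    # replaced_by is only used for an in-ontology (GO) target; imported
    # targets, or an explicit preference, fall back to consider.
    if term.target_id.startswith("GO:") and not prefer_consider:
        lines.append(f"replaced_by: {term.target_id}")
    else:
        lines.append(f"consider: {term.target_id}")
    return lines


if __name__ == "__main__":
    neuron_part = PartTerm("GO:XXXXXXX", "neuron part", "CL:XXXXXXX")
    print("\n".join(obsoletion_stanza(neuron_part)))
```

Passing `prefer_consider=True` reflects the point that these are not true semantic replacements, so even a GO target would only be offered as a suggestion.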

bmeldal commented 5 years ago

What did you do with GO:0099080 supramolecular complex? Is it a complex, is it a structure???

Otherwise, no objections :)

ukemi commented 5 years ago

@balhoff, is this one done?

balhoff commented 5 years ago

@ukemi I think so, although there are still 40 terms that are inferred subclasses of both 'cellular anatomical entity' and 'protein-containing complex'. But we have #17777 for that.

ukemi commented 5 years ago

Yup. Hopefully I will take care of those very soon once I clear some other tasks off my plate.