geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Refactor cellular processes in GO #12849

Open ukemi opened 7 years ago

ukemi commented 7 years ago

Currently in GO the definition of a cellular process is quite fuzzy:

Any process that is carried out at the cellular level, but not necessarily restricted to a single cell. For example, cell communication occurs among more than one cell, but occurs at the cellular level.

This definition makes it very difficult to formalize the meaning of this term. Historically, we have not necessarily created cellular versions of every process in GO, and we have not always created generic processes for their cellular counterparts. This has become problematic over time, and was most recently discussed in ticket #12721.

There have been at least two proposals about how to deal with cellular processes which should be discussed.

  1. The distinction is not very relevant and we should merge the cellular terms into their generic parents. For those that do not have generic parents, change the terms to be more generic. This would mean that users will no longer be able to distinguish cellular from other levels of granularity. We have heard from at least a few users that they would like to be able to distinguish at this level.
  2. Make the distinction explicit. Define 'cellular process' as a process that occurs in a cell. This means that things like cell-cell communication, synaptic transmission, etc. would no longer be cellular processes. This split would be similar to the split between single-organism processes and multi-organism processes; however, it would not give us the advantage of adding a disjointness relation, because some organisms are single cells.

Tagging for further comment. @cmungall @hdrabkin @ukemi @dosumis @paolaroncaglia @mcourtot @tberardini @thomaspd @balhoff

hdrabkin commented 7 years ago

"cell-cell communication, synaptic transmission" But I would imagine that some parts of these processes are in fact cellular?

dosumis commented 7 years ago

But I would imagine that some parts of these processes are in fact cellular?

Some parts occur in cells. But that's true of most BPs. I doubt that neurobiologists would care much or even notice if we don't classify synaptic transmission as 'cellular'.

dosumis commented 7 years ago

One possible issue for occurs_in: import into cells starts on the outside. Some of our axiomatization makes this explicit, which may cause problems at some point:

has_target_end_location o part_of -> has_target_end_location

occurs_in o part_of -> occurs_in

has_end_location <- occurs in
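
As a rough illustration of how the two property chains behave, here is a minimal Python sketch that applies them to a toy set of triples. The entity labels (`import into cell`, `cytosol`, `cell`) and the helper `apply_chains` are assumptions made for the example, not GO axioms or any reasoner's API.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Chain rules (r1, r2) -> r3, read as: x r1 y and y r2 z implies x r3 z.
CHAINS: Dict[Tuple[str, str], str] = {
    ("has_target_end_location", "part_of"): "has_target_end_location",
    ("occurs_in", "part_of"): "occurs_in",
}


def apply_chains(triples: Set[Triple]) -> Set[Triple]:
    """Return the closure of `triples` under the chain rules in CHAINS."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, r1, mid) in list(closed):
            for (mid2, r2, o) in list(closed):
                if mid == mid2 and (r1, r2) in CHAINS:
                    new = (s, CHAINS[(r1, r2)], o)
                    if new not in closed:
                        closed.add(new)
                        changed = True
    return closed


# Toy facts (hypothetical labels): the import process ends in the cytosol,
# and the cytosol is part of the cell.
facts = {
    ("import into cell", "has_target_end_location", "cytosol"),
    ("cytosol", "part_of", "cell"),
}

inferred = apply_chains(facts) - facts
print(inferred)
```

In this toy case the only new triple is that the import process has its target end location in the cell. Nothing is inferred about occurs_in, which is the concern raised above: a process that starts outside the cell is not naturally captured by an occurs_in-based definition.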

Another possibility, reasonably clear but harder to axiomatize:

A process occurring in or involving exactly one cell.

ukemi commented 7 years ago

Is there a way to express the cardinality of only one cell?

cmungall commented 7 years ago

Some parts occur in cells. But that's true of most BPs. I doubt that neurobiologists would care much or even notice if we don't classify synaptic transmission as 'cellular'.

I doubt most biologists of type X would care if we placed their terms under high-level grouping class Y

That's not the use case. I'm not entirely sure what the use case for having the term at all is. The only one I have gathered is the PAINT use case. And here it would be bad not to have synaptic transmission under CP.

Everyone should add use cases here: https://docs.google.com/document/d/1QpfUY_LgeIryMj6EEAE05FXLE_894GalkerBU8dMuVU/edit#

I also just think it would be bonkers not to have ST under CP. IANAB but seems massively unintuitive.

cmungall commented 7 years ago

Is there a way to express the cardinality of only one cell?

Yes, in full DL, but it would not classify in ELK, and I don't think we can use the ELshunt pattern.

But I think this is going down the wrong path anyway. See the doc.
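
For reference, the "exactly one cell" reading could be sketched in description logic notation roughly as follows. The class and property names are illustrative assumptions, and the formula covers only the "occurring in" half of the proposed wording, not "involving".

```latex
\[
\textit{CellularProcess} \equiv \textit{Process} \sqcap (= 1\ \textit{occurs\_in}.\textit{Cell})
\]
```

Qualified exact cardinality of this kind is available in OWL 2 DL but not in the OWL 2 EL profile that ELK targets, which is why such an axiom would not contribute to classification there. The weaker existential form, $\textit{Process} \sqcap \exists\, \textit{occurs\_in}.\textit{Cell}$, stays within EL but says nothing about "only one".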

balhoff commented 7 years ago

@cmungall can you point me to the "ELshunt pattern"? I couldn't find anything relevant with Google.

cmungall commented 7 years ago

@cmungall can you point me to the "ELshunt pattern"? I couldn't find anything relevant with Google.

Example here: https://github.com/pato-ontology/pato/blob/master/src/ontology/pato_ext_notes.md

cmungall commented 7 years ago

See https://docs.google.com/document/d/1QpfUY_LgeIryMj6EEAE05FXLE_894GalkerBU8dMuVU/edit#

There have been at least two proposals about how to deal with cellular processes which should be discussed.

There is actually another possibility.

  1. Merge cellular X -> X
  2. Keep a high level CP class, define according to the doc (intuitively: something that can be represented in a LEGO model)
  3. Manually classify a handful of high level GO processes here: metabolism, ....

thomaspd commented 7 years ago

We discussed on ontology call, and we'd propose that cellular process (maybe clearer to rename it "cellular-level process" to distinguish from multicellular organism-level process) include both intracellular and intercellular processes. We don't need to add sub-classes for intracellular and intercellular processes, for the time being. But we can accept Chris's suggestion immediately above.

ukemi commented 7 years ago

So I will rename the term. Do we want to do the merges as Chris suggests above? If we do that, very few classes will be at a cellular level. Clearly metabolism won't be one of them, but things like cell adhesion and cell-cell signaling will.

dosumis commented 7 years ago

There is actually another possibility.

Merge cellular X -> X

Keep a high level CP class, define according to the doc (intuitively: something that can be represented in a LEGO model)

??? Are there granularity limits to what we can put in a LEGO model?

Manually classify a handful of high level GO processes here: metabolism,

I predict TPV problems re-emerging regularly. Maybe OK if we can rely on Val to be semi-automated TPV detector?

cmungall commented 7 years ago

On 8 Dec 2016, at 9:57, David Hill wrote:

So I will rename the term. Do we want to do the merges as Chris suggests above? If we do that, very few classes will be at a cellular level. Clearly metabolism won't be one of them, but things like cell adhesion and cell-cell signaling will.

This isn't the essence of my proposal. High level terms such as metabolism would be manually placed under CP. I would argue all metabolism is cellular.

This is inseparable from a larger refactor of the upper part of BP, which I think is in order. Not ready to file a ticket for this yet, but started fleshing this out here: https://gist.github.com/cmungall/4ed28123c3db832a7d99cbdd8e8a5920

cmungall commented 7 years ago

On 8 Dec 2016, at 12:31, David Osumi-Sutherland wrote:

There is actually another possibility.

Merge cellular X -> X

Keep a high level CP class, define according to the doc (intuitively: something that can be represented in a LEGO model)

??? Are there granularity limits to what we can put in a LEGO model?

You're quite right, we could use Noctua to connect together immune cells to make a model of the immune system. I meant 'what we currently do in LEGO models' but I admit this is not a good long term definition.

A better formulation would be a process that can be dissected into the actions of molecules and cell parts.

Manually classify a handful of high level GO processes here: metabolism,

I predict TPV problems re-emerging regularly. Maybe OK if we can rely on Val to be semi-automated TPV detector?

My intuition is that this can be avoided, and we can even have disjointness axioms to help us (of course, we would still have part-of cross-granular relationships).

ValWood commented 7 years ago

I'm with Chris here. There should be a way to define all metabolism as cellular (I was trying to say this unsuccessfully in the previous ticket). We know there are cases, like Peter's amino acid-liver example, where different cells perform part of a process at a tissue level. However, there is something fundamentally different between these processes, which occur at the level of a single cell, and a process like behavior or development, which clearly has parts that don't happen at a cellular level. I think we still haven't hit on the criteria to make this clear.

@ukemi do you have an example of a specific metabolism annotation which you think would seem odd annotated as a cell level process? Perhaps a hormone regulating a metabolic process (this still exerts its effect in single cells so I wouldn't have a problem with this being cellular). Identifying some of the particular annotations of concern might help with the differentia for cell level.

ValWood commented 7 years ago

Based on the eventual def, do you think that "protein localization" is also always a cellular process? can you think of any exceptions?

I also have a request. If we end up merging any cellular/non-cellular terms, can we keep the more specific cellular term ID? (Not a problem if we can't, but lots of our internal QC checks and curation config files are pinned to cellular terms.)

paolaroncaglia commented 7 years ago

Hi @ValWood,

Re. “I also have a request. If we end up merging the any cellular/non-cellular terms, can we keep the more specific cellular term ID (not a problem if we can't but lots of our internal QC checks and curation config files are pinned to cellular terms).” Unfortunately, in the very recent past merging terms and keeping the ID that should become secondary resulted in a Pandora’s box of downstream issues for various MODs (see https://github.com/geneontology/go-ontology/issues/12729 and https://github.com/geneontology/go-ontology/issues/12765 and). So I’m afraid we wouldn’t want to do that. Perhaps you could consider updating the PomBase internal QC checks and curation config files instead please, if we do go the merge route?

ValWood commented 7 years ago

OK, we can do that, but we will need a list of all the merges posted here (probably should go to the GO list anyway as it's a big change, whatever the resolution). Although if the cellular distinction is preserved, the cellular and non-cellular term for most merges will have equivalent intended meaning, so the ID which is kept should be arbitrary?

https://github.com/pombase/curation/issues/1221 https://github.com/pombase/canto/issues/1292#issuecomment-266306511

paolaroncaglia commented 7 years ago

@ValWood the issues were stemming exactly from keeping what should have been the secondary ID - in this sense, it is not arbitrary which ID we keep. And yes, editors normally record details of merges on tickets. In this case, it might be advisable to copy the entire diff or save it as a file and attach it here (one can link to the online svn diffs, but they're usually very slow in loading).

ValWood commented 7 years ago

So are you saying that you no longer have secondary IDs? If so I totally missed that.

paolaroncaglia commented 7 years ago

We do continue to keep secondary IDs :-)

paolaroncaglia commented 7 years ago

I'll be more precise: "so the ID which is kept should be arbitrary?" the ID which should be kept as primary is not arbitrary, and should be the one of the broader term; the ID of the narrower term, that will be merged into the broader term, will be kept as the secondary ID.
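
To make the primary/secondary rule concrete, here is a minimal Python sketch of a merge that keeps the broader term's ID as primary and records the narrower term's ID as secondary, plus a helper for updating downstream files that are pinned to the old IDs. The `Term` class, the function names, and the `GO:9000001`-style IDs are placeholders invented for the example; they are not real GO terms or part of any GO tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    id: str
    label: str
    alt_ids: List[str] = field(default_factory=list)  # secondary IDs


def merge_terms(broader: Term, narrower: Term) -> Term:
    """Merge `narrower` into `broader`.

    The broader term's ID stays primary; the narrower term's ID (and any
    secondary IDs it already carried) become secondary IDs of the result.
    """
    merged_alt = list(broader.alt_ids)
    for term_id in [narrower.id, *narrower.alt_ids]:
        if term_id != broader.id and term_id not in merged_alt:
            merged_alt.append(term_id)
    return Term(id=broader.id, label=broader.label, alt_ids=merged_alt)


def build_id_map(terms: List[Term]) -> Dict[str, str]:
    """Map every secondary ID to the primary ID that replaced it."""
    return {alt: term.id for term in terms for alt in term.alt_ids}


def update_pinned_ids(pinned: List[str], id_map: Dict[str, str]) -> List[str]:
    """Rewrite IDs in a config list; IDs without a mapping are left alone."""
    return [id_map.get(term_id, term_id) for term_id in pinned]


# Placeholder terms, not real GO IDs.
generic = Term("GO:9000001", "X process")
cellular = Term("GO:9000002", "cellular X process")

merged = merge_terms(generic, cellular)
id_map = build_id_map([merged])
updated = update_pinned_ids(["GO:9000002", "GO:9000003"], id_map)
print(merged, updated)
```

A downstream group could build such a map from the list of merges recorded on the ticket and apply it to its QC checks and curation config files.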

ValWood commented 7 years ago

got it. although in the case of a merge they are presumably equivalent, so I don't fully understand it, but I will take your word for it;)

suzialeksander commented 7 years ago

@jimhu-tamu

cmungall commented 6 years ago

I would like to make some more progress on this.

Summarizing some previous docs

pgaudet commented 6 years ago

I added this to the next ontology editors call.