geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

MF refactoring: edits to DOS's changes #14225

pgaudet closed this issue 4 years ago

pgaudet commented 7 years ago

Hello,

@thomaspd and I worked on @dosumis's branch of the MF refactoring and made further edits. @dosumis, would you please have a look? If that works for you, we'll merge this and continue editing from there.

see https://github.com/geneontology/go-ontology/commit/1f9572f9c4f7b8d61c57a8a38ca9848d315964c1

Thanks, Pascale

dosumis commented 7 years ago

Hiya,

Do you have a link to a pull request with all changes in?

Cheers, David

dosumis commented 7 years ago

OK, got it. I thought you'd branched again, but it seems to be on the same pull request (https://github.com/geneontology/go-ontology/pull/14226), with commits starting from:

https://github.com/geneontology/go-ontology/pull/14226/commits/d50230177afae4da14aee3f2e34f650fe35efe08

I'll comment properly later. (I was surprised to see 'role' back, as I thought we'd agreed not to use it. MF terms are mini-processes, which are rather different from roles.)

CC @cmungall

cmungall commented 7 years ago

I agree the term 'role' will cause confusion

ValWood commented 7 years ago

"system component function" will be meaningless to biologists. Do these MFs really need a grouping term other than "molecular function"?

pgaudet commented 7 years ago

Hi,

Perhaps we can do without these new top-level classes? Here's what the top level of MF now looks like. The term with a blue background could probably be moved somewhere better. If we can remove some of these highlighted terms, then the top level will be small enough that more grouping classes seem unnecessary.

Ideas:

  1. Create 'transcription factor activity' as a parent of 'nucleic acid binding transcription factor activity' and 'transcription factor activity, protein binding' (perhaps as a grouping 'do not annotate' term?)
  2. Translation regulator activity will probably go, see https://github.com/geneontology/go-ontology/issues/13536
  3. I wonder if 'nutrient reservoir activity' really is an activity (it's certainly passive!). There are only about 20 annotations. It seems that these storage proteins are the target of a storage (and later utilization) process, but do they actively mediate it? @tberardini, would there be a better way to describe this?
  4. 'toxin activity' also deserves to be reviewed
[Image: current top level of MF, with some terms highlighted]

Thoughts @thomaspd @ValWood @cmungall @dosumis @ukemi @vanaukenk ???

Thanks, Pascale

tberardini commented 7 years ago

Re: 'toxin activity', please see #12766 which documents this term's recent revival and history.

tberardini commented 7 years ago

Re: 'nutrient reservoir activity' I see this as very similar to 'structural molecule activity' in terms of it being a passive function.

thomaspd commented 7 years ago

I agree that we don't need terms above these, at least not for now.

Re 1 (transcription factor classes), we'd suggested a higher-level term called 'transcription regulator activity'.

Re 2 (translation factor) I don't think it needs to be obsoleted right away, if at all.

Re 3 and 4, I think these are OK for now. About nutrient reservoir, I can't think of a better way to describe egg proteins or milk proteins. About toxin activity, it's an accepted term for a protein that evolved as a secreted toxin.

Let's merge in David's changes now so we don't accrue too many conflicts before making additional changes.

ValWood commented 7 years ago

I still think it is totally confusing for curators and users to have to select terms in two MF branches to represent TFs fully:

a "transcription factor" branch, e.g. GO:000370 transcription factor activity, sequence-specific DNA binding

and a "DNA binding" branch, e.g. GO:0000977 RNA polymerase II regulatory region sequence-specific DNA binding

I don't see how the "regulation of transcription" branch differs from a process, and the term names are only subtly different.

Even after a few years of using these terms, I still need to go back and look at the ontology every time I use one. I do a consistency check every few months to make sure our TFs are still annotated in both branches, and there is usually a little drift due to the confusion, even for experienced curators.

When you look at the high level TF terms do you know which is the "DNA binding" branch and which is the "transcription factor activity" branch?

It would be much simpler if we could select a single MF (a DNA-binding or protein-binding TF term) and part_of BP "transcription/regulation of transcription....."

This is one of the key terms for describing the MF of a DNA-binding TF (DNA binding to a specific promoter region): http://www.ebi.ac.uk/QuickGO/term/GO:0000978. It is not related to any of the high-level "transcription factor" MFs (which only describe processes).
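A minimal sketch of the kind of two-branch consistency check described above, in Python. Everything in it is hypothetical: the term IDs (`tf:root`, `dna:root` and their children), the is_a edges and the gene names are placeholders, and a real check would load the GO is_a graph and a GAF file rather than hard-coded dictionaries. It collects each branch's descendants, then reports genes annotated in only one of the two branches.

```python
"""Sketch of a two-branch consistency check for TF annotations.

All term IDs, edges and genes are hypothetical placeholders; a real check
would load the GO is_a graph and a GAF annotation file instead.
"""

from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def descendants(root: str, is_a: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return root plus every term below it, following is_a edges downwards."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parent in is_a:
        children[parent].append(child)
    seen = {root}
    stack = [root]
    while stack:
        term = stack.pop()
        for child in children[term]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def find_drift(
    annotations: Dict[str, Set[str]],
    tf_branch: Set[str],
    dna_branch: Set[str],
) -> Dict[str, str]:
    """Map each gene annotated in only one branch to the branch it lacks."""
    drift: Dict[str, str] = {}
    for gene, terms in annotations.items():
        in_tf = bool(terms & tf_branch)
        in_dna = bool(terms & dna_branch)
        if in_tf and not in_dna:
            drift[gene] = "DNA binding"
        elif in_dna and not in_tf:
            drift[gene] = "transcription factor activity"
    return drift


if __name__ == "__main__":
    # Hypothetical is_a edges as (child, parent) pairs.
    edges = [("tf:child", "tf:root"), ("dna:child", "dna:root")]
    tf_branch = descendants("tf:root", edges)
    dna_branch = descendants("dna:root", edges)
    annotations = {
        "geneA": {"tf:child", "dna:child"},  # consistent: both branches
        "geneB": {"tf:child"},               # drifted: no DNA binding term
        "geneC": {"dna:child"},              # drifted: no TF activity term
    }
    print(find_drift(annotations, tf_branch, dna_branch))
```

Under the single-MF-plus-part_of-BP model proposed here, this whole check would be unnecessary, which is part of the argument for it.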

pgaudet commented 7 years ago

Changes:

cmungall commented 4 years ago

Just adding a note for posterity, since this term links to this ticket:

id: GO:0140096
name: catalytic activity, acting on a protein
namespace: molecular_function
def: "Catalytic activity that acts to modify a protein." [GOC:molecular_function_refactoring, GOC:pdt]
is_a: GO:0003824 ! catalytic activity

We don't have a logical definition for this term, which means that terms will have to be manually classified under it.

It also means we can't auto-infer annotations. If curators want to annotate to an enzyme mechanism that acts on a protein, then either we need to instantiate protein-specific subclasses for all appropriate activities and train curators to use these subclasses, OR train curators to co-annotate. We'd want to do this retrospectively to ensure reasonable annotation completeness.
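To illustrate why a logical definition would help, here is a toy Python sketch. It assumes (this is not stated in the thread) a genus-differentia definition of GO:0140096 along the lines of 'catalytic activity' and 'has input' some 'protein', and uses hand-written definitions for two hypothetical terms (`term X`, `term Y`). A simple structural check then places terms under GO:0140096 automatically, and annotations to those terms propagate up to it. A real pipeline would use an OWL reasoner over the full ontology; this only shows the idea.

```python
"""Toy structural classification under an assumed logical definition.

Assumption (not from the thread): GO:0140096 is defined as
    'catalytic activity' and 'has input' some 'protein'.
'term X' and 'term Y' are hypothetical; this is not a full OWL reasoner.
"""

from typing import Dict, Optional, Set, Tuple

# Hypothetical is_a hierarchy, mixing activities and input entities for brevity.
IS_A: Dict[str, Set[str]] = {
    "GO:0003824": set(),  # catalytic activity
    "transferase activity": {"GO:0003824"},
    "protein": set(),
    "apocytochrome c": {"protein"},
    "small molecule": set(),
}

# Genus-differentia definitions: term -> (genus, has_input filler).
DEFINITIONS: Dict[str, Tuple[str, Optional[str]]] = {
    "GO:0140096": ("GO:0003824", "protein"),
    "term X": ("transferase activity", "apocytochrome c"),
    "term Y": ("transferase activity", "small molecule"),
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself and all of its is_a ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(term: str, defined_class: str) -> bool:
    """True if both the genus and the has_input filler fall under defined_class's."""
    genus, filler = DEFINITIONS[term]
    d_genus, d_filler = DEFINITIONS[defined_class]
    if d_genus not in ancestors(genus):
        return False
    if d_filler is None:
        return True
    return filler is not None and d_filler in ancestors(filler)


def classify(defined_class: str) -> Set[str]:
    """Terms (other than defined_class itself) that its definition subsumes."""
    return {t for t in DEFINITIONS if t != defined_class and subsumed_by(t, defined_class)}


def infer_annotations(
    annotations: Dict[str, Set[str]], defined_class: str
) -> Dict[str, Set[str]]:
    """Add defined_class to every gene annotated to a term it subsumes."""
    subsumed = classify(defined_class)
    inferred: Dict[str, Set[str]] = {}
    for gene, terms in annotations.items():
        extra = {defined_class} if terms & subsumed else set()
        inferred[gene] = terms | extra
    return inferred


if __name__ == "__main__":
    print(sorted(classify("GO:0140096")))
    print(infer_annotations({"gene1": {"term X"}, "gene2": {"term Y"}}, "GO:0140096"))
```

With the definition in place, neither protein-specific subclasses nor co-annotation has to be maintained by hand: the classification and the propagated annotation both fall out of the has_input filler.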

deustp01 commented 4 years ago

At the risk of opening a whole new can of worms three years too late: does it make sense to have top-level terms "acting on DNA / RNA / protein"? It might be better to have "acting on polypeptide / polynucleotide", with the latter having ribo- and deoxyribo- children. That would fit much better with the enzymology data, which focus on the active sites and molecular mechanisms of enzymes and are indifferent to the size of the substrate molecule or to whether the substrate is genome-encoded. This isn't an argument that enzymes don't act on whole proteins, only that such enzymes are functionally specialized children of ones that act on a peptide bond or an amino acid side chain, regardless of how big the molecule containing it is. @ukemi @hdrabkin

ValWood commented 4 years ago

Here, at random, are some terms that are "acting on a protein" but lack that parentage, taken from a file I found on my desktop:

holocytochrome-c synthase activity (GO:0004408) Catalysis of the reaction: holocytochrome c = apocytochrome c + heme.

deoxyhypusine monooxygenase activity (GO:0019135) Catalysis of the reaction: protein N6-(4-aminobutyl)-L-lysine + donor-H2 + O2 = protein N6-((R)-4-amino-2-hydroxybutyl)-L-lysine + acceptor + H2O.

peptide-lysine-N-acetyltransferase activity (GO:0061733) Catalysis of the reaction: acetyl-CoA + lysine in peptide = CoA + N-acetyl-lysine-peptide.

lipoyl(octanoyl) transferase activity (GO:0033819) Catalysis of the reaction: octanoyl-[acyl-carrier protein] + protein = protein N6-(octanoyl)lysine + acyl-carrier protein.

dolichyl-phosphate-mannose-protein mannosyltransferase activity (GO:0004169) Catalysis of the reaction: dolichyl phosphate D-mannose + protein = dolichyl phosphate + O-D-mannosylprotein.

ubiquitin-like modifier activating enzyme activity (GO:0008641) Catalysis of the activation of small proteins, such as ubiquitin or ubiquitin-like proteins, through the formation of an ATP-dependent high-energy thiolester bond.
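Candidates like these could be pulled out in bulk with a crude keyword heuristic, sketched below in Python on the six definitions listed above. The keyword pattern and the `has_parent` input are assumptions for illustration, not an existing GO tool. As written it flags five of the six terms and misses holocytochrome-c synthase, because apocytochrome c is a protein substrate that the definition never calls one, which is one reason text matching is no substitute for logical definitions.

```python
"""Keyword heuristic for protein-directed MF terms missing a parent.

PROTEIN_WORDS and the has_parent argument are illustrative assumptions.
"""

import re
from typing import Dict, List, Set

# Matches "protein(s)" or "peptide(s)" at regex word boundaries, case-insensitively.
PROTEIN_WORDS = re.compile(r"\b(?:proteins?|peptides?)\b", re.IGNORECASE)

# Definitions copied from the list above.
TERM_DEFINITIONS: Dict[str, str] = {
    "GO:0004408": "Catalysis of the reaction: holocytochrome c = apocytochrome c + heme.",
    "GO:0019135": (
        "Catalysis of the reaction: protein N6-(4-aminobutyl)-L-lysine + donor-H2 + O2 = "
        "protein N6-((R)-4-amino-2-hydroxybutyl)-L-lysine + acceptor + H2O."
    ),
    "GO:0061733": (
        "Catalysis of the reaction: acetyl-CoA + lysine in peptide = "
        "CoA + N-acetyl-lysine-peptide."
    ),
    "GO:0033819": (
        "Catalysis of the reaction: octanoyl-[acyl-carrier protein] + protein = "
        "protein N6-(octanoyl)lysine + acyl-carrier protein."
    ),
    "GO:0004169": (
        "Catalysis of the reaction: dolichyl phosphate D-mannose + protein = "
        "dolichyl phosphate + O-D-mannosylprotein."
    ),
    "GO:0008641": (
        "Catalysis of the activation of small proteins, such as ubiquitin or ubiquitin-like "
        "proteins, through the formation of an ATP-dependent high-energy thiolester bond."
    ),
}


def flag_missing_parent(definitions: Dict[str, str], has_parent: Set[str]) -> List[str]:
    """Sorted IDs whose definition matches PROTEIN_WORDS and that are not in has_parent."""
    return sorted(
        go_id
        for go_id, text in definitions.items()
        if go_id not in has_parent and PROTEIN_WORDS.search(text)
    )


if __name__ == "__main__":
    # Per the comment above, none of these terms currently has the parent.
    print(flag_missing_parent(TERM_DEFINITIONS, has_parent=set()))
```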

ValWood commented 4 years ago

Reopening because the logical def would fix this?

ValWood commented 4 years ago

I reopened this, but it can probably be closed? If it stays open, it is only for a logical definition.