geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Ontology QA suggestions #6227

Closed gocentral closed 9 years ago

gocentral commented 15 years ago

As a result of the analysis described in the ISMB paper http://bioinformatics.oxfordjournals.org/cgi/content/full/btp195?ijkey=qvYcpVnIJMjd19Y&keytype=ref "Ontology quality assurance through analysis of term transformations" we have identified groups of terms in the GO that are not "univocal", where some rephrasing is warranted.

Some general observations:

in general, all "the" "an" and "a" determiners can be removed

in general look for terms with "or" or "and" and eliminate the coordination in favor of 2 specific terms (or eliminate completely -- in some cases it appears to be irrelevant)

look at use of punctuation, in particular "," which in some cases seems to be used in place of a specific relational preposition

consider use of "within" vs "in"

generally it seems that "other" or "another" is not necessary
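The observations above can be turned into rough pattern checks over term labels. The following is a minimal sketch, not the method used in the paper: the function name `flag_label` and the flag names are made up for illustration, and the patterns are simple heuristics that will produce false positives (for example, a label containing "a-factor" would trigger the determiner check).

```python
import re

# Hypothetical heuristics mirroring the observations above; the names and
# patterns are illustrative assumptions, not the analysis from the paper.
DETERMINERS = re.compile(r"\b(the|an|a)\b")  # lowercase only, to skip "vitamin A"
COORDINATION = re.compile(r"\b(or|and)\b", re.IGNORECASE)
WITHIN = re.compile(r"\bwithin\b", re.IGNORECASE)
OTHER = re.compile(r"\b(other|another)\b", re.IGNORECASE)


def flag_label(label):
    """Return the names of the phrasing checks that a term label triggers."""
    flags = []
    if DETERMINERS.search(label):
        flags.append("determiner")
    if COORDINATION.search(label):
        flags.append("coordination")
    if "," in label:
        flags.append("comma")
    if WITHIN.search(label):
        flags.append("within")
    if OTHER.search(label):
        flags.append("other")
    return flags


if __name__ == "__main__":
    for label in [
        "antigen processing and presentation of peptide antigen",
        "cell wall peptidoglycan catabolic process in another organism",
        "lipid antigen transport",
    ]:
        print(label, "->", flag_label(label))
```

A flagged label is only a candidate for review; whether the phrasing should actually change is a curator decision.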

The clusters that should be looked at are in the attached file. I'd appreciate knowing which of these turn out to be "true" true positives that result in some change in phrasing of the GO terms.

Thanks,

Karin

Karin.Verspoor@ucdenver.edu

Reported by: verspoor

Original Ticket: geneontology/ontology-requests/6246

gocentral commented 15 years ago

Lists of clusters that may require some rephrasing.

Original comment by: verspoor

gocentral commented 15 years ago

While most of the suggested changes are indeed trivial, I am going to speak up about two of the immunology related ones:

[136] TP: "X of Y" vs. "Y X" / ALTERNATION 111 {CTERM GTERM antigen} (7 terms)
GO:0002494: lipid antigen transport
GO:0015433: peptide antigen-transporting ATPase activity
GO:0046968: peptide antigen transport
GO:0048002: antigen processing and presentation of peptide antigen
GO:0002585: positive regulation of antigen processing and presentation of peptide antigen
GO:0002584: negative regulation of antigen processing and presentation of peptide antigen
GO:0002583: regulation of antigen processing and presentation of peptide antigen

In this case the main process term is 'antigen processing and presentation'. The structure here is parallel to the other 'antigen processing and presentation' terms such as 'antigen processing and presentation of endogenous antigen'. I would not care if 'peptide antigen transport' became 'transport of peptide antigen', but I object to 'peptide antigen processing and presentation'. I already put a lot of careful effort into how to phrase the set of antigen processing terms and would prefer to stick to them as they already stand.
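To make the "X of Y" vs. "Y X" alternation concrete, here is a minimal sketch of the mechanical rewrite in question. The helper name `of_to_prefix` is hypothetical and this is not the software used in the analysis; it only handles a single " of " and knows nothing about which part of the label names the main process.

```python
def of_to_prefix(label):
    """Rewrite 'X of Y' as 'Y X'; return the label unchanged if it has no ' of '.

    Illustrative assumption: the label is split at its first ' of ' only.
    """
    head, sep, tail = label.partition(" of ")
    if not sep:
        return label
    return f"{tail} {head}"


if __name__ == "__main__":
    # The simple case works in both directions of the alternation.
    print(of_to_prefix("transport of peptide antigen"))
    # Labels built on a fixed process name do not rewrite cleanly.
    print(of_to_prefix("antigen processing and presentation of peptide antigen"))
```

The first call yields "peptide antigen transport", while the second yields the awkward "peptide antigen antigen processing and presentation". That illustrates why labels built around an established process name are better left to curator judgment than rewritten mechanically.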

[141] TP: non-parallel structure 111 {GTERM activat} (9 terms)
GO:0001905: activation of membrane attack complex
GO:0001775: cell activation
GO:0002253: activation of immune response
GO:0051488: anaphase-promoting complex activation
**GO:0050798: activated T cell proliferation
GO:0051522: activation of monopolar cell growth
GO:0051519: activation of bipolar cell growth
GO:0002218: activation of innate immune response
GO:0032397: activating MHC class I receptor activity
*** Note: should be "activation of T cell proliferation"

Note: This should not be "activation of T cell proliferation"! The process here is the proliferation of previously activated T cells, not the activation step of the T cells. The software is not sophisticated enough to understand the distinction.

Thanks,

Alex

Original comment by: addiehl

gocentral commented 15 years ago

Perhaps not surprisingly, I recognised some of these from the OBOL no-parse list for multi-organism process (MOP) terms. Here are my comments on the MOP suggestions:

[1] TP: {host X} vs. {X in host}

[27] TP: "host X" vs. "X in host"

[25] TP: "in or on host organism" vs. "in another organism"

[64] TP: "cell wall of other organism" vs. "cytoskeleton in other organism" / "OF" vs "IN"

[67] TP: word choice; "within other organism" vs. "in other organism"

[88] TP: "host X levels" vs. "levels in host"

[188] TP: "of X in Y" vs. "of Y X" / ALTERNATION

Original comment by: jl242

gocentral commented 15 years ago

Regarding:

[25] TP: "in or on host organism" vs. "in another organism"
GO:0043707: cell adhesion during single-species biofilm formation in or on host organism
GO:0051672: cell wall peptidoglycan catabolic process in another organism
GO:0044401: multi-species biofilm formation in or on host organism
GO:0044407: single-species biofilm formation in or on host organism

The summary should probably read "in or on" vs "in". The question is whether "in or on" is more informative than just "in", or whether the "in or on" terms should be broken up into separate "in" and "on" terms if the location distinction is important.

Please do keep in mind these are simply phrasing inconsistencies identified by the system, rather than explicit requests for changes. Whether or not the inconsistencies are even "real" is clearly left up to your expert judgment. You all have far more context than the system, or even me!

Kind regards, Karin

Original comment by: verspoor

gocentral commented 15 years ago

Ah right. In fact I think in this case "in or on" is more informative than simply "in" because these processes include instances where the process occurs both within and on the surface of the host. Unfortunately there's no nice way of expressing this location in English!

Original comment by: jl242

gocentral commented 15 years ago

Okay - made the following changes:

[1] TP: {host X} vs. {X in host}

[27] TP: "host X" vs. "X in host"

[64] TP: "cell wall of other organism" vs. "cytoskeleton in other organism" / "OF" vs "IN"

[67] TP: word choice; "within other organism" vs. "in other organism"

[88] TP: "host X levels" vs. "levels in host"

[188] TP: "of X in Y" vs. "of Y X" / ALTERNATION

Original comment by: jl242

gocentral commented 14 years ago

Many corrections were made around when this item was submitted, and I've gone through and done the rest. Exceptions are listed, with explanations, in the new attached file.

m

Original comment by: mah11

gocentral commented 14 years ago

notes on suggested changes that weren't made

Original comment by: mah11