geneontology / amigo

AmiGO is the public interface for the Gene Ontology.
http://amigo.geneontology.org
BSD 3-Clause "New" or "Revised" License

DNA binding transcription factors. MAJOR ISSUE #602

Status: Open. ValWood opened this issue 4 years ago

ValWood commented 4 years ago

When I try to search for all DNA-binding transcription factors, this annotation is the first in the list:

URS00001B341F_9606 | Homo sapiens (human) hsa-miR-27a-5p |   | negative regulation of NF-kappaB transcription factor activity | has input UniProtKB:Q04206, occurs in aortic endothelial cell | BHF-UCL | Homo sapiens | IGI | UniProtKB:P01584 |   | miRNA |   | PMID:25327529 | 20191118

This happens even though this search is pinned to "regulates_closure: GO:0003700".

Shouldn't this mean that "regulation of molecular function" process-function relationships are not shown?

The current behaviour makes it impossible to retrieve a list of genes annotated to a specific molecular function.

@cmungall @kltm @RLovering

(I was trying to do the query for the systems biology lab I am associated with)

pgaudet commented 4 years ago

Maybe this is an AmiGO issue ?

kltm commented 4 years ago

@ValWood I'm assuming that this is the annotation in question above? http://amigo.geneontology.org/amigo/search/annotation?q=URS00001B341F_9606&fq=regulates_closure:%22GO:0003700%22&sfq=document_category:%22annotation%22

Looking at the graph: http://amigo.geneontology.org/visualize?term_data_type=string&inline=false&mode=amigo&term_data=GO%3A0032088&format=png It does appear to be in the closure.

ValWood commented 4 years ago

But the setting is the default, which is

− | regulates_closure: GO:0003700

(minus regulates).

In the past, the default (− regulates closure) always worked to retrieve lists of actual molecular functions.

kltm commented 4 years ago

Okay, flipping the regulates closure filter from "+" to "-", we get: http://amigo.geneontology.org/amigo/search/annotation?q=URS00001B341F_9606&fq=-regulates_closure:%22GO:0003700%22&sfq=document_category:%22annotation%22

That brings up a longer list. Looking at the first one:

RNAcentral URS00001B341F_9606 URS00001B341F_9606 GO:0010629 PMID:25327529 IGI UniProtKB:P01584 P Homo sapiens (human) hsa-miR-27a-5p miRNA NCBITaxon:9606 20191118 BHF-UCL ENSEMBL:ENSG00000007908|CL:0002544

It is annotated to GO:0010629. Looking at that in the graph: http://amigo.geneontology.org/visualize?inline=false&term_data_type=string&mode=amigo&term_data=GO%3A0010629&format=png Looking at this, it seems that GO:0003700 is not in the closure, which is what would be expected in the return.
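The two searches above differ only in the sign of the `fq` filter: `regulates_closure:"GO:0003700"` requires the term to be in the annotation's closure, while a leading `-` (standard Solr filter-query negation) excludes it. Below is a minimal Python sketch that rebuilds both URLs from their parts. The helper name `annotation_search_url` is hypothetical; the parameter names `q`, `fq` and `sfq` are simply the ones visible in the links above.

```python
from urllib.parse import quote, urlencode

# Base URL of the AmiGO annotation search page, as used in the links above.
BASE = "http://amigo.geneontology.org/amigo/search/annotation"


def annotation_search_url(query: str, term_id: str, negate: bool = False) -> str:
    """Build an AmiGO annotation search URL filtered on regulates_closure.

    With negate=True the filter query is prefixed with '-', which in Solr
    filter syntax excludes matching documents instead of requiring them.
    """
    prefix = "-" if negate else ""
    params = [
        ("q", query),
        ("fq", f'{prefix}regulates_closure:"{term_id}"'),
        ("sfq", 'document_category:"annotation"'),
    ]
    # quote (not quote_plus) with ':' marked safe reproduces the URLs above:
    # '"' becomes %22 and ':' stays literal.
    return BASE + "?" + urlencode(params, safe=":", quote_via=quote)


print(annotation_search_url("URS00001B341F_9606", "GO:0003700"))
print(annotation_search_url("URS00001B341F_9606", "GO:0003700", negate=True))
```

Keeping the sign as a single flag makes it easy to see that "+" and "−" in the AmiGO filter UI map to the same field with opposite meaning.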

ValWood commented 4 years ago

I'm getting very confused.

All I know is that URS00001B341F_9606 is NOT a transcription factor, and so it is very, very confusing for users to see it in the transcription factor list.

ValWood commented 4 years ago

I think I know the problem.

Many years ago I pointed out that "molecular function regulators" were annotated to the molecular function transitively. The − | regulates_closure: GO:0003700 filter was added as a workaround for this.

Since then, regulation of molecular function (GO:0065009) links have been added to the functions. This has the same effect: biological processes are now transitively annotated to molecular functions.

So, via the process GO:0032088 negative regulation of NF-kappaB transcription factor activity, hsa-miR-27a-5p becomes transitively annotated to "transcription factor activity".

@pgaudet This is another really, really good reason not to have "BP regulation of MF" terms in GO.

This inheritance is going to confuse users massively (and have a large effect on analysis).

It will be a shame if, after all of the effort to annotate the transcription factors, the list is not possible to retrieve.

The current list annotated to regulates_closure: GO:0003700 for human is 1584, but this includes at least 9 non-coding RNAs (there are probably many more; I only spotted these because they are right there at the front of the list).

There is then no way in AmiGO to filter out the "BP regulation of GO:0003700" without losing valid annotations (at least in a way that is accessible to the average GO user).

This is a BIG problem for retrieving accurate molecular function annotation lists.
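The mechanism described above can be reproduced with a plain graph walk. The sketch below is a simplification under stated assumptions: it uses a two-edge toy graph built from terms in this thread, collapses any intermediate terms between GO:0032088 and GO:0051090 into a single `is_a` edge (the real relation types and intermediates in GO may differ), and treats the "regulates closure" as reachability over `is_a`, `part_of` and the regulates family. It is not the GO/AmiGO closure code.

```python
from collections import deque

# Toy ontology edges (child, relation, parent). Simplified: intermediate
# terms between GO:0032088 and GO:0051090 are collapsed into one is_a edge.
EDGES = [
    ("GO:0032088", "is_a", "GO:0051090"),       # neg. reg. of NF-kappaB TF activity
    ("GO:0051090", "regulates", "GO:0003700"),  # reg. of DNA-binding TF activity
]

ISA_PARTOF = {"is_a", "part_of"}
REGULATES = ISA_PARTOF | {"regulates", "positively_regulates", "negatively_regulates"}


def closure(term, edges, relations):
    """Return term plus every ancestor reachable via the given relations."""
    parents = {}
    for child, rel, parent in edges:
        if rel in relations:
            parents.setdefault(child, []).append(parent)
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# The miRNA is directly annotated to GO:0032088.
print("GO:0003700" in closure("GO:0032088", EDGES, ISA_PARTOF))  # False
print("GO:0003700" in closure("GO:0032088", EDGES, REGULATES))   # True
```

Only the `regulates` edge carries the annotation up to GO:0003700, so a filter built on an is_a/part_of-only closure would drop it, while a filter on the regulates closure keeps it.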

ValWood commented 4 years ago

The label here shouldn't be "question". This is a major usability issue.

kltm commented 4 years ago

Apologies for the "question" label; I had it there while we were working out whether this was an AmiGO issue or an ontology issue (which would then move to another tracker).

Tagging @pgaudet @cmungall

cmungall commented 4 years ago

I may need a slight restatement of the desired behavior (independent of AmiGO, and of any past behavior).

ValWood commented 4 years ago

This is my understanding of the issue:

[Screenshot, 2020-08-10]

Annotations to descendants of GO:0003700 DNA-binding transcription factor activity in the molecular function ontology (i.e. the MF regulator terms), namely

Term ID | Term name
GO:0003700 | DNA-binding transcription factor activity
GO:0001217 | DNA-binding transcription repressor activity
GO:0001216 | DNA-binding transcription activator activity
GO:0140416 | DNA-binding transcription factor inhibitor activity

are not propagated to GO:0003700 DNA-binding transcription factor activity.

This is the current "regulates" exclusion. This is good, because these are not GO:0003700 DNA-binding transcription factor activity.

However, now there is a F-P link between GO:0051090 | regulation of DNA-binding transcription factor activity and GO:0003700 DNA-binding transcription factor activity

so things like URS00001B341F_9606 | Homo sapiens (human) hsa-miR-27a-5p | | negative regulation of NF-kappaB transcription factor activity | has input UniProtKB:Q04206, occurs in aortic endothelial cell | BHF-UCL | Homo sapiens | IGI | UniProtKB:P01584 | | miRNA | | PMID:25327529 | 20191118

get annotated to GO:0003700 DNA-binding transcription factor activity.

It would not be very clear to a user how to fix this to obtain a list of DNA-binding transcription factors. I only spotted these by chance because they are at the top of the list and I saw that they are miRNAs.

Does that help?

cmungall commented 4 years ago

Thanks!

Not sure about this bit:

However, now there is a F-P link between GO:0051090 | regulation of DNA-binding transcription factor activity and GO:0003700 DNA-binding transcription factor activity

The link here is actually regulates.

ValWood commented 4 years ago

Yes, the relation is regulates, but the annotation is still inherited across the F-P connection. The corresponding MF-MF example does not work here as an illustration.

This is a better example

[Screenshot, 2020-08-24]

The "protein kinase regulator" MF term is connected to the "protein kinase activity" MF term via a regulates link, but the "protein kinase" term does not include these 'regulation' annotations for the MF-MF connection.

However, for a regulation annotation between BP and MF the annotations are currently transitive.

pgaudet commented 4 years ago

Discussion on the ontology call: we agree that it is not intuitive for users to get the regulates closure by default. I'll create an AmiGO ticket to ask to change the default behavior.

lpalbou commented 4 years ago

Do we want that behavior to be updated on the ribbon as well? By default, it also includes the regulates closure, but surely we want AmiGO and the ribbon to be consistent.

Note this is not a minor change: from memory, a large number of annotations are mapped through the regulates relationship, so if we change that behavior, a proper announcement on the mailing lists would certainly help.

ValWood commented 4 years ago

The number of annotation changes should not be so large; this behaviour has always been in place for MF-MF (i.e. a "MF protein kinase regulator" is currently not annotated to "MF protein kinase" in Amigo).

This only affects links between "BP regulation of x" and "MF x". We do not have many such terms, and they do not have many annotations.

We have also talked about removing these "regulation of molecular function" terms from the BP ontology (there is a ticket about this somewhere)

ValWood commented 4 years ago

Actually, it is quite a big problem. Human now has 1248 gene products annotated to "protein kinase activity", but human has ~500 protein kinases.

This is REALLY misleading and quite embarrassing, really...

Note that for regulates transitivity:

BP -> BP: OK (current behaviour OK)
BP -> MF: not OK (current behaviour misleading)
MF -> MF: not OK (current behaviour OK)
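This three-row table can be read as a rule keyed on the aspects at either end of a `regulates` edge. Below is a minimal Python sketch of that proposed rule, not of what AmiGO or any GO tool currently does. "process X" and "regulation of process X" are hypothetical placeholder terms; the other labels and links are taken from this thread.

```python
# Aspect of each toy term: "P" = biological process, "F" = molecular function.
ASPECT = {
    "regulation of process X": "P",  # hypothetical BP regulation term
    "process X": "P",                # hypothetical BP term
    "regulation of DNA-binding transcription factor activity": "P",
    "DNA-binding transcription factor activity": "F",
    "protein kinase regulator activity": "F",
    "protein kinase activity": "F",
}

EDGES = [
    ("regulation of process X", "regulates", "process X"),
    ("regulation of DNA-binding transcription factor activity", "regulates",
     "DNA-binding transcription factor activity"),
    ("protein kinase regulator activity", "regulates", "protein kinase activity"),
]

REGULATES_RELATIONS = {"regulates", "positively_regulates", "negatively_regulates"}

# The proposed policy: propagate over regulates only between two processes.
# Pairs not listed (e.g. F -> P) default to no propagation in this sketch.
REGULATES_POLICY = {("P", "P"): True, ("P", "F"): False, ("F", "F"): False}


def edge_propagates(child, rel, parent):
    """Decide whether an annotation to child should propagate to parent."""
    if rel not in REGULATES_RELATIONS:
        return True  # is_a and part_of always propagate here
    return REGULATES_POLICY.get((ASPECT[child], ASPECT[parent]), False)


def ancestors(term):
    """Collect every term reachable from term over propagating edges."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child, rel, parent in EDGES:
            if (child == current and parent not in found
                    and edge_propagates(child, rel, parent)):
                found.add(parent)
                stack.append(parent)
    return found


for term in ("regulation of process X",
             "regulation of DNA-binding transcription factor activity",
             "protein kinase regulator activity"):
    print(term, "->", sorted(ancestors(term)))
```

Under this policy only the BP -> BP edge carries annotations upward. Treating unlisted aspect pairs as non-propagating is a choice made for the sketch, not something the table specifies.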

cmungall commented 4 years ago

@ValWood thanks, I get it and I agree

The only part I am confused about is your statement "this behaviour has always been in place for MF-MF (i.e. a "MF protein kinase regulator" is currently not annotated to "MF protein kinase" in Amigo" - AFAICR the behavior in AmiGO is the same as in all tools, which is to propagate over is-a, part-of, and regulates relations regardless of the aspect.

For example, on http://amigo.geneontology.org/amigo/term/GO:0004672 if I pin to human and IBA I can see

[Screenshot]

Same for QuickGO.

(As a further confusing aside, I can't figure out why QuickGO has more human annotations to this term, or whether different propagation rules are applied. We really need to be more explicit about our inference across all tools.)

Regardless, this is not AmiGO-specific, although as our main browser we should strive to lead by example. The same goes for the ribbon, enrichment, and the Alliance. But we need consistent behavior across all tools, which requires clear documentation, good policing to ensure that the tools and databases we endorse follow these common guidelines, and implementation.

There are two equivalent ways to state the correct behavior. The first way:

The other equivalent way, which I prefer, may be less intuitive given how people are usually indoctrinated to think about GO, although I'd argue it is actually more biologist-friendly. Here we do not think about 'annotations' but instead think in terms of gp2term relations.

In all cases there is no need to specify ad-hoc rules. The property chains are in RO, and are imported into go-plus.

We were meant to talk about this on the managers call today but didn't have time; it's on the agenda for the software call tomorrow.

cmungall commented 4 years ago

Looks like we are repeating ourselves: #210

pgaudet commented 4 years ago

Looks like we are repeating ourselves

Or we are consistent.

ValWood commented 4 years ago

For example, on http://amigo.geneontology.org/amigo/term/GO:0004672 if I pin to human and IBA I can see

It's complicated. I think this is because CAMK2N1 also has:

CAMK2N1 | Calcium/calmodulin-dependent protein kinase II inhibitor 1 | negative regulation of protein kinase activity | GO_Central | Homo sapiens | IBA | MGI:MGI:1913509 PANTHER:PTN001256416 | calcium/calmodulin-dependent protein kinase ii inhibitor 2 pthr31007 | protein | PMID:21873635 | 20200618

So although historically the inference was not made from MF to BP, the MF-BP link instantiates a process annotation, and this link creates the transitive annotation to "protein kinase activity".

We definitely discussed the incorrectness of inferring 'regulates' over MF annotations historically, and this is why the default filter "− | regulates_closure: GO:0004672" exists for MF terms.

I will try to find the tickets about discussion and implementation (it was many years ago!), AND find an example of a MF regulator term with no F-P link to confirm this behaviour.

ValWood commented 4 years ago

I see you found the tickets https://github.com/geneontology/amigo/issues/267 and #210

ValWood commented 4 years ago

I think I found confirmation: http://amigo.geneontology.org/amigo/term/GO:0010698 The pombe gpa2 "adenylate cyclase activator activity" http://amigo.geneontology.org/amigo/gene_product/PomBase:SPAC23H3.13c does not get an annotation to "adenylate cyclase activity" because there is no P-F regulates "adenylate cyclase" link https://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0010856

ValWood commented 4 years ago

Likewise, http://amigo.geneontology.org/amigo/gene_product/PomBase:SPCC737.06c (glutamate-cysteine ligase regulator activity) has no annotation to glutamate-cysteine ligase activity.

ValWood commented 4 years ago

So, in summary, this type of annotation only appears when an F-P link is present: protein kinase regulator (MF) -> regulation of protein kinase (BP) -> regulates "protein kinase" results in the propagation of the annotation from protein kinase regulator (MF) to protein kinase (MF), but if no F-P link existed, the propagation would not occur.
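This summary can be checked with a small toy model. The sketch assumes, following ValWood's reading of the current behaviour, that `is_a`/`part_of` edges always propagate while a `regulates` edge propagates only when it starts from a process term; cmungall's description above differs, so treat this as an illustration of that hypothesis rather than of any tool's actual rules. The relation on the F-P link is also assumed to be `part_of`, since the thread only calls it an "F-P link".

```python
# Aspect of each toy term: "P" = biological process, "F" = molecular function.
ASPECT = {
    "protein kinase regulator activity": "F",
    "regulation of protein kinase activity": "P",
    "protein kinase activity": "F",
}

REGULATES_RELATIONS = {"regulates", "positively_regulates", "negatively_regulates"}

MF_MF_LINK = ("protein kinase regulator activity", "regulates", "protein kinase activity")
# Relation assumed to be part_of; the thread only calls this an F-P link.
FP_LINK = ("protein kinase regulator activity", "part_of",
           "regulation of protein kinase activity")
PF_LINK = ("regulation of protein kinase activity", "regulates", "protein kinase activity")


def edge_propagates(child, rel, parent):
    """Assumed current rule: regulates edges propagate only from a process."""
    if rel in REGULATES_RELATIONS:
        return ASPECT[child] == "P"
    return True  # is_a and part_of always propagate


def ancestors(term, edges):
    """Collect every term reachable from term over propagating edges."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child, rel, parent in edges:
            if (child == current and parent not in found
                    and edge_propagates(child, rel, parent)):
                found.add(parent)
                stack.append(parent)
    return found


start = "protein kinase regulator activity"
# With the F-P link, the annotation reaches the MF via the BP regulation term.
print("protein kinase activity" in ancestors(start, [MF_MF_LINK, FP_LINK, PF_LINK]))
# Without it, the direct MF-MF regulates edge alone does not propagate.
print("protein kinase activity" in ancestors(start, [MF_MF_LINK, PF_LINK]))
```

The first check is True and the second False: under this assumed rule the detour through the process term is what carries the regulator annotation to "protein kinase activity", matching the gpa2 and glutamate-cysteine ligase regulator examples above.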

colinlog commented 4 years ago

Coming from the TF annotation effort and Val's reference to this ticket in https://github.com/geneontology/go-ontology/issues/19894, I wonder whether some BPs should become MFs, for example Transcription factor and its descendants. But that may not help with the 'list pollution' from upstream regulators reported above by Val? I would be glad to listen in and perhaps even contribute to a live re-discussion of these historical discussions. Has anyone ever written about these issues in a more formal format?

ValWood commented 4 years ago

@colinlog I think you refer to

BP regulation of DNA-binding transcription factor activity https://www.ebi.ac.uk/QuickGO/term/GO:0051090 and its descendants should be molecular function terms? (This is essentially what I was suggesting. I think curators and users are very confused by 'regulation of MF' in the process ontology; to me these should be annotated as 'BP regulation of process' OR 'MF function regulator activity'.)

The number of EXP annotations to this term (and descendants) is relatively low (834).

The majority seem to be used for the annotation of upstream signalling pathway components, e.g. positive regulation of NF-kappaB transcription factor activity. Shouldn't this just be NIK/NF-kappaB signaling, which is defined as activating NF-kappaB?

Some are functions (e.g. cytoplasmic sequestering of NF-kappaB), but we now have sequestering functions too.

Of course, this is only a part of the picture, there are many other 'regulation of MF' in the BP ontology outside of those related to transcription.

ValWood commented 4 years ago

To show how unhelpful these regulation of function annotations are, we often see this situation:

[Screenshot, 2020-09-01]

The "regulation of MF" process annotation is totally uninformative if an MF regulator annotation is present.

pgaudet commented 6 months ago

@ValWood Is this still a problem?

ValWood commented 6 months ago

Well the problem seems to have largely gone away, but that might be because we have removed (for fission yeast) all of the BP "regulation of MF" terms that were causing the problem.

So even if the issue isn't fully resolved for all species, it will be soon, so this can probably be closed?

As an aside, "regulation of molecular function" has only 21,766 annotations in QuickGO, but 12,903 of them are TreeGrafter.