geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

pre-import PAINT checks, PAINT IGNORING TAXON RESTRICTIONS #1873

Closed ValWood closed 5 years ago

ValWood commented 6 years ago

this batch should be taxonomically restricted:

~fhl1 forkhead transcription factor Fhl1 cell differentiation GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN001371846 PAINT_REF:11829 20170427~ see https://github.com/geneontology/go-annotation/issues/1873 ~fhl1 forkhead transcription factor Fhl1 anatomical structure morphogenesis GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN001371846~ see https://github.com/geneontology/go-annotation/issues/1873

MULTICELLULAR ORGANISM DEVELOPMENT IS TAXONOMICALLY RESTRICTED FOR YEAST

THIS TERM SHOULD BE TAXONOMICALLY RESTRICTED ITS PARENTS ARE https://github.com/geneontology/go-ontology/issues/15685

THIS TERM SHOULD BE TAXONOMICALLY RESTRICTED ITS PARENTS ARE

CELL MIGRATION IS ALREADY TAXONOMICALLY RESTRICTED, IGNORED BY PAINT??? GO:0040011 | locomotion | Never in Taxon | 451864

CILIUM ASSEMBLY IS ALREADY TAXONOMICALLY RESTRICTED, IGNORED BY PAINT??? GO:0060271 | cilium assembly | Never in Taxon | 4890 | Ascomycota
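
For reference, the kind of check these two rows imply can be sketched roughly as below. This is a minimal illustration, not the GO pipeline's or PAINT's actual code. The constraint rows are the two quoted above; the ancestor map, the placement of cell migration (GO:0016477) under locomotion, and the partial lineage for S. pombe (NCBITaxon:4896) are hand-written assumptions. The function name `violated_constraints` is hypothetical.

```python
from typing import Dict, List, Set

# Never-in-taxon constraints quoted in this ticket: GO term -> forbidden taxon IDs.
NEVER_IN_TAXON: Dict[str, Set[int]] = {
    "GO:0040011": {451864},  # locomotion
    "GO:0060271": {4890},    # cilium assembly, never in Ascomycota
}

# ASSUMPTION: hand-written ancestor closure; a real check would read it from the ontology.
ANCESTORS: Dict[str, Set[str]] = {
    "GO:0016477": {"GO:0040011"},  # cell migration, assumed here to sit under locomotion
}

# ASSUMPTION: partial lineage for S. pombe (NCBITaxon:4896), just enough for this example.
LINEAGE: Dict[int, Set[int]] = {
    4896: {4896, 4890, 451864},
}


def violated_constraints(term: str, taxon: int) -> List[str]:
    """Return the constrained terms (the term itself or an ancestor) that forbid this taxon."""
    candidates = {term} | ANCESTORS.get(term, set())
    # Unknown taxa fall back to a lineage containing only themselves.
    lineage = LINEAGE.get(taxon, {taxon})
    return sorted(t for t in candidates if NEVER_IN_TAXON.get(t, set()) & lineage)
```

Under these assumptions, a pombe annotation to cilium assembly is flagged directly, and one to cell migration is flagged through its locomotion ancestor, which is the inheritance behaviour the reports above expect.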

ValWood commented 6 years ago

PANTHER family issue ?

~ssn6 transcriptional corepressor Ssn6 histone demethylase activity (H3-K27 specific) GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000361179
This one is a bit strange, it isn't a demethylase, but it also isn't the ortholog of the other family members in PANTHER:PTN000361179 which are. These are JmjC, and pombe ssn6 is a TPR repeat (panther family http://www.pantherdb.org/panther/family.do?clsAccession=PTHR14017)~

see https://github.com/geneontology/go-annotation/issues/1951

ValWood commented 6 years ago

~tos4 transcription factor, FHA domain protein Tos4 (predicted) translation GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000559990
(Is a transcription factor)~

https://github.com/geneontology/go-annotation/issues/1952

ValWood commented 6 years ago

https://github.com/geneontology/go-annotation/issues/1953

~rmt3 type I ribosomal protein arginine N-methyltransferase Rmt3 histone arginine methylation GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000109455 PAINT_REF:11006 20170427 rmt3 type I ribosomal protein arginine N-methyltransferase Rmt3 histone-arginine N-methyltransferase activity GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000109455~

~I don't think that histone is the physiological substrate here? This is characterised in pombe as colocalizing with and methylating ribosomes see https://www.pombase.org/gene/SPBC8D2.10c~

ValWood commented 6 years ago

https://github.com/geneontology/go-annotation/issues/1962

~not enough information to make these inferences (transport inferences are dangerous across species, because substrates change but to the best of our knowledge this is a sulfate transporter) SPCC320.05 sulfate transmembrane transporter (predicted) regulation of intracellular pH GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000212047 PAINT_REF:11814 20170427 SPCC320.05 sulfate transmembrane transporter (predicted) bicarbonate transport GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000212047 PAINT_REF:11814 20170427 SPCC320.05 sulfate transmembrane transporter (predicted) chloride transmembrane transport GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000212047 PAINT_REF:11814 20170427 SPCC320.05 sulfate transmembrane transporter (predicted) oxalate transport~

cmungall commented 6 years ago

@ValWood - are all of these for the attention of the PAINT team (assigning @huaiyumi for now)? Are some for new TCs in GO?

selewis commented 6 years ago

Does this mean that the taxon checking code was removed from PAINT? It used to be built in.

ValWood commented 6 years ago

I guess some will be for checking the PAINT annotation.

I think ssn6 is a panther family issue?

The taxon checks don't appear to be working. I think this might be a general GO issue now I think about it. Let me know if you want these 2 split out into different tickets.

ValWood commented 6 years ago

see https://github.com/geneontology/go-annotation/issues/1954

~tra2 NuA4 complex phosphatidylinositol pseudokinase complex subunit Tra2 kinase activity GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000124197~

(Tra2 in NuA4 is a pseudo-kinase, I think this is a general across species thing but I'm not sure)

ValWood commented 6 years ago

we wouldn't make these annotations:

~nup211 nucleoporin nup211 mitotic spindle assembly checkpoint GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN001052033~ https://github.com/geneontology/go-annotation/issues/1955

~nup61 nucleoporin Nup61 spindle organization GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000566604
see above~ https://github.com/geneontology/go-annotation/issues/1956

~npp106 nucleoporin Npp106 nuclear pore complex assembly GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000131983 PAINT_REF:11225~ https://github.com/geneontology/go-annotation/issues/1957

ValWood commented 6 years ago

https://github.com/geneontology/go-annotation/issues/1960

~sec21 coatomer gamma subunit Sec21 (predicted) organelle transport along microtubule GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000029440~

ValWood commented 6 years ago

see https://github.com/geneontology/go-annotation/issues/1958

~positive and negative annotation from the same family arg6 acetylglutamate synthase Arg6 acetylglutamate kinase activity GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000597373 PAINT_REF:23342 20170427 arg6 acetylglutamate synthase Arg6 NOT acetylglutamate kinase activity GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000597386 PAINT_REF:23342 20170427 it isn't acetylglutamate kinase activity, it's acetyl-CoA:L-glutamate N-acetyltransferase activity (this might be because it's a multi-domain protein in higher eukaryotes?) Actually there might be an amino acid kinase domain below threshold, but should there be both a positive and a negative annotation?~
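
A pre-import check for this kind of contradiction could look roughly like the sketch below. It is only an illustration: the `Annotation` record and the `conflicting_pairs` helper are hypothetical names, and the rows are reduced to the gene, term label, NOT qualifier and PANTHER node taken from the lines quoted in this thread.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Annotation(NamedTuple):
    gene: str
    term: str
    negated: bool   # True when the annotation carries the NOT qualifier
    with_from: str  # PANTHER node the IBA annotation was inferred from


def conflicting_pairs(annotations: List[Annotation]) -> Dict[Tuple[str, str], Set[str]]:
    """Map (gene, term) -> supporting nodes, for pairs annotated both positively and with NOT."""
    seen = defaultdict(lambda: {True: set(), False: set()})
    for a in annotations:
        seen[(a.gene, a.term)][a.negated].add(a.with_from)
    return {
        key: nodes[True] | nodes[False]
        for key, nodes in seen.items()
        if nodes[True] and nodes[False]
    }


# Rows reduced from the annotations quoted in this thread.
ANNOTATIONS = [
    Annotation("arg6", "acetylglutamate kinase activity", False, "PANTHER:PTN000597373"),
    Annotation("arg6", "acetylglutamate kinase activity", True, "PANTHER:PTN000597386"),
    Annotation("sec21", "organelle transport along microtubule", False, "PANTHER:PTN000029440"),
]
```

Calling `conflicting_pairs(ANNOTATIONS)` returns only the arg6 pair, together with the two PANTHER nodes that disagree, so a curator can see where in the tree the positive and negative inferences come from.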

ValWood commented 6 years ago

https://github.com/geneontology/go-annotation/issues/1959

~incorrect annotations See https://github.com/geneontology/go-annotation/issues/31 for some history transport, NADH to ubiquinone GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000042666 PAINT_REF:10371 SPAC11E3.12 mitochondrial thioredoxin family protein respiratory electron transport chain GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000042665 PAINT_REF:10371 20170427 SPAC11E3.12 mitochondrial thioredoxin family protein NADH dehydrogenase (ubiquinone) activity GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN000042665 PAINT_REF:10371 20170427~

ValWood commented 6 years ago

https://github.com/geneontology/go-annotation/issues/1961

~sil1 nucleotide exchange factor for the ER lumenal Hsp70 chaperone, Sil1 (predicted) cytoplasmic translation GO_Central Schizosaccharomyces pombe IBA PANTHER:PTN001062524 PAINT_REF:19316 20170427 is involved in SRP-dependent cotranslational protein targeting to membrane, translocation~

ValWood commented 6 years ago

see InterPro summary http://www.ebi.ac.uk/interpro/entry/IPR012098

the phosphate transport is from some old 1996 genetics, indirect https://www.yeastgenome.org/reference/S000051002

(queried original annotation with SGD)

2/Apr/2018, I followed this up with SGD, the original annotation was deleted

ValWood commented 6 years ago

I scanned the first 1000 and they look pretty good! Once these are fixed I'll try a real import against our filtering (most will hopefully be duplicates so they will be filtered from PomBase as redundant).

It would be really helpful to suppress redundant annotation more generally .....

ValWood commented 6 years ago

We can't import the PAINT annotations into PomBase until these are fixed. However, the pombe annotations are now in AmiGO. Because they have the incorrect gene ID in the db object symbol column they are also causing problems for downstream software (see helpdesk ticket).

selewis commented 6 years ago

Right, I forwarded your message to Anushya and Huaiyu to draw their attention to this.

On Mon, Apr 2, 2018 at 12:48 PM, Val Wood notifications@github.com wrote:

some of these will need to be addressed by PAINT curators.


ValWood commented 6 years ago

Other than the taxon restriction issue I think they are mostly annotation problems. ....

ValWood commented 6 years ago

@pgaudet this is an earlier batch. Do you want these into separate tickets?

pgaudet commented 6 years ago

@ValWood

Please create separate tickets. Taxon constraints should go in the go-ontology tracker.

Thanks !

Pascale

mah11 commented 6 years ago

Taxon constraints should go in the go-ontology tracker.

Is that applicable to these reports? As I understand it, the taxon constraints are already present in the ontology, but annotations that should be flagged or blocked are getting through. Is that still an ontology issue?

ValWood commented 6 years ago

Yes, I think in many cases the taxon restrictions are present but the taxon checks are not in place. I have asked about this in multiple tickets. @cmungall ?

anyway I'll start by separating this list out.

ValWood commented 6 years ago

ticket about taxon checks (and other QC checks) https://github.com/geneontology/go-annotation/issues/1928

ValWood commented 6 years ago

There are 2 issues really......

1. Why aren't the taxon restrictions being applied more generally (i.e. when a gene product is annotated to a taxonomically restricted term, why isn't it flagged in the logs)?

but

2. Why is PAINT ignoring the taxon restrictions and making annotations to restricted terms? For example, pombe and cerevisiae should not get any annotation via PAINT to "multicellular organism development".

Both of these problems need resolving. The taxon checks not working issue is already reported to @cmungall

The entries in this ticket related to taxon checks are about PAINT ignoring taxon restrictions....

ValWood commented 6 years ago

It seems to be a mixture. I renamed this ticket pre-import PAINT checks, PAINT IGNORING TAXON RESTRICTIONS

I'll move any queries that do not appear to be this particular problem to separate tickets

pgaudet commented 6 years ago

@ValWood The fact that taxon restrictions are ignored for PAINT annotations is a known issue. So you can skip creating new tickets if this is the problem. I'm investigating this with @cmungall @huaiyumi and @dougli1sqrd to see why some PAINT annotations are not exported/displayed correctly.

Thanks, Pascale

ValWood commented 6 years ago

OK this particular ticket is now only about PAINT ignoring taxon restrictions.

pgaudet commented 6 years ago

Thanks !

ValWood commented 6 years ago

If it helps, taxon restrictions are being ignored for everything, and have been for a while. Not just for PAINT (although I had assumed that there would be an extra upstream step to prevent PAINT annotations being created if a taxon restriction exists?)

ValWood commented 6 years ago

GO:0060271 | cilium assembly | Never in Taxon | 4890 | Ascomycota we get 8 pombe genes annotated to this term

via PANTHER:PTN000430231 PANTHER:PTN000223362 PANTHER:PTN000223485

selewis commented 6 years ago

There ought to be (and indeed used to be) at least two built-in checks for passing the taxon constraints: first during PAINT handling, and second during the standard pipeline.

  1. Built in to PAINT itself so that the annotations aren't created in the first place. 3 possible things could happen at this stage:
     1.a code-rot resulting in this functionality being lost. @huaiyumi would you please check that.
     1.b older PAINT annotations prior to current checks, but these could be caught when the PAINT GAFs are generated by @huaiyumi. Again @huaiyumi to check.
     1.c the taxon checker service that PAINT uses is down. @kltm or @dougli1sqrd can determine that.

  2. The pipeline should be checking all submissions to ensure they obey the taxon constraints. This would deal with any failures in the above. Really though these should ideally be caught at inception
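
To make the layering concrete, here is a minimal sketch of the two stages, assuming a toy checker. None of these names come from PAINT or the GO pipeline: `violates`, `create_annotations` and `pipeline_check` are hypothetical, the gene names are placeholders, and the checker hard-codes the single cilium assembly constraint mentioned in this ticket. Passing `None` as the checker stands in for case 1.c, where the checker service is unavailable.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Annotation = Tuple[str, str, int]  # (gene, GO term, NCBI taxon ID of the gene's species)


def violates(term: str, taxon: int) -> bool:
    """ASSUMPTION: toy checker covering one constraint from this ticket."""
    # GO:0060271 (cilium assembly) is never in taxon 4890 (Ascomycota);
    # S. pombe (4896) is hard-coded here as falling within that taxon.
    return term == "GO:0060271" and taxon == 4896


def create_annotations(
    candidates: Iterable[Annotation],
    checker: Optional[Callable[[str, int], bool]],
) -> List[Annotation]:
    """Stage 1: drop violating candidates at creation. With no checker, everything passes."""
    if checker is None:
        return list(candidates)
    return [a for a in candidates if not checker(a[1], a[2])]


def pipeline_check(
    submitted: Iterable[Annotation],
    checker: Callable[[str, int], bool],
) -> Tuple[List[Annotation], List[Annotation]]:
    """Stage 2: split all submissions into (accepted, rejected), whatever their source."""
    accepted: List[Annotation] = []
    rejected: List[Annotation] = []
    for a in submitted:
        (rejected if checker(a[1], a[2]) else accepted).append(a)
    return accepted, rejected


# Placeholder gene names; GO:0006412 is translation.
CANDIDATES: List[Annotation] = [
    ("geneA", "GO:0060271", 4896),
    ("geneB", "GO:0006412", 4896),
]
```

If stage 1 runs with `violates`, the cilium assembly candidate is never created. If stage 1 runs with `None`, both candidates get through, and `pipeline_check` is what still rejects the first one, which is the safety net described in point 2.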

kltm commented 6 years ago

Moved to implementation ticket here: https://github.com/geneontology/go-site/issues/758

pgaudet commented 6 years ago

We should leave this ticket open as it provides nice examples to do QC.

kltm commented 6 years ago

@pgaudet If they are to be implemented, they should go with the ticket; if they are documentation, the wiki may be a more appropriate place.

pgaudet commented 5 years ago

I think this is all done. Please reopen if not.