geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

Review annotations to co-localizes_with GO:0032991 protein-containing complex & children #1940

Open pgaudet opened 6 years ago

pgaudet commented 6 years ago

The Protein Complex Working group has decided that annotations to GO:0032991 protein-containing complex & children should NOT be allowed with the colocalizes_with qualifier, see #1500.
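As an illustration of what such a rule checks, here is a minimal Python sketch, not the actual GO rule implementation. It assumes annotations in GAF 2.x tab-separated format (qualifier in column 4, GO ID in column 5), and it assumes the caller supplies `complex_terms`, a set holding GO:0032991 plus all of its descendants precomputed from the ontology. The function names are hypothetical.

```python
from typing import Iterable, Iterator


def violates_rule(gaf_line: str, complex_terms: set[str]) -> bool:
    """Return True if a GAF line combines colocalizes_with with a complex term.

    Assumes GAF 2.x tab-separated columns: qualifier is column 4 (index 3)
    and GO ID is column 5 (index 4). Qualifiers may be pipe-separated,
    e.g. "NOT|colocalizes_with", so NOT annotations are flagged too.
    complex_terms is assumed to hold GO:0032991 plus its descendants,
    precomputed from the ontology (not shown here).
    """
    if not gaf_line.strip() or gaf_line.startswith("!"):
        return False  # blank line or GAF header/comment
    cols = gaf_line.rstrip("\n").split("\t")
    if len(cols) < 5:
        return False  # malformed line; a real checker would report it
    qualifiers = set(cols[3].split("|"))
    return "colocalizes_with" in qualifiers and cols[4] in complex_terms


def flag_violations(lines: Iterable[str], complex_terms: set[str]) -> Iterator[str]:
    """Yield every line that breaks the rule."""
    for line in lines:
        if violates_rule(line, complex_terms):
            yield line
```

The sketch only flags lines; whether a flag is treated as a warning or as a hard block is a separate decision for the editing tool.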

There are quite a few EXP annotations (the manual non-EXP annotations are listed below).

The review spreadsheets are sorted by group, and then by year. https://docs.google.com/spreadsheets/d/1TPSWpRTqElK0f05IzRjYSXiBVC0Bh98e5wfFH2vt190/edit#gid=0

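| Group | # annotations |
|---|---|
| UniProt + UniProtKB | 482 |
| SGD | 267 |
| FlyBase | 81 |
| BHF-UCL | 69 |
| ParkinsonsUK-UCL | 30 |
| ARUK-UCL | 28 |
| WB + WormBase | 24 |
| MGI | 21 |
| dictyBase | 16 |
| RGD | 10 |
| CAFA | 7 |
| HGNC | 7 |
| AspGD | 4 |
| PomBase | 4 |
| AgBase | 2 |
| CACAO | 2 |
| CGD | 2 |
| GO_Central | 1 |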

Here are the manual, non-EXP annotations:

https://docs.google.com/spreadsheets/d/1hanprhIk5PJ8VQeUeGVv8G-S8gJKSJPgRnVatorQYIs/edit#gid=0

| Group | # annotations |
|---|---|
| UniProt | 485 |
| BHF-UCL | 25 |
| ARUK-UCL | 12 |
| GO_Central | 12 |
| ParkinsonsUK-UCL | 9 |
| AgBase | 5 |
| HGNC | 5 |
| dictyBase | 5 |
| PomBase | 1 |
| SGD | 1 |
| WB | 1 |

pgarmiri commented 5 years ago

In my experience, dealing with a similar number of annotations, I found that I had to go back to at least the abstract of each paper. Some cases were as you said, @ValWood: the proteins were part of the complex, so the qualifier just needed to be deleted. In one case I had to request a new complex term, which was more time-consuming. In other cases, colocalization with the complex was being used as a marker for the subcellular location, or the protein was binding to the complex, so the relevant protein complex binding term had to be used instead.

It seems the qualifier has been used for a very long time to capture several different things, so I think it would be impossible to update the current annotations automatically without reviewing them. Each case is different, and some are easier than others. Penelope

pgaudet commented 5 years ago

I just created a rule for this.

pgaudet commented 5 years ago

For the proteins that are actually part of a complex, are they often annotated twice, both with and without the qualifier? We could perhaps query that?

Pascale

ValWood commented 5 years ago

For the proteins that are actually part of a complex, are they often annotated twice, both with and without the qualifier? We could perhaps query that?

That's a good idea. I suspect they often would be (especially if all evidence codes are allowed).
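A minimal Python sketch of such a query, assuming the annotations are available as GAF 2.x lines (DB in column 1, object ID in column 2, qualifier in column 4, GO ID in column 5). The function name is hypothetical. Evidence codes are deliberately ignored, so all of them count, and NOT annotations are skipped. It only compares exact GO IDs, so a non-qualified annotation to a parent or child complex term would not be detected.

```python
from typing import Iterable


def annotated_both_ways(lines: Iterable[str]) -> set[tuple[str, str, str]]:
    """Find (DB, object ID, GO ID) triples annotated both with and without
    the colocalizes_with qualifier.

    Assumes GAF 2.x columns (DB = index 0, object ID = 1, qualifier = 3,
    GO ID = 4). All evidence codes are considered.
    """
    with_qualifier: set[tuple[str, str, str]] = set()
    without_qualifier: set[tuple[str, str, str]] = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        qualifiers = set(cols[3].split("|"))
        if "NOT" in qualifiers:
            continue  # negated annotations say nothing positive about membership
        key = (cols[0], cols[1], cols[4])
        if "colocalizes_with" in qualifiers:
            with_qualifier.add(key)
        else:
            without_qualifier.add(key)
    # Triples present in both sets are candidates for simply dropping the qualified annotation.
    return with_qualifier & without_qualifier
```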

bmeldal commented 5 years ago

Agreed on the call on 24/1/19 (in https://github.com/geneontology/go-annotation/issues/1500):

Use of colocalizes_with GO:0032991 protein-containing complex and children is no longer allowed.

Actions required:

Use of colocalizes_with [subcellular location]:

alexsign commented 5 years ago

@bmeldal Hi Birgit, a few questions before I implement the new sanity check in Protein2GO: It's not just a warning; we want to prevent curators from adding new annotations like this, right?
Does this apply to all gene products (proteins, complexes and RNAs)? Any special treatment for NOT|colocalizes_with or leave it as is?

sylvainpoux commented 5 years ago

Hi @alexsign and @bmeldal, we (UniProt) have used colocalizes_with protein complexes for many years, following IntAct recommendations, and we therefore have many of these cases. It would be good to find semi-automatic ways to clear these cases, because going back to the articles is not an efficient solution. I would therefore prefer that the monthly reports from Protein2GO not contain hundreds of errors about colocalizes_with before we agree on a solution for cleaning up this set. Thanks, Sylvain

bmeldal commented 5 years ago

@alexsign

It's not just a warning; we want to prevent curators from adding new annotations like this, right?

The P2GO users wanted a warning, at least to start with, rather than strictly blocking this usage.

Does this apply to all gene products (proteins, complexes and RNAs)?

Yes.

Any special treatment for NOT|colocalizes_with or leave it as is?

Don't know, as I never annotate NOT. @ValWood @vanaukenk?

@sylvainpoux That's why we said we need to find a strategy for fixing the existing annotations; it's something to discuss in Cambridge in April. I have no idea what the IntAct recommendation was, as we only annotate PPIs that have full binding evidence. It must predate my time.

ValWood commented 5 years ago

Hi Sylvain,

You could probably purge the non-experimental ones. You don't have that many experimental ones: only 379 are assigned by UniProt with an experimental evidence code.

Many of these are duplicates: [screenshot: uniprot2]

Many could probably just be deleted if there are already CC annotations describing the cellular localization, or an IPI describing the protein binding to a complex member.
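A rough Python sketch of that triage, assuming GAF 2.x lines (qualifier in column 4, GO ID in column 5, evidence code in column 7, aspect in column 9) and a caller-supplied `complex_terms` set (GO:0032991 plus descendants). The bucket names and function are hypothetical. "Covered" here only means the gene product has some other positive CC annotation in the same file; the IPI-to-complex-member case is not checked.

```python
from typing import Iterable

# GO experimental evidence codes, including the high-throughput ones.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
                "HTP", "HDA", "HMP", "HGI", "HEP"}

Record = tuple[str, str, str, str]  # (DB, object ID, GO ID, evidence code)


def triage(lines: Iterable[str], complex_terms: set[str]) -> dict[str, list[Record]]:
    """Sort colocalizes_with-to-complex annotations into review buckets.

    "purge":   non-experimental evidence
    "covered": experimental, but the gene product already has another
               positive CC annotation without colocalizes_with
    "review":  experimental and not covered; needs a curator
    """
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9:
            rows.append(cols)

    # Gene products with at least one CC annotation lacking both NOT and colocalizes_with.
    covered = set()
    for cols in rows:
        quals = set(cols[3].split("|"))
        if cols[8] == "C" and not quals & {"colocalizes_with", "NOT"}:
            covered.add((cols[0], cols[1]))

    buckets: dict[str, list[Record]] = {"purge": [], "covered": [], "review": []}
    for cols in rows:
        quals = set(cols[3].split("|"))
        if "colocalizes_with" not in quals or cols[4] not in complex_terms:
            continue
        record = (cols[0], cols[1], cols[4], cols[6])
        if cols[6] not in EXPERIMENTAL:
            buckets["purge"].append(record)
        elif (cols[0], cols[1]) in covered:
            buckets["covered"].append(record)
        else:
            buckets["review"].append(record)
    return buckets
```

Only the "review" bucket would then need a curator to go back to the paper.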

Also, there is no hard deadline for this task, just a warning and a commitment to retrain curators so they know that new annotations of this type should not be created (actually, I think @alexsign will make it impossible in Protein2GO).

Hopefully this type of warning will be in a separate log file from the issues that need fixing more urgently. We should look into that.

ValWood commented 5 years ago

Any special treatment for NOT|colocalizes_with or leave it as is? Don't know, as I never annotate NOT. @ValWood @vanaukenk?

I would say: definitely don't allow NOT with colocalizes_with.

bmeldal commented 5 years ago

Actions from the Cambridge GOC meeting on 11/4/19:

From https://github.com/geneontology/go-annotation/issues/1500:

bmeldal commented 5 years ago

There are still lots of annotations from several groups that haven't been updated yet:

EXP:

non-EXP:

srengel commented 5 years ago

Re the SGD clean-up: we only have 46 annotations left to review, but have not updated the spreadsheet. Edith is working on the annotations; I'll work on entering the 200+ missing updates in the spreadsheet.

ValWood commented 5 years ago

? Work through meaningless annotations like "cell" with colocalizes_with [subcellular location]? [new ticket?]

We should just block the term and let people deal with them if they want to.

https://github.com/geneontology/go-annotation/issues/2325

srengel commented 5 years ago

@marekskrzypek there is a CGD annotation to review.

marekskrzypek commented 5 years ago

CGD done

BarbaraCzub commented 5 years ago

ARUK-UCL and ParkinsonsUK-UCL all done.

cc @RLovering

edwong57 commented 5 years ago

SGD is all done with this review

vanaukenk commented 5 years ago

WB is all done now, too.

hattrill commented 3 years ago

Still quite a lot hanging about. It would be nice to get this all sorted, especially with the new GAF 2.2 qualifiers on the horizon: https://github.com/geneontology/go-annotation/issues/2917 [Screenshot 2020-10-23 at 16 18 15]

pfey03 commented 3 years ago

I've now updated the two Dicty annotations that I wanted to keep, so Dicty is done.

hdrabkin commented 3 years ago

I believe all of the colocalizes_with annotations for MGI may have been removed or changed recently, for other reasons. I don't know when the fixes will show up. Checking... yes: if an MGI curator made the annotation, the colocalizes_with qualifier is now gone. Anything left comes in via loads, so those will disappear once they are removed at the source.

hattrill commented 3 years ago

Ace! Thanks @hdrabkin

RLovering commented 3 years ago

I think all UCL annotations are done.

suzialeksander commented 6 months ago

It looks like all the remaining ones are UniProt: a few IDA, but mostly ISS. @sylvainpoux