geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

question about transposons and dubious genes (clarification required) #4606

Closed ValWood closed 4 months ago

ValWood commented 1 year ago

Should we submit transposon annotations to GO? Should we submit dubious proteins to GO?

This question arose because of discrepancies when making species comparisons. It would be especially useful to have rules about transposons and, if it is decided that they should be included, to be able to identify and filter them easily (the transposon complement of even closely related species can be very different). I would argue that transposons should not be included because they make comparative analysis between species difficult (and also they are not really an endogenous function of the organism, and are usually inactive). However, I have included gene drivers in the GO submission because these are operating and have phenotypic consequences.

pgaudet commented 1 year ago

What do you think, @vanaukenk, could we discuss this at an annotation call?

Could groups that make annotations to transposons explain why?

pgaudet commented 1 year ago

@ValWood normally these should be labeled with entity type = transposon, to allow excluding them as needed. Could one solution be to improve the way contributing groups label their entities?

ValWood commented 1 year ago

Possibly, if they are useful to include in GO.

Currently "transposable_element_gene" is used once, for one gene (Arabidopsis), and "transposon" is not used at all. However, this should still be included by a filter on 'gene' if the SO hierarchy is observed, because it has an is_a relationship to gene.

We also need to ensure that domesticated transposons are not excluded (these function as genes).
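To make the filtering point concrete, here is a minimal sketch of how a filter on 'gene' that follows is_a links would still pick up transposable element genes, and how a second filter could drop them while keeping domesticated ones. The is_a map is a hand-written toy, not loaded from SO, and the records and the `domesticated` flag are hypothetical; no such field is known to exist in GO submissions.

```python
# Toy is_a map (child -> parent). Hand-written assumption, not loaded from SO.
IS_A = {
    "protein_coding_gene": "gene",
    "transposable_element_gene": "gene",
}


def is_a(term, ancestor):
    """Return True if `term` equals `ancestor` or reaches it via is_a links."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


# Hypothetical records: (identifier, entity type, domesticated flag).
entities = [
    ("gene1", "protein_coding_gene", False),
    ("te1", "transposable_element_gene", False),
    ("te2", "transposable_element_gene", True),
]


def keep(entity_type, domesticated):
    """Keep genes, but drop transposable element genes unless domesticated."""
    if not is_a(entity_type, "gene"):
        return False
    if is_a(entity_type, "transposable_element_gene") and not domesticated:
        return False
    return True


# A plain filter on 'gene' includes every transposable element gene.
genes_only = [name for name, etype, _ in entities if is_a(etype, "gene")]

# The stricter filter excludes te1 but keeps the domesticated te2.
filtered = [name for name, etype, dom in entities if keep(etype, dom)]

print(genes_only)
print(filtered)
```

The point of the sketch is only that excluding transposons needs an explicit rule on top of the hierarchy, and that the rule needs some way to recognise domesticated elements.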

ValWood commented 11 months ago

Can we put this on the agenda?

pgaudet commented 11 months ago

@ValWood There are 2 IEAs to TAIR: https://amigo.geneontology.org/amigo/gene_product/AGI_LocusCode:AT1G65750. Do we need to discuss this? It seems a very minor issue.

ValWood commented 11 months ago

It can probably close. It arose because I was doing a comparison with S. cerevisiae and the gene numbers didn't add up because transposable elements were included.

If I look in AmiGO for the number of S. cerevisiae proteins I get 6045, but SGD reports only 5930 (the correct number, i.e. 5258+672; https://www.yeastgenome.org/genomesnapshot). The difference of >100 may not seem like many, but it is important if you are trying to identify the "unknown" component. So it would be useful if we could check that the protein numbers in AmiGO agree with what the respective MODs believe to be their proteomes. The difference is something to do with transposons and ORFs that aren't considered to be real. We could discuss on an editors' call first, or you can close for now and I'll open a new ticket with exact examples next time I encounter the issue.
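As a rough sketch of the check described above, the arithmetic and an identifier-level comparison could look like this. The counts are the ones quoted in the comment; the identifier lists and the `id_differences` helper are hypothetical placeholders, not real AmiGO or SGD exports.

```python
# Counts quoted in the comment above.
amigo_count = 6045
sgd_parts = (5258, 672)  # the two figures summed to give the SGD total

sgd_count = sum(sgd_parts)
gap = amigo_count - sgd_count


def id_differences(amigo_ids, mod_ids):
    """Return (only in AmiGO, only in the MOD) as sorted lists."""
    amigo_ids, mod_ids = set(amigo_ids), set(mod_ids)
    return sorted(amigo_ids - mod_ids), sorted(mod_ids - amigo_ids)


# Hypothetical identifiers, for illustration only.
extra, missing = id_differences(
    ["g1", "g2", "te1", "dubious1"],
    ["g1", "g2", "g3"],
)

print(sgd_count, gap)
print(extra, missing)
```

Comparing identifier sets rather than totals would show which entries account for the gap, which is the kind of exact example a follow-up ticket would need.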

suzialeksander commented 4 months ago

@ValWood closing, go ahead with the new ticket when there's an example