geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

GO:0006487 is ambiguous #765

Closed gocentral closed 9 years ago

gocentral commented 14 years ago

The term GO:0006487 "protein amino acid N-linked glycosylation" is ambiguous, and it is currently associated with genes that relate to it under different criteria.

To be precise, the term is associated indiscriminately with genes that are involved in the N-glycosylation pathway (the process in which an N-glycan is added to and modified on translated proteins) and with genes that are N-glycosylated but do not belong to the N-glycosylation pathway. For example, ALG1 (http://www.uniprot.org/uniprot/Q9BT22) is an ER membrane enzyme that synthesizes the N-glycan precursor (and is not itself glycosylated), while ALK is an N-glycosylated cell membrane receptor with no role in N-glycan synthesis or branching.

You should clarify this ambiguity: either create a separate term for "N-glycosylation pathway" or "N-glycan synthesis and processing", or state the distinction explicitly in the term definition.

I am curating the N-glycosylation pathway in Reactome, so I have a list of the genes that should be included in or excluded from this term. I have references for most of these reactions, if you need them.

Genes that participate in the synthesis and modification of the N-glycan sugar and are already annotated to GO:0006487: ALG12 ALG2 ALG6 ALG8 B4GALT1 DAD1 DDOST DOLPP1 DPAGT1 FUT8 MAN1A2 MAN1B1 MAN1C1 MAN2A1 MGAT1 MGAT2 MGAT3 MGAT4A MGAT4B MGAT5 MOGS MPDU1 OSTC RPN1 RPN2 STT3A STT3B TUSC3 GAL3ST1 MAGT1 ST8SIA2 ST8SIA3 ST8SIA4

Genes that participate in the synthesis and modification of the N-glycan sugar but are not yet annotated to GO:0006487: DPAGT1 ALG1 ALG11 RFT1 ALG3 ALG9 ALG10A/B ALG5 DPM1 DPM2 DPM3 PMM1 PMM2 GMPPA GMPPB PGM3 UAP1 DOLK1 GCS1 GANAB1 UGGG1

Genes whose proteins are N-glycosylated but do not participate in the N-glycosylation of other proteins: ALK (tyrosine kinase receptor), CD37 (leukocyte antigen CD37), CD4 (leukocyte antigen CD4), GYPC (sialoglycoprotein of human erythrocyte membranes), KEL (zinc endopeptidase with endothelin-3-converting enzyme activity), LIPA (lysosomal acid lipase/cholesteryl ester hydrolase), TM4SF4 (intestine and liver tetraspan membrane protein), TM4SF5 (transmembrane 4 L6 family member 5), TSPAN7 (cell surface glycoprotein A15)

Errors: POMGNT1 (this gene seems to be involved in O-glycosylation rather than N-glycosylation)

Reported by: dalloliogm

Original Ticket: geneontology/annotation-issues/765

gocentral commented 14 years ago

GO:0006487 refers unambiguously to the process of adding glycosyl groups to proteins via N atoms. It is understood by GO Consortium annotators that an annotation to a biological process term implicitly makes a statement that the gene product participates in the process represented by the GO term, and that annotating gene products that are targets of a process (rather than participants) is erroneous.

This item thus identifies some annotation errors, so I am moving it to the Annotation Issues tracker.

Original comment by: mah11

gocentral commented 14 years ago

On the assumption that these are all human proteins, I'm assigning this to the Uniprot GOA team.

Original comment by: mah11

gocentral commented 14 years ago

When this item first appeared, I noticed that at least some of the offending annotations appeared to be IEAs from Ensembl, but that the IDs they referred to did not exist. So one question is whether IEA annotations based on nonexistent Ensembl entries should be removed.

This set of genes would also be a good pathway for the Reference Genome project to annotate, as they are highly conserved and single-copy in most organisms, and clearly have a number of old annotation errors that are producing bad automated mappings.

Val

Original comment by: ValWood

gocentral commented 14 years ago

Also, thanks to Reactome for supplying the ID list. I will use it to cross-check the fission yeast annotations.

Original comment by: ValWood

gocentral commented 14 years ago

Hi,

Thanks again. I've removed or updated the offending annotations, or contacted the annotation groups responsible (at least where the human proteins are concerned). Most of the manual annotations came from Proteome Inc., whose last public dataset was integrated into GOA in 2001. InterPro has been contacted regarding one of their mappings, and the Ensembl-originating annotations will be corrected when Ensembl next updates its projections. Val: the Ensembl IDs seem fine (at least the couple I've just tested). We integrate from Ensembl each time they release annotations, so we would be at most one Ensembl version behind. I agree that improving the annotation set for this process would be a nice RefGen annotation project; I'll alert Pascale. Emily

Original comment by: edimmer

gocentral commented 14 years ago

Hi,

I have added this as a Reference Genome annotation project suggestion: http://gocwiki.geneontology.org/index.php/Protein_amino_acid_N-linked_glycosylation

with Val as the point person (who else is interested?)

Do those gene names refer to human genes, or to some other species?

Thanks, Pascale

Original comment by: pgaudet

gocentral commented 14 years ago

Hi Pascale, thanks. Most of these symbols refer to human genes. Cheers, Emily

Original comment by: edimmer