geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Suspicious sequence range for UniProtKB:P43251 derived EWASes #305

Closed nataled closed 10 months ago

nataled commented 10 months ago

The following were found while figuring out the best way to find potential signal/transit peptide EWASes. Each of these is a single amino acid variation of BTD (UniProtKB:P43251). All but one of these are cases where the variation position is outside the sequence range given for the EWAS (1-68), which makes me suspect that the range is incorrect. I can't find any reason why a range was given in the first place (UniProtKB indicates the signal peptide is 1-41 while the mature protein is 42-543) so my suspicion is hard to back up in terms of how/why it happened.

Reactome:R-HSA-3325564 === BTD G34S
Reactome:R-HSA-3325545 === BTD R79C
Reactome:R-HSA-3325576 === BTD A171T
Reactome:R-HSA-4225080 === BTD A171T
Reactome:R-HSA-3325552 === BTD Q456H
Reactome:R-HSA-4225082 === BTD Q456H
Reactome:R-HSA-3325543 === BTD R538C
Reactome:R-HSA-4225078 === BTD R538C
Reactome:R-HSA-3325548 === BTD A755G
Reactome:R-HSA-4225077 === BTD A755G
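The position-versus-range check described above can be sketched roughly as follows. This is only an illustration, not the query actually used to find these EWASes: the names `VARIANTS`, `variant_position`, and `outside_range` are hypothetical, and the labels are assumed to follow the "BTD R79C" pattern shown in the list.

```python
import re

# Variant labels copied from the list in this comment (Reactome ID -> label).
VARIANTS = {
    "R-HSA-3325564": "BTD G34S",
    "R-HSA-3325545": "BTD R79C",
    "R-HSA-3325576": "BTD A171T",
    "R-HSA-4225080": "BTD A171T",
    "R-HSA-3325552": "BTD Q456H",
    "R-HSA-4225082": "BTD Q456H",
    "R-HSA-3325543": "BTD R538C",
    "R-HSA-4225078": "BTD R538C",
    "R-HSA-3325548": "BTD A755G",
    "R-HSA-4225077": "BTD A755G",
}

# One-letter residue, 1-based position, one-letter residue, e.g. "R79C".
VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def variant_position(label):
    """Return the 1-based position encoded in a label such as 'BTD R79C'."""
    match = VARIANT_RE.match(label.split()[-1])
    if match is None:
        raise ValueError(f"unrecognized variant label: {label!r}")
    return int(match.group(2))


def outside_range(variants, start, end):
    """Return the IDs whose variant position falls outside [start, end]."""
    return [
        rid
        for rid, label in variants.items()
        if not start <= variant_position(label) <= end
    ]


# Range 1-68 is the EWAS range reported in this comment.
flagged = outside_range(VARIANTS, 1, 68)
print(len(flagged), "of", len(VARIANTS), "positions fall outside 1-68")
```

With the reported range of 1-68, only G34S falls inside it, which matches the "all but one" observation.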

deustp01 commented 10 months ago

The correct start and end coordinates for the mature BTD protein are 42 and 543, respectively, per UniProt P43251. The Reactome EWAS coordinates 1-41, probably a curation error that was propagated by copy-pasting to create variant EWAS instances, have been corrected in Reactome.

Two of the variant EWASs are claimed to involve amino acid residues outside the 42-543 range; these have been deleted in Reactome. All of the other variant annotations have been confirmed by reference to annotations in UniProt or ClinVar or both, so all of these have been retained, now with correct (42-543) start and end coordinates.

Reactome:R-HSA-3325564 === BTD G34S: seq OK in UniProt; var not in UniProt; var not in ClinVar; DELETE this EWAS
Reactome:R-HSA-3325545 === BTD R79C: seq OK in UniProt; var not in UniProt; var listed in ClinVar; Keep this EWAS
Reactome:R-HSA-3325576 === BTD A171T: seq OK in UniProt; var listed in UniProt; var not in ClinVar; Keep this EWAS
Reactome:R-HSA-4225080 === BTD A171T: seq OK in UniProt; var listed in UniProt; var not in ClinVar; Keep this EWAS
Reactome:R-HSA-3325552 === BTD Q456H: seq OK in UniProt; var listed in UniProt; var not in ClinVar; Keep this EWAS
Reactome:R-HSA-4225082 === BTD Q456H: seq OK in UniProt; var listed in UniProt; var not in ClinVar; Keep this EWAS
Reactome:R-HSA-3325543 === BTD R538C: seq OK in UniProt; var listed in UniProt; var not in ClinVar; Keep this EWAS
Reactome:R-HSA-4225078 === BTD R538C: seq OK in UniProt; var listed in UniProt; var not in ClinVar; Keep this EWAS
Reactome:R-HSA-3325548 === BTD A755G: 755 beyond C-terminus of protein (543); DELETE this EWAS
Reactome:R-HSA-4225077 === BTD A755G: 755 beyond C-terminus of protein (543); DELETE this EWAS
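One reading of the keep/delete decisions in this list can be written as a small rule: delete a variant whose position lies beyond the C-terminus (543), and otherwise keep it only if UniProt or ClinVar lists it. The sketch below is inferred from the outcomes above and is not a documented Reactome procedure; `Evidence` and `triage` are hypothetical names.

```python
from dataclasses import dataclass

PROTEIN_END = 543  # C-terminal residue of BTD, as stated in this comment


@dataclass
class Evidence:
    position: int     # 1-based position of the variant residue
    in_uniprot: bool  # variant listed in UniProt
    in_clinvar: bool  # variant listed in ClinVar


def triage(ev, protein_end=PROTEIN_END):
    """Return 'DELETE' or 'KEEP' under the inferred rule described above."""
    if ev.position > protein_end:
        return "DELETE"  # residue number does not exist in the protein
    if ev.in_uniprot or ev.in_clinvar:
        return "KEEP"    # confirmed by at least one external resource
    return "DELETE"      # no supporting annotation found


# Evidence flags transcribed from the list above.
cases = {
    "G34S": Evidence(34, False, False),
    "R79C": Evidence(79, False, True),
    "A171T": Evidence(171, True, False),
    "A755G": Evidence(755, False, False),
}
decisions = {name: triage(ev) for name, ev in cases.items()}
print(decisions)
```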

ClinVar: search BTD at https://www.ncbi.nlm.nih.gov/clinvar, filtering for missense mutations known to be pathogenic.
UniProt: scan list of variants at https://www.uniprot.org/uniprotkb/P43251/entry#disease_variants
Check mutation descriptions against the canonical BTD amino acid sequence at https://www.uniprot.org/uniprotkb/P43251/entry#sequences
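The last check, comparing a mutation description against the canonical sequence, could be automated along these lines. This is a minimal sketch: `reference_matches` is a hypothetical helper, and `TOY_SEQUENCE` is a made-up placeholder, not the BTD sequence, which would need to be downloaded from the UniProt entry.

```python
import re

# One-letter reference residue, 1-based position, one-letter replacement.
VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def reference_matches(sequence, variant):
    """Check that `sequence` has the variant's reference residue at its position.

    Returns False when the position lies outside the sequence, for example a
    residue number beyond the C-terminus.
    """
    match = VARIANT_RE.match(variant)
    if match is None:
        raise ValueError(f"unrecognized variant description: {variant!r}")
    ref, pos = match.group(1), int(match.group(2))
    if pos < 1 or pos > len(sequence):
        return False
    return sequence[pos - 1] == ref


# Placeholder sequence for illustration only; NOT the real BTD sequence.
TOY_SEQUENCE = "MKRAC"

print(reference_matches(TOY_SEQUENCE, "R3C"))  # residue 3 of the toy sequence is R
print(reference_matches(TOY_SEQUENCE, "G9S"))  # position 9 is past the end
```

A variant such as A755G would fail this check against any 543-residue sequence, because position 755 does not exist.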