cidgoh / nf-ncov-voc

A Nextflow-wrapped workflow for generating the mutation profiles of SARS-CoV-2 genomes (Variants of Concern and Variants of Interest). The workflow is developed in collaboration with COVID-MVP (https://github.com/cidgoh/COVID-MVP), which can be used to visualize the mutation profiles and functional annotations.
MIT License

mutation name discrepancy #151

Closed. miseminger closed this issue 7 months ago

miseminger commented 7 months ago

Insertion at pos 6500, ln 38 in /home/miseminger/projects/def-virusmvp/shared_data/latest_gvfs/BF.11_annotated.gvf:

nt_name=g.6237_6238ig.TCTTCA;aa_name=p.P2079_A2080insSS

The nt_name format is unrecognized; @miseminger will keep looking into it.

miseminger commented 7 months ago

Non-comprehensive list of amino acid names that are delins while the corresponding nucleotide-level names are not (might be fine, but worth looking into):

/home/miseminger/projects/def-virusmvp/shared_data/latest_gvfs/BA.4_annotated.gvf nt_name=g.CATGGTCATGTTA241_253AATG;aa_name=p.H81_M85delinsNV

/home/miseminger/projects/def-virusmvp/shared_data/latest_gvfs/BA.5_annotated.gvf nt_name=g.173_181delAAGGCGTTT;aa_name=p.K58_L61delinsM

/home/miseminger/projects/def-virusmvp/shared_data/latest_gvfs/BA.5.1.2_annotated.gvf nt_name=g.50_52delGTT;aa_name=p.S17_L18delinsM

/home/miseminger/projects/def-virusmvp/shared_data/latest_gvfs/BA.1.1.1_annotated.gvf nt_name=g.245_253delGTCATGTTA;aa_name=p.G82_M85delinsV
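
One way to sanity-check entries like these is to map each nucleotide range onto the codons it touches. The sketch below assumes the nt_name coordinates are 1-based, inclusive positions relative to the start of the coding sequence; the issue does not state this, and `codon_span` is a hypothetical helper, not a function from nf-ncov-voc.

```python
from typing import Tuple


def codon_span(nt_start: int, nt_end: int) -> Tuple[int, int]:
    """Return the first and last codon numbers touched by a nucleotide range.

    Assumes 1-based, inclusive coordinates relative to the CDS start
    (an assumption for illustration, not confirmed by the issue).
    """
    first_codon = (nt_start - 1) // 3 + 1
    last_codon = (nt_end - 1) // 3 + 1
    return first_codon, last_codon


# Nucleotide ranges copied from the nt_name values listed above.
examples = {
    "BA.4": (241, 253),
    "BA.5": (173, 181),
    "BA.5.1.2": (50, 52),
    "BA.1.1.1": (245, 253),
}

for lineage, (start, end) in examples.items():
    first, last = codon_span(start, end)
    print(f"{lineage}: nt {start}-{end} touches codons {first}-{last}")
```

Under that assumption, the codon spans come out as 81-85, 58-61, 17-18 and 82-85, which match the residue ranges in the four aa_name values. A deletion that does not start on a codon boundary leaves partial codons on either side that join into a new codon, so a plain nucleotide deletion can legitimately appear as a delins at the amino acid level. This does not prove the annotations are correct, only that the ranges are consistent.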

miseminger commented 7 months ago

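This was a regex issue: df["hgvs_nucleotide"] = df["hgvs_nucleotide"].str.replace("n.", "g.", regex=True) treats "n." as a pattern meaning "n" followed by any character, so it also matched the "ns" in "ins" and changed "ins" to "ig.". Solved by changing to regex=False, which replaces only the literal string "n.".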
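
The behaviour can be reproduced with the standard library alone. In this minimal sketch, `re.sub` stands in for pandas `str.replace(..., regex=True)` and plain `str.replace` stands in for `regex=False`. The input string is an assumed pre-replacement value, reconstructed from the garbled nt_name in the first comment.

```python
import re

# Assumed original value before the prefix swap (inferred, not taken from the GVF).
raw = "n.6237_6238insTCTTCA"

# Regex semantics: "." matches any character, so "n." matches both the
# leading "n." and the "ns" inside "ins".
as_regex = re.sub("n.", "g.", raw)

# Literal semantics: only the exact two-character string "n." is replaced.
as_literal = raw.replace("n.", "g.")

# Alternative that stays in regex mode: escape the dot and anchor to the start.
anchored = re.sub(r"^n\.", "g.", raw)

print(as_regex)    # g.6237_6238ig.TCTTCA
print(as_literal)  # g.6237_6238insTCTTCA
print(anchored)    # g.6237_6238insTCTTCA
```

The anchored pattern is shown only as an option; the fix applied in this issue was the switch to regex=False.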

miseminger commented 7 months ago

Closed with commit 611e3b23df56bf537fc1ca7023f99695a593f38d (https://github.com/cidgoh/nf-ncov-voc/pull/149/commits/611e3b23df56bf537fc1ca7023f99695a593f38d)