freeseek / gtc2vcf

Tools to convert Illumina IDAT/BPM/EGT/GTC and Affymetrix CEL/CHP files to VCF
MIT License

Couple SNPs on ChrY and ChrM shifted by -1. #75

Closed rajwanir closed 3 months ago

rajwanir commented 3 months ago

Hi @freeseek

Using the SourceSeq mapping workflow, I noticed that a couple of SNPs get shifted by -1 (relative to dbSNP and the original Illumina csv_manifest). Any guess as to why that might be? Could this be another dbSNP and genome build inconsistency, as explained in #74?

Here is a list of example SNPs shifted by -1 in gtc2vcf SourceSeq mapping workflow:

ID CHROM gtc2vcf-position dbSNP-position SourceSeq
rs367866237 chrY 21089995 21089994 CAAAATTGTTGGAATTGTGAGCTGGCATGCACTGGACCATTATCAGCTTA[A/C]TTTTTGTGGGCCACCCCAAAAACGCAATAGTTAGAAGAAGATGTTTAATG
rs370944521 chrY 12985132 12985131 GAGCCTATCACAGAGTTTGTGTCTCTGCTGGAAATAGAGACGTTAATCAC[T/C]GCCAAGCATGGTGCTCTTCGGCTAGCACAGAGCAGCTCTCAGGCACTGAA
rs369647419 chrM 10275 10274 TTCTTAGTAGCTATTACCTTCTTATTATTTGATCTAGAAATTGCCCTCCT[T/C]TACCCCTACCATGAGCCCTACAAACAACTAACCTGCCACTAATAGTTATG
rs370821352 chrM 3589 3588 GCTCTCACCATCGCTCTTCTACTATGAACCCCCCTCCCCATACCCAACCC[T/C]TGGTCAACCTCAACCTAGGCCTCCTATTTATTCTAGCCACCTCTAGCCTA

The bwa alignment for rs367866237 looks like this:

rs367866237-138_T_F_2319595219:1 0 chrY 21089944 60 49M1D52M 0 0 CAAAATTGTTGGAATTGTGAGCTGGCATGCACTGGACCATTATCAGCTTAATTTTTGTGGGCCACCCCAAAAACGCAATAGTTAGAAGAAGATGTTTAATG NM:i:1 MD:Z:49^A52 AS:i:94 XS:i:34
rs367866237-138_T_F_2319595219:2 0 chrY 21089944 60 49M1D52M 0 0 CAAAATTGTTGGAATTGTGAGCTGGCATGCACTGGACCATTATCAGCTTACTTTTTGTGGGCCACCCCAAAAACGCAATAGTTAGAAGAAGATGTTTAATG NM:i:2 MD:Z:49^A1A50 AS:i:89 XS:i:23
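
For reference, here is a rough sketch of how the reported position seems to follow from this alignment. It is not gtc2vcf's actual code; the helper `query_to_ref` is a hypothetical name, and it only walks the CIGAR string the way the SAM specification describes. It assumes the SNP sits right after the left flank of the SourceSeq, i.e. at the 0-based read offset where `[` appears.

```python
import re

def query_to_ref(pos, cigar, query_index):
    """Map a 0-based read offset to a 1-based reference coordinate.

    Returns None if that read base is not aligned to the reference
    (insertion or soft clip). Hypothetical helper, for illustration only.
    """
    ref = pos   # 1-based reference coordinate of the next aligned base
    q = 0       # 0-based read offset of the next read base
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=X":          # consumes both read and reference
            if q <= query_index < q + n:
                return ref + (query_index - q)
            q += n
            ref += n
        elif op in "IS":         # consumes read only
            if q <= query_index < q + n:
                return None
            q += n
        elif op in "DN":         # consumes reference only
            ref += n
        # H and P consume neither read nor reference
    return None

source_seq = ("CAAAATTGTTGGAATTGTGAGCTGGCATGCACTGGACCATTATCAGCTTA[A/C]"
              "TTTTTGTGGGCCACCCCAAAAACGCAATAGTTAGAAGAAGATGTTTAATG")
snp_offset = source_seq.index("[")   # length of the left flank: 50

# Values taken from the bwa record above (POS and CIGAR).
print(query_to_ref(21089944, "49M1D52M", snp_offset))  # 21089995

# Hypothetical ungapped alignment at the same POS, for comparison only.
print(query_to_ref(21089944, "101M", snp_offset))      # 21089994
```

Under these assumptions the `1D` in the CIGAR accounts for the one-base difference: with the deletion the SNP lands on 21089995, and without it the same read offset would land on 21089994, the dbSNP position.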

Thank you.

freeseek commented 3 months ago

If you look at the flanking sequences for rs367866237, rs370944521, rs369647419, and rs370821352 on dbSNP you get:

rajwanir commented 3 months ago

Thank you @freeseek. It is quite a strange observation that exactly one nucleotide adjacent to the SNP is missing in these chrY and chrM SourceSeq entries. There is certainly no way to fix this on the analysis side.

Thank you for looking into it.