Hi - I have 2 SNP sites in a Sars Cov2 BAM that have become N in the resulting consensus sequence and I was hoping you could help me understand why. I am using V3 primer set. After reading Quick J et al. 2017 and these docs https://artic.readthedocs.io/en/latest/minion/ . My understanding of this paragraph - 'By softmasking, we refer to the process of adjusting the CIGAR of each alignment segment such that soft clips replace any reference or query consuming operations in regions of an alignment that fall outside of primer boundaries. The leftmost mapping position of alignments are also updated during softmasking.' - is that if the sequence in a read is meant to be excluded from contributing it will be soft clipped. This fits with what I see in BAMs in IGV. The following two SNP sites do not fit with my understanding of how this is meant to work:
Position 28111 which is found between - but not contained within - V3 primers sites 93_LEFT and 92_RIGHT. 93_LEFT ends at 28105 so it is 6 bp away.
VCFs:
grep 28111 *vcf
2270.fail.vcf:MN908947.3 28111 . A G 30.82 PASS DP=4;AC=0,4;AM=0;MC=0;MF=0.0;MB=0.0;AQ=16.84;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 1/1:9.19:.:1/1:9.19
2270.merged.vcf:MN908947.3 28111 . A G 30.82 PASS DP=4;AC=0,4;AM=0;MC=0;MF=0.000;MB=0.000;AQ=16.84;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 1/1:9.19:.:1/1:9.19
2270.nCoV-2019_1.vcf:MN908947.3 28111 . A G 43.555 PASS GT:GQ 1:44
2270.nCoV-2019_2.vcf:MN908947.3 28111 . A G 17.292 PASS GT:GQ 1:17
RG1 is pink (ish) and all 84 reads look to have been filtered out. Not sure why. RG2 (blue) has 4 reads (only 2 show above as only 2 are from that primer site) and SNP A>G at 28111 is in fail VCF due to having DP 4.
The second SNP site i'm interested in is 1596
VCFs:
grep 1596 *vcf
2270.fail.vcf:MN908947.3 1596 . A G 275.91 PASS DP=19;AC=0,18;AM=1;MC=0;MF=0.0;MB=0.0;AQ=13.78;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 1/1:48.9:.:1/1:48.9
2270.merged.vcf:MN908947.3 1596 . A G 275.91 PASS DP=19;AC=0,18;AM=1;MC=0;MF=0.000;MB=0.000;AQ=13.78;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 1/1:48.90:.:1/1:48.90
2270.nCoV-2019_1.vcf:MN908947.3 1596 . A G 44.158 PASS GT:GQ 1:44
2270.nCoV-2019_2.vcf:MN908947.3 1596 . A G 36.182 PASS GT:GQ 1:36
This one is at the edge of the soft clipping. All 31 RG2 (blue) reads are filtered out by something leaving 19 RG1 (pink) and thus and N in the consensus sequence.
Is this behavior intended? Can you please tell me why one RG is being filtered out in this region of overlap between primer sites?
Hi - I have 2 SNP sites in a Sars Cov2 BAM that have become N in the resulting consensus sequence and I was hoping you could help me understand why. I am using V3 primer set. After reading Quick J et al. 2017 and these docs https://artic.readthedocs.io/en/latest/minion/ . My understanding of this paragraph - 'By softmasking, we refer to the process of adjusting the CIGAR of each alignment segment such that soft clips replace any reference or query consuming operations in regions of an alignment that fall outside of primer boundaries. The leftmost mapping position of alignments are also updated during softmasking.' - is that if the sequence in a read is meant to be excluded from contributing it will be soft clipped. This fits with what I see in BAMs in IGV. The following two SNP sites do not fit with my understanding of how this is meant to work:
Position 28111 which is found between - but not contained within - V3 primers sites 93_LEFT and 92_RIGHT. 93_LEFT ends at 28105 so it is 6 bp away.
VCFs:
depths
RG1 is pink (ish) and all 84 reads look to have been filtered out. Not sure why. RG2 (blue) has 4 reads (only 2 show above as only 2 are from that primer site) and SNP A>G at 28111 is in fail VCF due to having DP 4.
The second SNP site i'm interested in is 1596
VCFs:
depths
This one is at the edge of the soft clipping. All 31 RG2 (blue) reads are filtered out by something leaving 19 RG1 (pink) and thus and N in the consensus sequence.
Is this behavior intended? Can you please tell me why one RG is being filtered out in this region of overlap between primer sites?
Kind Regards, Liam