Open Sam-Sims opened 1 year ago
Looking into this a bit further it appears that in this sample amplicon 72 has dropped - which is in pool 2.
Looking at the *.2.vcf
file generated from medaka variant ./primer-schemes/scov2/V4.1/scov2.reference.fasta samplename.2.hdf samplename.2.vcf
it contains 2 variants from the dropped amplicon 72 that overlap with the deletion found in *.1.vcf
:
MN908947.3 21634 . ACCCC GTGG 9.798 PASS . GT:GQ 1:10
MN908947.3 21640 . TG GA 6.765 PASS . GT:GQ 1:7
These variants are not found in the *.1.vcf
so I assume are just artifacts coming from the dropped amplicon.
Interestingly when running with the --no-longshot
flag and instead use medaka tools annotate
you can see the depth of each variant is 1
in the *.2.vcf
which is to be expected since they came from the dropped amplicon. They are then filtered out correctly and minion
completes with no errors.
MN908947.3 21634 . ACCCC GTGG 9.798 PASS DP=1.0;DPS=1.0,0.0 GT:GQ 1:10
MN908947.3 21640 . TG GA 6.765 PASS DP=1.0;DPS=1.0,0.0 GT:GQ 1:7
However if longshot is used it seems like the variant at 21640 is left in and so the merged.vcf
contains the following entry with a depth of 43 .
MN908947.3 21640 . TG GA 65 PASS DP=43;AC=0,2;AM=41;MC=0;MF=0.000;MB=0.000;AQ=3.65;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:DP:PS:UG:UQ 0/1:32:43:.:0/1:31.97
This is written to the fail.vcf still - I am guessing because of the quality score (?) and then causes the issue detailed above.
Perhaps this is a bug with longshot
when calculating the depth of the variant as I am unsure where the depth of 43 has come from - nothing in the BAM indicates why this would be.
Running minion
with --strict
works too, but the deletion is omitted from the consensus.
Admittedly this does seem to be an edge case resulting from a dropped amplicon - which isnt ideal in the first place.
Hello,
I am getting the above error in ARTIC v1.2.4 in bcftools consensus due to overlapping 'pass' and 'fail' variants.
Running: bcftools consensus -f sample_name.preconsensus.fasta sample_name.pass.vcf.gz -m sample_name.coverage_mask.txt -o sample_name.consensus.fasta
The fasta sequence does not match the REF allele at MN908947.3:21632:
REF .vcf: [TTACCCCCTG]
ALT .vcf: [T]
REF .fa : [NNNNNNNNNN]CATAC
This is caused by an 11bp deletion in fail.vcf overlapping with a 10bp deletion in pass.vcf. The 11bp deletion is added to the mask and thus the preconsensus contains Ns which overlap with the variant which passes.
fail.vcf:
MN908947.3 21631 . ATTACCCCCTG A 309.485 PASS DP=13.0;DPS=5.0,8.0;Pool=2 GT:GQ 1:309
pass.vcf:
MN908947.3 21632 . TTACCCCCTG T 289.132 PASS DP=45.0;DPS=13.0,32.0;Pool=1 GT:GQ 1:289
This is a very similar issue to #21 which has been fixed . The difference is that in this case the variants do not have exactly the same starting position, they simply overlap. A solution would be to ignore failed variants which overlap with a variant which passes.
Hello,
I think there are edge cases that result in overlapping variants in both the pass and fail vcf files causing bcftools to complain and
artic minion
to fail:My
fail.vcf
contains a variant at 21640:MN908947.3 21640 . TG GA 65 PASS DP=43;AC=0,2;AM=41;MC=0;MF=0.0;MB=0.0;AQ=3.65;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:DP:PS:UG:UQ 0/1:32:43:.:0/1:31.97
Meanwhile the
pass.vcf
contains a 9nt deletion at 21632, overlapping 21640:MN908947.3 21632 . TTACCCCCTG T 500 PASS DP=43;AC=0,43;AM=0;MC=0;MF=0.0;MB=0.0;AQ=28.59;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:DP:PS:UG:UQ 1/1:137:43:.:1/1:136.04
Due to the variant being present in the fail.vcf this is maksed in the preconsensus when
artic_mask
is run and soTG
becomesNN
.When bcftools is run it expects to find
TTACCCCCTG
since that is the variant in the pass.vcf file - but instead is finding the maskedTTACCCCCNN
in the preconsensus and so complains:As a temp workaround I have masked the variant in the pass.vcf and re-run bcftools manually to generate a consensus sequence. Do you have any reccomendations for these edge cases?