artic-network / fieldbioinformatics

The ARTIC field bioinformatics pipeline
MIT License

How to deal with overlapping mutations in pass.vcf and fail.vcf? #90

Open rpetit3 opened 2 years ago

rpetit3 commented 2 years ago

Hello!

This is somewhat related to https://github.com/artic-network/fieldbioinformatics/issues/53

In my fail.vcf I have:

MN908947.3  694 .   T   A   54.15   PASS    DP=2813;AC=40,25;AM=2748;MC=0;MF=0.0;MB=0.0;AQ=4.55;GM=1;PH=6.02,6.02,6.02,6.02;SC=None;    GT:GQ:PS:UG:UQ  0/1:54.15:.:0/1:54.15

And the pass.vcf I have:

MN908947.3      685     .       AAAGTCATTT      A       500.0   PASS    DP=2812;AC=2,2810;AM=0;MC=0;MF=0.0;MB=0.0;AQ=35.89;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ  1/1:500.0:.:1/1:500.0
MN908947.3      691     .       AT      A       500.0   PASS    DP=2813;AC=0,2489;AM=324;MC=0;MF=0.0;MB=0.0;AQ=10.48;GM=1;PH=6.02,6.02,6.02,6.02;SC=None;       GT:GQ:PS:UG:UQ  1/1:500.0:.:1/1:500.0

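The failed SNP at position 694 falls on the last base of the REF allele of the passing deletion at 685 (which spans 685-694). So when artic_mask is run, AAAGTCATTT becomes AAAGTCATTN in the preconsensus.fasta, and as a consequence bcftools complains: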

Note: the --sample option not given, applying all records regardless of the genotype
The fasta sequence does not match the REF allele at MN908947.3:685:
   REF .vcf: [AAAGTCATTT]
   ALT .vcf: [A]
   REF .fa : [AAAGTCATTN]GACTTAG.....
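
For anyone wanting to spot these cases before bcftools trips over them, here is a minimal sketch that flags fail.vcf records whose REF span overlaps the REF span of a pass.vcf record. It is not part of the ARTIC pipeline; the names `Record`, `parse_vcf_lines`, `ref_span` and `find_overlaps` are hypothetical, and it assumes plain-text VCF lines whose first five columns are CHROM, POS, ID, REF, ALT. The example records are the ones above, trimmed to those five columns.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Record(NamedTuple):
    chrom: str
    pos: int  # 1-based position of the first REF base
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> List[Record]:
    """Parse CHROM, POS, REF, ALT from VCF data lines (header lines are skipped)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: columns are whitespace-separated and contain no spaces.
        fields = line.split()
        records.append(Record(fields[0], int(fields[1]), fields[3], fields[4]))
    return records


def ref_span(rec: Record) -> Tuple[int, int]:
    """Return the inclusive 1-based interval covered by the REF allele."""
    return rec.pos, rec.pos + len(rec.ref) - 1


def find_overlaps(
    pass_records: List[Record], fail_records: List[Record]
) -> List[Tuple[Record, Record]]:
    """Return (pass, fail) pairs whose REF spans share at least one base."""
    conflicts = []
    for p in pass_records:
        p_start, p_end = ref_span(p)
        for f in fail_records:
            f_start, f_end = ref_span(f)
            if p.chrom == f.chrom and f_start <= p_end and p_start <= f_end:
                conflicts.append((p, f))
    return conflicts


# The records from this issue, trimmed to the first five columns.
PASS_LINES = [
    "MN908947.3\t685\t.\tAAAGTCATTT\tA",
    "MN908947.3\t691\t.\tAT\tA",
]
FAIL_LINES = [
    "MN908947.3\t694\t.\tT\tA",
]

for p, f in find_overlaps(parse_vcf_lines(PASS_LINES), parse_vcf_lines(FAIL_LINES)):
    print(f"pass {p.chrom}:{p.pos} {p.ref}>{p.alt} overlaps fail {f.chrom}:{f.pos}")
```

On these records the only conflict reported is the 685 deletion against the 694 SNP; the 691 deletion covers 691-692 and is not affected. What to do with a flagged pair (drop the fail record, or mask the passing variant) is still the open question below.
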

What would be your recommended procedure in these very rare cases? Should I ignore the fail.vcf record, or mask the variant in the pass.vcf?