AstraZeneca-NGS / VarDict

VarDict
MIT License

Option to disable 3' shift (SHIFT3) for complex variants #134

Open andurill opened 4 years ago

andurill commented 4 years ago

Hi,

There are two parts to my question: 1.) Unlike indels, complex variants reported by VarDict do not have anchor bases, even if the complex variants include indels. For some of these variants, depending on the genomic sequence, the Position, Ref, and Alt are shifted toward the 3' end, which reflects the net effect of the mutation. However, this format confounds most genotyping tools, since the alignments in the BAM files do not show the variants at the 3'-shifted position. I have attached a screenshot of a complex variant. It's reported by VarDict as:

CHROM POS ID REF ALT

7 55242469 . TTAAGAGAAG C

But genotyping tools can only detect this variant with either of the following annotations:

CHROM POS ID REF ALT

7 55242466 . GAATTAAGAGAAG GAAC

7 55242465 . GGAATTAAGAGAAG GGAAC (with an additional anchor base)

Using the SHIFT3 and LSEQ values in the VarDict VCF, I can pad the REF and ALT to create a version of the VCF that is compatible with genotyping tools. But it would be more convenient if there were an optional argument in VarDict that allows reporting complex variants without the 3' shift. Is this a reasonable request? Or is this hard to implement?

2.) If part 1 is feasible, is it also possible to report these complex variants with an anchor base? That would make the VCF consistent and more compatible with other downstream bioinformatics tools such as vcf2maf.
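The padding workaround described above can be sketched in Python. This is only an illustration under stated assumptions, not VarDict's own logic: it assumes LSEQ holds the reference bases immediately 5' of the reported POS, so that the last SHIFT3 bases of LSEQ are the ones to prepend to both REF and ALT. The function name `pad_variant`, its `anchor` parameter, and the `N`-filled LSEQ are hypothetical; only the final four bases (`GGAA`) are taken from the annotations quoted in this issue.

```python
def pad_variant(pos, ref, alt, lseq, shift3, anchor=False):
    """Undo a 3' shift by left-padding REF and ALT with reference bases.

    Assumption: `lseq` ends at the base immediately before `pos`.
    If `anchor` is True, one extra upstream base is prepended as well.
    """
    n = shift3 + (1 if anchor else 0)
    if n == 0:
        return pos, ref, alt
    if n > len(lseq):
        raise ValueError("LSEQ is shorter than the requested padding")
    pad = lseq[-n:]  # reference bases directly 5' of the variant
    return pos - n, pad + ref, pad + alt


# Hypothetical LSEQ: only the trailing "GGAA" comes from the issue text.
lseq = "N" * 16 + "GGAA"

print(pad_variant(55242469, "TTAAGAGAAG", "C", lseq, shift3=3))
# (55242466, 'GAATTAAGAGAAG', 'GAAC')
print(pad_variant(55242469, "TTAAGAGAAG", "C", lseq, shift3=3, anchor=True))
# (55242465, 'GGAATTAAGAGAAG', 'GGAAC')
```

Both outputs match the two annotations listed above that genotyping tools can detect. Whether LSEQ always ends directly before POS for complex variants should be checked against real VarDict output before relying on this.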

Thank you very much!

[Attached screenshot: complex_v2]

PolinaBevad commented 4 years ago

@andurill, hello! Thank you for using VarDict!

Can you please share the part of BAM file where such complex variants appear?

I think this is not so difficult to implement as an option, but if you can provide some test cases, it will be even easier. Thank you!

andurill commented 4 years ago

@PolinaBevad, I can provide BAM files for a few test cases. But I just noticed that I created this ticket here instead of under VarDictJava. Should I create this ticket there before I include the test cases?

PolinaBevad commented 4 years ago

Hi @andurill, you can add BAM files here. Over the last year we have tried to merge changes into both versions at the same time, so it doesn't matter where you create the issue. Thank you!

andurill commented 3 years ago

Hi @PolinaBevad, apologies for not following up on this earlier. I couldn't share the test BAM at the time due to the nature of the project. However, I've recently encountered more variants of that nature and noticed that even one of the later versions (VarDict-1.8.2) does not have the option to disable the 3' shift of the REF/ALT for complex variants.

I have attached a zip file that includes the test BAM, the VarDict-1.8.2 output VCF, and an IGV screenshot of the variant. The first variant in the VCF is annotated by VarDict as 7 55242469 . TTAAGAGAAG C with SHIFT3=3. But typical genotyping tools that rely on CIGAR strings cannot genotype this variant because of the 3' shift. If the variant were instead annotated as 7 55242466 . GAATTAAGAGAAG GAAC, then genotyping would work.

My question, as stated in the comments above, is whether you can introduce an optional argument to disable the 3' shift (alternative alignment) for complex variants.

VarDict command I used:

./VarDict-1.8.2/bin/VarDict -th 10 -G ./Homo_sapiens_assembly19.fasta -N TUMOR -b "./test.bam" -Q 20 -q 20 -f 0.0001 -C -z 1 -c 1 -S 2 -E 3 -x 2000 -X 5 chr7.bed | ./VarDict-1.8.2/VarDict/teststrandbias.R | ./VarDict-1.8.2/VarDict/var2vcf_valid.pl -A > chr7.vcf

Thanks a lot in advance!

test_case.zip

andurill commented 3 years ago

@PolinaBevad, I'm also wondering if it's possible to add anchor bases to complex variants that include indels. So, for example, 7 55242466 . GAATTAAGAGAAG GAAC with an anchor base would be 7 55242465 . GGAATTAAGAGAAG GGAAC. I'm not sure if there is an underlying reason for not adding anchor bases to complex variants (with indels) in the VarDict output when it's already being done for indels.

andurill commented 3 years ago

Hi @PolinaBevad, I just wanted to follow up on the issue above and check whether this is something you or your team would be interested in addressing in the near future. Thank you.