Kennedy-Lab-UW / Duplex-Seq-Pipeline

A standalone end-to-end data analysis pipeline for Duplex Sequencing

Create option for bamToCountmuts.py to output vcf of variants used to calculate frequencies #10

Open scottrk opened 4 years ago

scottrk commented 4 years ago

We already identify these variants anyway, so it would be helpful to have the option of creating a filtered VCF file with just the variants used for calculating frequencies, for further analysis such as dN/dS, clonality, pathogenicity, etc.

scottrk commented 4 years ago

An alternative strategy would be to create a "filterVCF.py" script that does the filtering based on specific criteria.
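
For illustration, a minimal sketch of what the core of such a script might look like. The function name `filter_vcf_lines` and the interval layout (a dict of 0-based, half-open intervals per chromosome) are assumptions made for this example and are not taken from the pipeline. It only checks each record's POS, so indels that span an interval boundary are not handled specially.

```python
def filter_vcf_lines(vcf_lines, intervals):
    """Yield VCF header lines plus records whose POS lies in an interval.

    vcf_lines: iterable of text lines from an uncompressed VCF.
    intervals: dict mapping chrom -> list of (start, end) tuples,
               0-based half-open, as in a BED file (hypothetical layout).
    """
    for line in vcf_lines:
        # Header and meta lines are passed through unchanged.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])  # VCF POS is 1-based
        # 1-based pos is inside 0-based [start, end) when start < pos <= end.
        if any(start < pos <= end for start, end in intervals.get(chrom, [])):
            yield line
```

Usage would be along the lines of `sys.stdout.writelines(filter_vcf_lines(open("in.vcf"), intervals))`, with `intervals` built from whatever criteria are chosen.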

bkohrn commented 4 years ago

My preferred strategy would be to use the tool "bcftools view" (we already use bcftools to filter variants to the bed file; see snippet below) and invoke the --apply-filters option (http://samtools.github.io/bcftools/bcftools.html#view).

https://github.com/KennedyLabUW/Duplex-Seq-Pipeline/blob/b7f7efc7bd876bce52e0bd592fde20427d8f63c0/Snakefile#L1218-L1239

scottrk commented 4 years ago

I don't believe that solves the entire issue unless bcftools supports filtering of variants to bed blocks. Looking at the man page for bcftools, the -R option doesn't seem to support blocks in the bed file, and -r requires a comma-separated list in chr:from-to format on the CLI, which kinda defeats the purpose of the bed blocks we use internally and makes it pretty unwieldy to use on a large number of samples.
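
To make the mismatch concrete, here is a rough sketch of expanding one bed line into per-block region strings in chr:from-to form. It assumes the bed file follows the standard BED12 layout (block sizes in column 11, block starts relative to chromStart in column 12); the helper name `bed_blocks_to_regions` is hypothetical and not part of the pipeline.

```python
def bed_blocks_to_regions(bed_line):
    """Return 1-based inclusive 'chrom:from-to' strings for one BED line.

    Assumes standard BED12 columns; lines with fewer than 12 columns
    are treated as a single region spanning chromStart..chromEnd.
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, chrom_start = fields[0], int(fields[1])
    if len(fields) < 12:
        # BED is 0-based half-open, so only the start shifts by one.
        return [f"{chrom}:{chrom_start + 1}-{int(fields[2])}"]
    # Block lists may carry a trailing comma, hence the 'if s' filter.
    sizes = [int(s) for s in fields[10].split(",") if s]
    starts = [int(s) for s in fields[11].split(",") if s]
    regions = []
    for size, rel_start in zip(sizes, starts):
        begin = chrom_start + rel_start  # 0-based absolute block start
        regions.append(f"{chrom}:{begin + 1}-{begin + size}")
    return regions
```

Joining the result with commas gives a string in the format -r expects, but it still has to be rebuilt and passed per sample, which is the unwieldy part.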

bkohrn commented 4 years ago

Just to clarify what is desired here: is the goal to produce a VCF file containing only the variants that fall within the blocks (as opposed to all variants in the sample or in the whole bed file)? I'm not entirely sure what is being asked for here.

bkohrn commented 4 years ago

On reflection, this feature is not necessary at the present time. Removing from 2.0.0 development.