CSB5 / lofreq

LoFreq Star: Sensitive variant calling from sequencing data
http://csb5.github.io/lofreq/
Other
100 stars 31 forks source link

Variants missing in parallel mode #123

Open oleraj opened 2 years ago

oleraj commented 2 years ago

Hi,

We're looking for somatic/mosaic variants in exome data and there's a particular variant of interest that is not showing up when I run lofreq in parallel mode, however it does show up when I used lofreq in single-threaded mode. Single-threaded mode takes 10x longer to run so I'd like to use parallelization but I'm uncertain why this variant is omitted during the merging step.

My settings are

lofreq call-parallel --pp-threads 16 --ref human_g1k_v37_decoy_plus.fasta --bed Targets_BED.bed -s --call-indels -S gnomad.exomes.r2.0.1.sites.vt.fix.AF001.vcf.gz -o ${prefix}.lofreq.vcf indelqual/${p}.bam

Interestingly, when I check the intermediate files during the run, I do see the variant:

$ tabix /hpcdata/scratch/15010036.1.all.q/lofreq2_call_paralleljumki42g/21.vcf.gz 4:55594084-55594084
4   55594084    .   A   G   92  PASS    DP=40;AF=0.150000;SB=0;DP4=25,9,5,1 

But in the final filtered file the variant is absent.

Also, if I run the analysis in single-threaded mode for the whole exome, or manually for just for this region of interest, the variant shows up in the results, e.g.,

lofreq call --ref $ref -s -r 4:55594084-55594084 --call-indels -S gnomad.exomes.r2.0.1.sites.vt.fix.AF001.vcf.gz $bam

It seems this is not the only variant. I ran the same BAM file with different threads and it seems the more threads I use the fewer the variants remaining in the VCF file:

1 thread – 103737 variants 4 threads – 103601 16 threads – 103601 30 threads – 103597

I tried this first with v.2.1.3.1 but also tried with v.2.1.5 and I see the same thing.

Any idea why some variants would be missing in parallel mode that are present in single-threaded mode?

Thanks!

Andrew

reesea22 commented 2 years ago

Hi,

We were having the same issue. In order to get variants using call-parallel we first had to index the .bam input that we used using samtools index, i.e.,

samtools index <bam file>
lofreq call-parallel --pp-threads <threads> --call-indels -f <reference fasta> <bam file> -o <vcf file>

Hope this helps, Amy

oleraj commented 2 years ago

Hi @reesea22 thanks for the comment. Unfortunately the BAM files are already indexed so I don't think that will solve my issue.