smoove calls a huge number of variants

Manuelaio commented 1 year ago

Hi

I developed a pipeline for short-read WGS data that includes SV calling with smoove, using the following command:

smoove call --name $sample_id --exclude $excluded_regions_smoove --fasta $reference_file -p 8 $alignment_file

Running the pipeline on a cohort, I get odd results in some samples: smoove calls far more variants than other tools. For example, in one sample it calls 134237 variants, of which 130095 have SU < 8, while other tools call 6623/5900 variants in the same sample. Since every smoove variant has FILTER set to ".", I was wondering whether I could use the SU value to filter out low-quality variants, or whether there are other metrics in the VCF I should use instead.
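A minimal Python sketch of the SU-based filtering asked about here, assuming a single-sample smoove VCF in which INFO/SU holds the number of supporting reads (summed across samples, so per-sample in this case). The function names are hypothetical and the cutoff of 8 simply mirrors the "SU < 8" figure above; it is not a smoove default. Only the standard library is used; in practice bcftools or cyvcf2 would do the same job.

```python
import gzip


def read_vcf_lines(path):
    """Yield text lines from a plain or gzip/bgzip-compressed VCF (hypothetical helper)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def info_dict(info_field):
    """Parse an INFO column like 'SVTYPE=DEL;SU=5;IMPRECISE' into a dict.

    Flag entries without '=' (e.g. IMPRECISE) map to True.
    """
    out = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        out[key] = value if sep else True
    return out


def su_summary(lines, threshold=8):
    """Return (total records, records with SU below threshold)."""
    total = below = 0
    for line in lines:
        if line.startswith("#"):
            continue
        total += 1
        su = info_dict(line.rstrip("\n").split("\t")[7]).get("SU")
        if isinstance(su, str) and int(su) < threshold:
            below += 1
    return total, below


def filter_by_su(lines, min_su=8):
    """Yield header lines unchanged and only records with SU >= min_su."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        su = info_dict(line.rstrip("\n").split("\t")[7]).get("SU")
        if isinstance(su, str) and int(su) >= min_su:
            yield line
```

Usage would look like `list(filter_by_su(read_vcf_lines(path)))` for some VCF path. A raw SU cutoff is a crude, depth-dependent filter, so the threshold would need tuning per cohort.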

Thanks

brentp commented 1 year ago

Hi, I would run with --duphold and --genotype and filter on the fields they add to get far fewer calls.
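A minimal sketch of that filtering, assuming the VCF came from the same smoove call command rerun with --genotype --duphold added and contains a single sample. Genotyping adds a per-sample GT field; duphold adds per-sample FORMAT fields including DHFFC (depth fold-change relative to flanking regions) and DHBFC (depth fold-change relative to genomic bins with similar GC content). The cutoffs DHFFC < 0.7 for deletions and DHBFC > 1.3 for duplications are assumptions here, following the values suggested in duphold's documentation; all function names are hypothetical.

```python
def parse_info(info_field):
    """Parse a VCF INFO column into a dict; flags without '=' map to True."""
    out = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        out[key] = value if sep else True
    return out


def parse_sample(format_field, sample_field):
    """Pair FORMAT keys (e.g. 'GT:DHFFC:DHBFC') with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def is_non_ref(gt):
    """True if the genotype has at least one called non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def passes_duphold(svtype, sample, max_del_dhffc=0.7, min_dup_dhbfc=1.3):
    """Depth check: DELs should lose coverage vs. flanks, DUPs gain it vs. GC-matched bins."""
    if svtype == "DEL":
        value = sample.get("DHFFC", ".")
        return value != "." and float(value) < max_del_dhffc
    if svtype == "DUP":
        value = sample.get("DHBFC", ".")
        return value != "." and float(value) > min_dup_dhbfc
    return True  # no depth test for other SV types (e.g. BND, INV) in this sketch


def keep_record(line):
    """Keep a single-sample record if it is non-ref and passes the duphold check."""
    fields = line.rstrip("\n").split("\t")
    svtype = parse_info(fields[7]).get("SVTYPE")
    sample = parse_sample(fields[8], fields[9])
    return is_non_ref(sample.get("GT", "./.")) and passes_duphold(svtype, sample)


def filter_vcf(lines):
    """Yield header lines unchanged and only the records keep_record accepts."""
    for line in lines:
        if line.startswith("#") or keep_record(line):
            yield line
```

In this sketch, records of other SV types are only required to have a non-reference genotype; a genotype-quality cutoff could be added to keep_record in the same way.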

Manuelaio commented 1 year ago

Thank you! It works.

brentp / smoove, issue #208