bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Callable regions step running very slowly #3619

Open DrMcStrange opened 2 years ago

DrMcStrange commented 2 years ago

Hi,

I'm currently trying to call variants on a set of candidate genes (specified via variant_regions) for 63 WGS samples, and the 'callable regions' step is taking far too long.

Looking at the times in bcbio-nextgen-commands.log (attached), the main culprit seems to be picard CollectSequencingArtifactMetrics. The command doesn't use the --INTERVALS option, so presumably it's running on the whole genome rather than on the specified BED file. It also appears to be using only one core, so it's processing only one sample at a time on a 28-core node.

Is there any way we can get this running faster?

bcbio-nextgen-commands.log bcbio-nextgen-debug.log

naumenko-sa commented 2 years ago

Hi @DrMcStrange!

Yes, this is too slow.

Could you please use

tools_off: collectsequencingartifacts

in the YAML?

SN
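For context, a minimal sketch of where this option would likely sit in a bcbio sample configuration, assuming the usual layout in which per-sample options live under `algorithm`. The sample name, analysis type, genome build, and BED path are hypothetical placeholders, not values from this run, and the list form of `tools_off` is an assumption based on how other tools are usually disabled:

```yaml
# Sketch only: every value except tools_off is a placeholder.
details:
  - description: sample1                  # hypothetical sample name
    analysis: variant2                    # assumed analysis type
    genome_build: hg38                    # assumed genome build
    algorithm:
      variant_regions: /path/to/candidate_genes.bed   # hypothetical BED path
      # Skip the slow picard CollectSequencingArtifactMetrics step
      tools_off: [collectsequencingartifacts]
```

With 63 samples, the same `tools_off` entry would need to appear in each sample's `algorithm` block, or in the template used to generate the configuration.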

DrMcStrange commented 2 years ago

Thanks @naumenko-sa !

The run finally completed, but I'll try that just to confirm.