Illumina / strelka

Strelka2 germline and somatic small variant caller
GNU General Public License v3.0

Performance issues on high depth amplicon data #63

Open jaudoux opened 6 years ago

jaudoux commented 6 years ago

Hi,

I am trying to run Strelka2 in somatic mode on high-depth data (~3570.95X) from a small gene panel (TruSight), but the job never finishes. I killed it after 15 hours.

This is the command line I used to generate the workflow:

configureStrelkaSomaticWorkflow.py --normalBam normal.bam --tumorBam tumoral.bam --ref GRCh37.fa --exome --callRegions manifest.bed.gz --runDir strelka_default

Is there something I have missed in the documentation that could help improve performance?

Best, Jérôme.

PS: I have successfully run Strelka2 on exome data.

ctsa commented 6 years ago

Sorry for the delay getting back to you. This should not normally be a problem unless the high depth regions also have extreme alignment noise -- some of the GRCh38 decoys have this problem, but we would not expect it for any targeted region. Can this data be shared? If not, would you be able to send us the logs?

JingzhX commented 5 years ago

Hi @ctsa, I ran into this problem as well when analyzing high-depth (3k-10k X) amplicon sequencing data (IonCode). The somatic pipeline (v2.9.10) has run successfully on thousands of WGS/WES data sets. Is there a solution to this problem now?

The patient-matched normal and tumor BAM files: https://drive.google.com/open?id=1mxc5D9QtT9xhNSqY6YFDxUPBz9sg0ZXi

Thanks.

Best,

Jim