bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Variant calling time #1643

pengxiao78 closed this issue 7 years ago

pengxiao78 commented 7 years ago

Hi Brad, I am running the tumor-germline cancer variant calling pipeline on 17 pairs with four variant callers (mutect2, freebayes, vardict and varscan) combined with ensemble: numpass: 2. I'm using the new 1.0.0 development version with GATK 3.6 and the most recent dbSNP. Variant calling now takes much longer than with my previous setup (GATK 3.4, MuTect 1.1.7 and an older dbSNP). Previously the whole run, from start to final results, finished within 4 days, so I set a 4-day time limit in my SLURM script. However, that limit has just expired.

When I resubmitted the SLURM script, the ProgressMeter showed 4-7 days of remaining runtime for various locations in variant calling. It seems that bcbio re-ran variant calling from the beginning instead of resuming from the points it had already completed. Is there a way to make the pipeline resume from the completed points to save time in the variant calling stage? Otherwise, it will take a tremendous amount of time whenever the pipeline breaks during variant calling. Thanks.

pengxiao78 commented 7 years ago

Hi Brad, is there a way to solve this problem? Thanks.

chapmanb commented 7 years ago

Sorry about the issues. bcbio splits variant calling into genomic regions and, on restart, will not redo any region/caller combinations that finished. However, if a long-running process did not finish cleanly, bcbio cannot resume it partway through and has to start that region and caller over again.
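Given the restart behavior described above, the practical rule is to resubmit from the same work directory with the same configuration so bcbio can pick up finished region/caller outputs. A minimal sketch, assuming a single-node SLURM job: the script name submit_bcbio.sh, the work directory, config path, core count and the 7-day limit are all hypothetical placeholders (many clusters cap wall time lower). bcbio_nextgen.py <config> -n <cores> is bcbio's usual single-machine invocation; if the original run used the IPython/SLURM options (-t ipython -s slurm -q ...), reuse exactly those instead.

```shell
#!/usr/bin/env bash
# Sketch: write a SLURM batch script that restarts bcbio in its existing work
# directory, so region/caller combinations that already finished are reused.
# All paths, the core count and the time limit are hypothetical placeholders.

WORK_DIR="${WORK_DIR:-/path/to/project/work}"  # hypothetical: the ORIGINAL work directory
CONFIG="${CONFIG:-../config/project.yaml}"     # hypothetical: the ORIGINAL project config
CORES="${CORES:-16}"                           # hypothetical core count

# Unquoted EOF: the variables above are expanded when the file is written.
cat > submit_bcbio.sh <<EOF
#!/bin/bash
#SBATCH --job-name=bcbio
#SBATCH --time=7-00:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=${CORES}
# Rerun from the SAME work directory; outputs from cleanly finished
# region/caller combinations are picked up instead of recomputed.
cd "${WORK_DIR}"
bcbio_nextgen.py "${CONFIG}" -n ${CORES}
EOF

chmod +x submit_bcbio.sh
echo "Wrote submit_bcbio.sh; submit it with: sbatch submit_bcbio.sh"
```

Note that a caller killed mid-region by the time limit still restarts that region from scratch, so a longer limit only helps if the slow regions can finish within it.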

It sounds like in this case you're running into long run times for MuTect2. In my experience it has unpredictably variable run times, and some inputs cause it to run extremely slowly, as you're seeing. I don't have a good solution for this; it's really a MuTect2 limitation.

Hope this helps explain the situation.

pengxiao78 commented 7 years ago

Brad, I see. Thank you for your explanation. If this is the case, is there any way I can downgrade to GATK 3.4 and MuTect 1.1.7, which I used previously? Thanks.

chapmanb commented 7 years ago

There is no need to downgrade; you can just specify mutect instead of mutect2 in your configuration file, since they are two separate callers. You will need Java 1.7 available in your PATH (bcbio only ships with Java 1.8), but otherwise everything should work cleanly. Hope this helps.
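As a sketch of that change: the variantcaller and ensemble keys under algorithm follow bcbio's sample YAML layout, but this is a trimmed fragment, not a complete config (input files and the normal sample entry are omitted), and the file name project_mutect.yaml, sample description, batch name, genome build and JDK path are hypothetical. The caller list swaps mutect2 for mutect while keeping the other three callers and the numpass: 2 ensemble.

```shell
#!/usr/bin/env bash
# Sketch: use MuTect 1.x ("mutect") instead of MuTect2 ("mutect2").
# Only the algorithm keys matter here; everything else is a placeholder.
cat > project_mutect.yaml <<'EOF'
details:
  - analysis: variant2
    description: tumor01          # hypothetical sample name
    genome_build: GRCh37          # hypothetical; keep your original build
    metadata:
      batch: pair01               # hypothetical batch pairing tumor and normal
      phenotype: tumor
    algorithm:
      variantcaller: [mutect, freebayes, vardict, varscan]
      ensemble:
        numpass: 2
EOF

# MuTect 1.x needs Java 1.7 ahead of bcbio's bundled Java 1.8 on PATH.
# This export only affects the current shell and its children, so it belongs
# in the script that launches bcbio. /path/to/jdk1.7/bin is hypothetical.
export PATH="/path/to/jdk1.7/bin:$PATH"
```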

pengxiao78 commented 7 years ago

I added Java 1.7 to my PATH and got the error below. Should I copy the muTect 1.1.7 jar into the directory it mentions?

ValueError: Could not find jar muTect in /path/to/parallel/bcbio/anaconda/bin/picard:/path/to/parallel/bcbio/anaconda/bin/picard

But /path/to/parallel/bcbio/anaconda/bin/picard is a file, not a directory.

Or should I use

bcbio_nextgen.py upgrade --tools --toolplus mutect=/path/to/mutect/mutect-1.1.7.jar

to install mutect 1.1.7?

Thanks,

pengxiao78 commented 7 years ago

Brad, could you answer my question above? Thanks!

chapmanb commented 7 years ago

We're happy to try to help, but please don't repeat a question in the same issue within a short time window. We try to answer as fast as we can alongside our other commitments.

The install documentation describes how to add MuTect. We're not allowed to distribute it, so it needs a manual download and install (http://bcbio-nextgen.readthedocs.io/en/latest/contents/installation.html#gatk-and-mutect-mutect2):

bcbio_nextgen.py upgrade --tools --toolplus mutect=/path/to/mutect/mutect-1.1.7.jar
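A small wrapper around that command, as a sketch: install_mutect refuses to call the upgrade unless the jar path points at an existing file, so a typo fails fast, and java_is_1_7 checks whether a java -version line reports 1.7, matching the Java 1.7 requirement mentioned earlier in the thread. Both function names are hypothetical; only the bcbio_nextgen.py upgrade --tools --toolplus mutect=... command itself comes from the thread.

```shell
#!/usr/bin/env bash
# Sketch: hypothetical helpers around bcbio's documented MuTect install step.

# Return 0 if a `java -version` line reports Java 1.7, else 1.
# Pass the line explicitly, or omit it to query the java on PATH.
java_is_1_7() {
  # java -version writes to stderr, e.g.: java version "1.7.0_80"
  local version_line="${1:-$(java -version 2>&1 | head -n 1)}"
  case "$version_line" in
    *'"1.7.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Register a manually downloaded MuTect jar with bcbio.
install_mutect() {
  local jar="$1"
  if [ -z "$jar" ] || [ ! -f "$jar" ]; then
    echo "MuTect jar not found: '$jar'" >&2
    return 1
  fi
  bcbio_nextgen.py upgrade --tools --toolplus "mutect=$jar"
}
```

Usage: source the file, run install_mutect /path/to/mutect/mutect-1.1.7.jar once, then check java_is_1_7 in the shell that launches the MuTect run.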

Hope this helps.

pengxiao78 commented 7 years ago

Thank you, Brad!