bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

indelcaller error #603

Closed ghost closed 9 years ago

ghost commented 10 years ago

Hi,

I am trying to add indelcaller to my yaml file, but I get the following error message:

[2014-09-22 19:00] Checking sample YAML configuration: /home/WGSData/bcbio_sample.yaml
Traceback (most recent call last):
  File "/bcbio/bin/bcbio_nextgen.py", line 216, in <module>
    main(kwargs)
  File "/bcbio/bin/bcbio_nextgen.py", line 42, in main
    run_main(kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 35, in run_main
    fc_dir, run_info_yaml)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 73, in _run_toplevel
    samples = run_info.organize(dirs, config, run_info_yaml)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 34, in organize
    run_details = _run_info_from_yaml(dirs["flowcell"], run_info_yaml, config)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 415, in _run_info_from_yaml
    _check_sample_config(run_details, run_info_yaml)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 304, in _check_sample_config
    [_check_algorithm_keys(x) for x in items]
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 196, in _check_algorithm_keys
    % (problem_keys, url))
ValueError: Unexpected configuration keyword in 'algorithm' section: ['indelcaller']
See configuration documentation for supported options:
https://bcbio-nextgen.readthedocs.org/en/latest/contents/configuration.html#algorithm-parameters

I want to run mutect only for SNP calling and the other indel callers, pind and sid, for indel calling. I am not sure if I added indelcaller to the wrong place in my yaml file. Please take a look, and thanks a lot!

[Screenshot of the yaml file, 2014-09-22, 12:23 pm]

chapmanb commented 10 years ago

Michael; What version of bcbio are you running? indelcaller will only be available in the latest release (0.8.2) or development version. You can upgrade with bcbio_nextgen.py upgrade -u development or bcbio_nextgen.py upgrade -u stable.

Also we only support a single indelcaller to pair with mutect, so the double specification will not work. If your pind specification is meant to use pindel, that functionality is so alpha it's only a pull request and not yet integrated. We're working on adding and validating it, so I wouldn't recommend using it in any kind of production calling yet.

Hope this helps.
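
The two points above (an unknown key is rejected by older versions, and only one indelcaller can be paired with mutect) can be sketched as a small validation function. This is only an illustration: the key set below is a hypothetical, shortened whitelist, the function name is made up, and the real check in bcbio's run_info.py may differ between versions.

```python
# Hypothetical, shortened whitelist for illustration only; the real list of
# supported 'algorithm' keys is defined inside bcbio and varies by version.
SUPPORTED_ALGORITHM_KEYS = {"aligner", "variantcaller", "indelcaller", "mark_duplicates"}


def check_algorithm_keys(algorithm, supported=SUPPORTED_ALGORITHM_KEYS):
    """Reject unknown keys and more than one indelcaller in an 'algorithm' dict."""
    problem_keys = sorted(k for k in algorithm if k not in supported)
    if problem_keys:
        raise ValueError(
            "Unexpected configuration keyword in 'algorithm' section: %s" % problem_keys
        )
    indelcaller = algorithm.get("indelcaller")
    # A list with two callers (for example pind plus sid) is not supported.
    if isinstance(indelcaller, (list, tuple)) and len(indelcaller) > 1:
        raise ValueError(
            "Only a single indelcaller can be paired with mutect, got: %s" % list(indelcaller)
        )
    return algorithm


if __name__ == "__main__":
    # An older release is modelled here by dropping 'indelcaller' from the whitelist.
    old_release_keys = SUPPORTED_ALGORITHM_KEYS - {"indelcaller"}
    try:
        check_algorithm_keys({"variantcaller": "mutect", "indelcaller": "sid"}, old_release_keys)
    except ValueError as err:
        print(err)
```

With the full whitelist, a single indelcaller passes, while a two-element list raises a ValueError.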

ghost commented 10 years ago

Hi Brad,

Thank you so much! I am running bcbio v0.8.1 and have upgraded to the latest version. OK, I will try to run mutect + sid to call somatic variants. Hopefully the below yaml works.

[Screenshot of the yaml file, 2014-09-22, 1:28 pm]

mjafin commented 10 years ago

Hi Michael, For SID to work, you will have to have Appistry MuTect installed via the toolplus option, otherwise it won't be used. The SID indels are pretty bad and completely unfiltered.

ghost commented 10 years ago

Hi Miika,

Thank you for your help! Do you have any suggestions on an indel caller to use with mutect? I tried the scalpel indel caller with mutect, but it took almost one day to finish just one chromosome. I tried freebayes and varscan, and they are pretty fast. Now I am trying vardict and it is fast so far. I have no idea which indel caller is suitable for mutect. Your suggestion would be greatly appreciated!

Thanks, Michael

mjafin commented 10 years ago

Hi Michael, We've seen some evidence that Mutect is really good as a SNP caller (low FP rate) and vardict is an excellent indel caller (it produces SNPs too but FP rates have been higher than with Mutect). So you could just get SNPs from Mutect and indels from vardict as a post-processing step. Or if you feel adventurous, update to the latest developmental version and use Pindel as an indel caller bunched with Mutect.
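
The post-processing idea (SNPs from Mutect, indels from vardict) could look roughly like the sketch below. It is a minimal illustration on plain VCF text lines, not a bcbio feature: the function names are hypothetical, an indel is assumed to be any record where some ALT allele differs in length from REF, symbolic alleles are not handled, and the result is not re-sorted.

```python
def is_indel(ref, alt):
    """True if any ALT allele has a different length than REF."""
    return any(len(allele) != len(ref) for allele in alt.split(","))


def combine_calls(snp_vcf_lines, indel_vcf_lines):
    """Keep the header and non-indel records from the first VCF,
    and only indel records from the second VCF."""
    combined = [line for line in snp_vcf_lines if line.startswith("#")]
    for source, want_indel in ((snp_vcf_lines, False), (indel_vcf_lines, True)):
        for line in source:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            ref, alt = fields[3], fields[4]
            if is_indel(ref, alt) == want_indel:
                combined.append(line)
    # Records are appended per caller, so the output still needs coordinate sorting.
    return combined


if __name__ == "__main__":
    mutect = ["#CHROM\tPOS\tID\tREF\tALT\n", "1\t100\t.\tA\tG\n"]
    vardict = ["#CHROM\tPOS\tID\tREF\tALT\n", "1\t150\t.\tC\tT\n", "1\t300\t.\tG\tGTT\n"]
    print("".join(combine_calls(mutect, vardict)))
```

In practice the two inputs would also need matching sample columns and headers before the combined file is usable downstream.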

ghost commented 10 years ago

Hi Miika,

I am trying to run vardict and I got the below error. I reproduced this issue on two different machines.

[2014-09-23 16:53] Genotyping with VarDict: Inference
[2014-09-23 16:53] samtools: writing to standard output failed: Broken pipe
[2014-09-23 16:53] samtools: error closing standard output: -1
Traceback (most recent call last):
  File "/bcbio/bin/bcbio_nextgen.py", line 216, in <module>
    main(kwargs)
  File "/bcbio/bin/bcbio_nextgen.py", line 42, in main
    run_main(kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 36, in run_main
    fc_dir, run_info_yaml)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 82, in _run_toplevel
    for xs in pipeline.run(config, config_file, parallel, dirs, pipeline_items):
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 195, in run
    samples = genotype.parallel_variantcall_region(samples, run_parallel)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/genotype.py", line 157, in parallel_variantcall_region
    "vrn_file", ["region", "sam_ref", "config"]))
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/split.py", line 32, in grouped_parallel_split_combine
    final_output = parallel_fn(parallel_name, split_args)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 84, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 660, in __call__
    self.retrieve()
  File "/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 543, in retrieve
    raise exception_type(report)
TypeError: __init__() takes at least 3 arguments (2 given)

Do you have any suggestions? Thanks!

Michael

mjafin commented 10 years ago

Hi Michael, Could you try running the process in single thread local mode and report the output (if you still get the error)?

ghost commented 10 years ago

Hi Miika,

The error is still there,

[2014-09-23 17:39] Timing: coverage
[2014-09-23 17:39] Resource requests: gatk, picard, vardict; memory: 3.5, 3.5; cores: 1, 1, 1
[2014-09-23 17:39] Configuring 1 jobs to run, using 1 cores each with 3.8g of memory reserved for each job
[2014-09-23 17:39] Timing: alignment post-processing
[2014-09-23 17:39] multiprocessing: piped_bamprep
[2014-09-23 17:39] Timing: variant calling
[2014-09-23 17:39] multiprocessing: variantcall_sample
[2014-09-23 17:39] Genotyping with VarDict: Inference
[2014-09-23 17:39] samtools: writing to standard output failed: Broken pipe
[2014-09-23 17:39] samtools: error closing standard output: -1
[2014-09-23 17:39] samtools: writing to standard output failed: Broken pipe
[2014-09-23 17:39] samtools: error closing standard output: -1
[2014-09-23 17:39] samtools: writing to standard output failed: Broken pipe
[2014-09-23 17:39] samtools: error closing standard output: -1
[2014-09-23 17:40] samtools: writing to standard output failed: Broken pipe
[2014-09-23 17:40] samtools: error closing standard output: -1
[2014-09-23 17:40] samtools: writing to standard output failed: Broken pipe

mjafin commented 10 years ago

Hi Michael, You can ignore the samtools warnings with VarDict as long as the run doesn't fail. They come from reading a line from samtools and not closing the stream properly. Nothing to worry about.

ghost commented 10 years ago

Hi Miika,

Thanks! Actually, bcbio cannot finish the analysis with vardict due to the samtools error. After a while, the samtools error will kill the process. Please see the below error message,

[2014-09-23 16:53] Genotyping with VarDict: Inference
[2014-09-23 16:53] samtools: writing to standard output failed: Broken pipe
[2014-09-23 16:53] samtools: error closing standard output: -1
Traceback (most recent call last):
  File "/bcbio/bin/bcbio_nextgen.py", line 216, in <module>
    main(kwargs)
  File "/bcbio/bin/bcbio_nextgen.py", line 42, in main
    run_main(kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 36, in run_main
    fc_dir, run_info_yaml)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 82, in _run_toplevel
    for xs in pipeline.run(config, config_file, parallel, dirs, pipeline_items):
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 195, in run
    samples = genotype.parallel_variantcall_region(samples, run_parallel)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/genotype.py", line 157, in parallel_variantcall_region
    "vrn_file", ["region", "sam_ref", "config"]))
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/split.py", line 32, in grouped_parallel_split_combine
    final_output = parallel_fn(parallel_name, split_args)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 84, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 660, in __call__
    self.retrieve()
  File "/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 543, in retrieve
    raise exception_type(report)
TypeError: __init__() takes at least 3 arguments (2 given)

Also, I was running bcbio on a single machine in multi-threaded mode when I got this error message.

chapmanb commented 10 years ago

Michael; Sorry about the issues. Miika is pointing out that neither of the messages you're seeing indicates the root cause of the failure. The samtools: error closing standard output: -1 message comes from VarDict but is a "normal" message that appears when you pipe samtools but only take the start of the output.
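
The "pipe but only take the start of the output" situation can be reproduced without samtools. The sketch below is an illustration on a POSIX system and does not use any bcbio or VarDict code: a small Python child process stands in for samtools, and the parent reads one line and closes the pipe, which leaves the writer with a broken pipe error.

```python
import subprocess
import sys

# Stand-in for samtools: a child process that writes far more output
# than the reader will ever consume.
PRODUCER = "for i in range(10**6): print(i)"


def read_first_line_only():
    """Read one line from the producer, then close the pipe early."""
    proc = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    first = proc.stdout.readline().strip()
    proc.stdout.close()           # the writer now has nobody to write to
    errors = proc.stderr.read()   # the writer reports a broken pipe here
    proc.stderr.close()
    returncode = proc.wait()
    return first, returncode, errors


if __name__ == "__main__":
    first, returncode, errors = read_first_line_only()
    print("first line:", first)
    print("writer exit status:", returncode)
    print(errors.strip().splitlines()[-1])
```

The reader gets what it wanted, and the complaint comes only from the writer, which is why the message by itself does not mean the consuming step failed.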

The TypeError: __init__() takes at least 3 arguments (2 given) is a generic message from multiprocessing when a run fails, so it also doesn't help us with the root cause.
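
One plausible reading of why this message is so uninformative, based on the raise exception_type(report) frame in the traceback: the parallel layer rebuilds the worker's exception from a single report string, and an exception class whose constructor needs more than one argument then fails with a TypeError of its own. The sketch below is an assumption-laden illustration, not joblib's actual code; rebuild_exception is a hypothetical helper.

```python
import subprocess


def rebuild_exception(exc_type, report):
    """Mimic re-creating a worker failure as exc_type(report).

    Returns the rebuilt exception, or the TypeError raised while trying."""
    try:
        return exc_type(report)
    except TypeError as err:
        return err


if __name__ == "__main__":
    report = "text of the worker traceback"
    # CalledProcessError needs a return code and a command, so one argument is not enough.
    masked = rebuild_exception(subprocess.CalledProcessError, report)
    print(type(masked).__name__, "-", masked)
    # An exception that accepts a single message is rebuilt without trouble.
    plain = rebuild_exception(ValueError, report)
    print(type(plain).__name__, "-", plain)
```

Running on a single core avoids this rebuilding step, which is why the -n 1 run is more likely to show the underlying error.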

When you run on a single core (-n 1) what error message do you get on exit? That might help us isolate the cause of the problem. Thanks for the patience debugging.

ghost commented 10 years ago

Hi Brad,

OK, I see. When I ran bcbio on a single core, the error message was about the samtools broken pipe error. But I did not let it finish and killed it, so I will rerun it tomorrow and keep you updated.

Also, I am trying to run the bcbio development version with mutect + pindel. The run finished, but I did not see any results generated by pindel. I am not sure whether my yaml file is correct; please take a look at the yaml file below. ...

If it is not correct please let me know. Thanks!

Michael

mjafin commented 10 years ago

Hi Michael, I also had a failure with VarDict, but it was related to an error in R. Can you check that R starts properly when you launch it before running bcbio?

Pindel integration seems to still be at the pull request level so not yet in the latest developmental version: https://github.com/chapmanb/bcbio-nextgen/pull/602

ghost commented 10 years ago

Hi Miika,

Yes, R runs very well on my machine.

Thanks, Guorong

ghost commented 10 years ago

Hi Brad,

When I run bcbio on a single node (-n 1), I get the following error message:

[2014-09-24 22:55] Genotyping with VarDict: Inference
[2014-09-24 22:55] samtools: writing to standard output failed: Broken pipe
[2014-09-24 22:55] samtools: error closing standard output: -1
[2014-09-24 23:06] Use of uninitialized value $sample in concatenation (.) or string at /bcbio/bin/var2vcf_somatic.pl line 35.
[2014-09-24 23:06] /bin/bash: line 1: 25667 Killed /bcbio/bin/vardict -G /bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N syn3-tumor -b "/mnt/disk/WGSData/prealign/syn3-tumor/2_2014-09-08_mutectdata-sort.bam|/mnt/disk/WGSData/prealign/syn3-normal/1_2014-09-08_mutectdata-sort.bam" -z -F -c 1 -S 2 -E 3 -g 4 /mnt/disk/WGSData/vardict/1/syn3-1_5394707_7174953-raw-regions.bed
[2014-09-24 23:06] Uncaught exception occurred
Traceback (most recent call last):
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 22, in run
    _do_run(cmd, checks, log_stdout)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 121, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /bcbio/bin/vardict -G /bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N syn3-tumor -b "/mnt/disk/WGSData/prealign/syn3-tumor/2_2014-09-08_mutectdata-sort.bam|/mnt/disk/WGSData/prealign/syn3-normal/1_2014-09-08_mutectdata-sort.bam" -z -F -c 1 -S 2 -E 3 -g 4 /mnt/disk/WGSData/vardict/1/syn3-1_5394707_7174953-raw-regions.bed | testsomatic.R | var2vcf_somatic.pl -N "syn3-tumor|syn3-normal" -f 0.1 | /bcbio/bin/vcfstreamsort | bgzip -c > /mnt/disk/WGSData/vardict/1/tx/tmpLvDQct/syn3-1_5394707_7174953-raw.vcf.gz
samtools: writing to standard output failed: Broken pipe
samtools: error closing standard output: -1
Use of uninitialized value $sample in concatenation (.) or string at /bcbio/bin/var2vcf_somatic.pl line 35.
/bin/bash: line 1: 25667 Killed /bcbio/bin/vardict -G /bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N syn3-tumor -b "/mnt/disk/WGSData/prealign/syn3-tumor/2_2014-09-08_mutectdata-sort.bam|/mnt/disk/WGSData/prealign/syn3-normal/1_2014-09-08_mutectdata-sort.bam" -z -F -c 1 -S 2 -E 3 -g 4 /mnt/disk/WGSData/vardict/1/syn3-1_5394707_7174953-raw-regions.bed
25668 Done | testsomatic.R
25669 Done | var2vcf_somatic.pl -N "syn3-tumor|syn3-normal" -f 0.1
25670 Done | /bcbio/bin/vcfstreamsort
25671 Done | bgzip -c > /mnt/disk/WGSData/vardict/1/tx/tmpLvDQct/syn3-1_5394707_7174953-raw.vcf.gz
' returned non-zero exit status 137
Traceback (most recent call last):
  File "/bcbio/bin/bcbio_nextgen.py", line 216, in <module>
    main(kwargs)
  File "/bcbio/bin/bcbio_nextgen.py", line 42, in main
    run_main(kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 35, in run_main
    fc_dir, run_info_yaml)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 81, in _run_toplevel
    for xs in pipeline.run(config, config_file, parallel, dirs, pipeline_items):
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 194, in run
    samples = genotype.parallel_variantcall_region(samples, run_parallel)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/genotype.py", line 155, in parallel_variantcall_region
    "vrn_file", ["region", "sam_ref", "config"]))
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/split.py", line 32, in grouped_parallel_split_combine
    final_output = parallel_fn(parallel_name, split_args)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 84, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 653, in __call__
    self.dispatch(function, args, kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 400, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 138, in __init__
    self.results = func(_args, _kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 63, in wrapper
    return apply(f, _args, _kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 83, in variantcall_sample
    return genotype.variantcall_sample(_args)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/genotype.py", line 206, in variantcall_sample
    call_file = caller_fn(align_bams, items, sam_ref, assoc_files, region, call_file)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/vardict.py", line 42, in run_vardict
    assoc_files, region, out_file)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/vardict.py", line 136, in _run_vardict_paired
    do.run(cmd.format(*locals()), "Genotyping with VarDict: Inference", {})
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 22, in run
    _do_run(cmd, checks, log_stdout)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 121, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; /bcbio/bin/vardict -G /bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N syn3-tumor -b "/mnt/disk/WGSData/prealign/syn3-tumor/2_2014-09-08_mutectdata-sort.bam|/mnt/disk/WGSData/prealign/syn3-normal/1_2014-09-08_mutectdata-sort.bam" -z -F -c 1 -S 2 -E 3 -g 4 /mnt/disk/WGSData/vardict/1/syn3-1_5394707_7174953-raw-regions.bed | testsomatic.R | var2vcf_somatic.pl -N "syn3-tumor|syn3-normal" -f 0.1 | /bcbio/bin/vcfstreamsort | bgzip -c > /mnt/disk/WGSData/vardict/1/tx/tmpLvDQct/syn3-1_5394707_7174953-raw.vcf.gz
samtools: writing to standard output failed: Broken pipe
samtools: error closing standard output: -1
Use of uninitialized value $sample in concatenation (.) or string at /bcbio/bin/var2vcf_somatic.pl line 35.
/bin/bash: line 1: 25667 Killed /bcbio/bin/vardict -G /bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N syn3-tumor -b "/mnt/disk/WGSData/prealign/syn3-tumor/2_2014-09-08_mutectdata-sort.bam|/mnt/disk/WGSData/prealign/syn3-normal/1_2014-09-08_mutectdata-sort.bam" -z -F -c 1 -S 2 -E 3 -g 4 /mnt/disk/WGSData/vardict/1/syn3-1_5394707_7174953-raw-regions.bed
25668 Done | testsomatic.R
25669 Done | var2vcf_somatic.pl -N "syn3-tumor|syn3-normal" -f 0.1
25670 Done | /bcbio/bin/vcfstreamsort
25671 Done | bgzip -c > /mnt/disk/WGSData/vardict/1/tx/tmpLvDQct/syn3-1_5394707_7174953-raw.vcf.gz
' returned non-zero exit status 137

Hope this error message can help you to isolate the cause of the problem!

Thanks a lot, Michael
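
A note on reading the log above: by shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9, which matches the "Killed" line for the vardict process. The sketch below decodes such a status; describe_exit_status is a hypothetical helper, and the log does not say what sent the signal (running out of memory is one common cause, but that is only an assumption here).

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if status > 128:
        signum = status - 128
        try:
            return "killed by %s" % signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            return "killed by unknown signal %d" % signum
    return "exited with status %d" % status


if __name__ == "__main__":
    print(describe_exit_status(137))
    print(describe_exit_status(1))
```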

mjafin commented 10 years ago

Hi Michael, Difficult to tell what's going wrong, but let's try out a few things:

  1. Are the bam files and bed file all in GRCh37 format (i.e. no chr anywhere)?
  2. If you run this command standalone (from your command line), does it produce a table of values in temp.txt and no error messages?
     /bcbio/bin/vardict -G /bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N syn3-tumor -b "/mnt/disk/WGSData/prealign/syn3-tumor/2_2014-09-08_mutectdata-sort.bam|/mnt/disk/WGSData/prealign/syn3-normal/1_2014-09-08_mutectdata-sort.bam" -z -F -c 1 -S 2 -E 3 -g 4 /mnt/disk/WGSData/vardict/1/syn3-1_5394707_7174953-raw-regions.bed > ./temp.txt
  3. Memory use. How much RAM is there on this box? Can you monitor top while running the above standalone vardict command and see if it uses all of the memory on your box?
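
The first check in the list above (no chr prefix anywhere) can be automated for the bed file with a few lines. This is a minimal sketch with a hypothetical function name; it works on the text lines of a BED file and does not inspect the bam headers, which would need a separate look.

```python
def chr_prefixed_contigs(bed_lines):
    """Return the sorted contig names in a BED file that start with 'chr'."""
    found = set()
    for line in bed_lines:
        # Skip blank lines and the optional comment, track and browser lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        contig = line.split()[0]
        if contig.startswith("chr"):
            found.add(contig)
    return sorted(found)


if __name__ == "__main__":
    example = ["track name=regions\n", "1\t5394707\t7174953\n", "chr2\t100\t200\n"]
    print(chr_prefixed_contigs(example))
```

An empty result means the contig names are consistent with the GRCh37 naming used by the reference in the command.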

roryk commented 9 years ago

Hi @michaelxu2014, closing this out because it's pretty stale and the issue might have resolved itself by now. Feel free to reopen if you're still having the problem. Thanks!