Closed: ghost closed this issue 9 years ago.
Michael;
What version of bcbio are you running? indelcaller will only be available in the latest release (0.8.2) or the development version. You can upgrade with bcbio_nextgen.py upgrade -u development or bcbio_nextgen.py upgrade -u stable.
Also, we only support a single indelcaller to pair with mutect, so the double specification will not work. If your pind specification is meant to use pindel, that functionality is so alpha that it's only a pull request and not yet integrated. We're working on adding and validating it, so I wouldn't recommend using it for any kind of production calling yet.
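Here is a minimal Python sketch of the single-indelcaller rule. It treats the parsed YAML algorithm section as a plain dict. The function name check_indelcaller and its error text are hypothetical, so this is not bcbio's own validation code.

```python
def check_indelcaller(algorithm):
    """Hypothetical check: allow at most one indelcaller alongside mutect.

    `algorithm` is a plain dict standing in for the YAML 'algorithm' section.
    Returns the indelcaller name, or None when no indelcaller is configured.
    """
    caller = algorithm.get("indelcaller")
    if caller is None:
        return None
    if isinstance(caller, (list, tuple)):
        # A list such as ["pind", "sid"] is the unsupported double specification.
        if len(caller) != 1:
            raise ValueError("only one indelcaller can be paired with mutect, got %s" % list(caller))
        caller = caller[0]
    return caller


print(check_indelcaller({"variantcaller": "mutect", "indelcaller": "sid"}))  # sid
```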
Hope this helps.
Hi Brad,
Thank you so much! I am running bcbio v0.8.1 and have upgraded to the latest version. OK, I will try to run mutect + sid to call somatic variants. Hopefully the YAML below works.
Hi Michael, for SID to work, you need Appistry MuTect installed via the toolplus option; otherwise SID won't be used. The SID indels are pretty bad and completely unfiltered.
Hi Miika,
Thank you for your help! Do you have any suggestions for an indel caller to use with mutect? I tried the scalpel indel caller with mutect, but it took almost a day to finish just one chromosome. I tried freebayes and varscan, and they are pretty fast. Now I am trying vardict, and it is fast so far. I am not sure which indel caller is suitable for mutect. Your suggestion would be greatly appreciated!
Thanks, Michael
Hi Michael, we've seen some evidence that MuTect is a really good SNP caller (low false-positive rate) and that VarDict is an excellent indel caller (it produces SNPs too, but its false-positive rate for them has been higher than MuTect's). So you could take SNPs from MuTect and indels from VarDict as a post-processing step. Or, if you feel adventurous, update to the latest development version and use Pindel as the indel caller paired with MuTect.
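Below is a minimal Python sketch of that post-processing step. It assumes each caller's output is available as uncompressed VCF text lines and that both files share the same reference and sample columns. The function names are hypothetical. A real merge would also reconcile the two headers and follow the reference contig order.

```python
def is_snv(ref, alts):
    """True when REF and every ALT allele are single bases."""
    return len(ref) == 1 and all(len(alt) == 1 for alt in alts)


def is_indel(ref, alts):
    """True when any ALT allele differs in length from REF (insertion or deletion)."""
    return any(len(alt) != len(ref) for alt in alts)


def parse_record(line):
    """Return (chrom, pos, ref, alts) from a tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), fields[3], fields[4].split(",")


def combine_calls(mutect_lines, vardict_lines):
    """Keep SNVs from MuTect and indels from VarDict, sorted by (chrom, pos).

    Header lines starting with '#' are skipped. Chromosomes sort as strings here,
    so "10" comes before "2"; a real merge would use the reference contig order.
    """
    kept = []
    for lines, keep in ((mutect_lines, is_snv), (vardict_lines, is_indel)):
        for line in lines:
            if line.startswith("#"):
                continue
            chrom, pos, ref, alts = parse_record(line)
            if keep(ref, alts):
                kept.append((chrom, pos, line))
    kept.sort(key=lambda rec: (rec[0], rec[1]))
    return [line for _, _, line in kept]
```

An indel reported by MuTect, or a SNV reported by VarDict, is simply dropped. This follows the "SNPs from one caller, indels from the other" split.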
Hi Miika,
I am trying to run vardict and got the error below. I reproduced this issue on two different machines.
[2014-09-23 16:53] Genotyping with VarDict: Inference
[2014-09-23 16:53] samtools: writing to standard output failed: Broken pipe
[2014-09-23 16:53] samtools: error closing standard output: -1
Traceback (most recent call last):
File "/bcbio/bin/bcbio_nextgen.py", line 216, in <module>
Do you have any suggestions? Thanks!
Michael
Hi Michael, could you try running the process in single-threaded local mode and report the output (if you still get the error)?
Hi Miika,
The error is still there:
[2014-09-23 17:39] Timing: coverage
[2014-09-23 17:39] Resource requests: gatk, picard, vardict; memory: 3.5, 3.5; cores: 1, 1, 1
[2014-09-23 17:39] Configuring 1 jobs to run, using 1 cores each with 3.8g of memory reserved for each job
[2014-09-23 17:39] Timing: alignment post-processing
[2014-09-23 17:39] multiprocessing: piped_bamprep
[2014-09-23 17:39] Timing: variant calling
[2014-09-23 17:39] multiprocessing: variantcall_sample
[2014-09-23 17:39] Genotyping with VarDict: Inference
[2014-09-23 17:39] samtools: writing to standard output failed: Broken pipe
[2014-09-23 17:39] samtools: error closing standard output: -1
[2014-09-23 17:39] samtools: writing to standard output failed: Broken pipe
[2014-09-23 17:39] samtools: error closing standard output: -1
[2014-09-23 17:39] samtools: writing to standard output failed: Broken pipe
[2014-09-23 17:39] samtools: error closing standard output: -1
[2014-09-23 17:40] samtools: writing to standard output failed: Broken pipe
[2014-09-23 17:40] samtools: error closing standard output: -1
[2014-09-23 17:40] samtools: writing to standard output failed: Broken pipe
Hi Michael, you can ignore the samtools warnings with VarDict as long as the run doesn't fail. They come from VarDict reading only a line from samtools and not closing the stream properly. Nothing to worry about.
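The same pattern can be reproduced outside bcbio. In the small Python sketch below, a producer writes far more output than a pipe buffer holds, and the consumer reads only the first line and then closes its end. The producer's next write then fails with a broken pipe. The producer script and its message are made-up stand-ins for samtools, not samtools itself.

```python
import subprocess
import sys

# Stand-in for samtools: writes far more than a pipe buffer holds and reports a
# broken pipe the way a command-line tool would, instead of dumping a traceback.
PRODUCER = r"""
import os, sys
try:
    for i in range(10 ** 6):
        sys.stdout.write("line %d\n" % i)
    sys.stdout.flush()
except BrokenPipeError:
    sys.stderr.write("producer: writing to standard output failed: Broken pipe\n")
    # Point stdout at /dev/null so the interpreter's exit-time flush does not fail again.
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())
    sys.exit(1)
"""


def read_first_line():
    """Read only the first line of the producer's output, then close the pipe."""
    proc = subprocess.Popen([sys.executable, "-c", PRODUCER],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    first = proc.stdout.readline()
    proc.stdout.close()        # stop reading: the producer's next write hits a closed pipe
    err = proc.stderr.read()   # collect the producer's complaint
    proc.stderr.close()
    return first, proc.wait(), err
```

The consumer still gets the line it wanted. The producer's complaint is noise unless the overall pipeline also exits with an error.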
Hi Miika,
Thanks! Actually, bcbio cannot finish the analysis with vardict because of the samtools error. After a while, the samtools error kills the process. Please see the error message below:
[2014-09-23 16:53] Genotyping with VarDict: Inference
[2014-09-23 16:53] samtools: writing to standard output failed: Broken pipe
[2014-09-23 16:53] samtools: error closing standard output: -1
Traceback (most recent call last):
File "/bcbio/bin/bcbio_nextgen.py", line 216, in <module>
main(**kwargs)
File "/bcbio/bin/bcbio_nextgen.py", line 42, in main
run_main(**kwargs)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 36, in run_main
fc_dir, run_info_yaml)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 82, in _run_toplevel
for xs in pipeline.run(config, config_file, parallel, dirs, pipeline_items):
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 195, in run
samples = genotype.parallel_variantcall_region(samples, run_parallel)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/genotype.py", line 157, in parallel_variantcall_region
"vrn_file", ["region", "sam_ref", "config"]))
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/split.py", line 32, in grouped_parallel_split_combine
final_output = parallel_fn(parallel_name, split_args)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
return run_multicore(fn, items, config, parallel=parallel)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 84, in run_multicore
for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
File "/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 660, in __call__
self.retrieve()
File "/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 543, in retrieve
raise exception_type(report)
TypeError: __init__() takes at least 3 arguments (2 given)
Also, I was running bcbio on a single machine in multi-threaded mode when I got this error message.
Michael;
Sorry about the issues. Miika is pointing out that neither of the messages you're seeing indicates the root cause of the failure. The samtools: error closing standard output: -1 message comes from VarDict, but it is a "normal" message that happens when you pipe samtools output and only read the start of it.
The TypeError: __init__() takes at least 3 arguments (2 given) is a generic message from multiprocessing when a run fails, so it also doesn't help us find the root cause.
When you run on a single core (-n 1), what error message do you get on exit? That might help us isolate the cause of the problem. Thanks for your patience while debugging.
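The traceback above ends in joblib re-raising a worker failure as raise exception_type(report), which rebuilds the original exception class from a single report string. That breaks for exception classes whose constructor needs more arguments. In Python 2.7, subprocess.CalledProcessError takes (returncode, cmd), and calling it with one argument produces exactly "__init__() takes at least 3 arguments (2 given)". This is consistent with, though not proof of, a failed shell command underneath. The minimal Python 3 sketch below shows the effect; Python 3 words the same error as a missing positional argument. The helper name and the strings are hypothetical.

```python
import subprocess


def rebuild_from_report(exc, report):
    """Re-create an exception from a text report, as in `raise exception_type(report)`.

    `report` stands in for the formatted worker traceback (hypothetical text).
    """
    exception_type = type(exc)
    return exception_type(report)


# What actually failed in the worker: a shell command returned a non-zero status.
worker_error = subprocess.CalledProcessError(137, "example command")

try:
    rebuild_from_report(worker_error, "formatted worker traceback")
except TypeError as err:
    # CalledProcessError requires (returncode, cmd), so building it from one string
    # fails, and this TypeError is what surfaces instead of the real error.
    print("surfaced instead:", err)
```

Running on a single core sidesteps this re-raise path, which is why the -n 1 output is more informative.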
Hi Brad,
OK, I see. When I ran bcbio on a single core, the error message was the samtools broken pipe error, but I killed the run before it finished. I will rerun it tomorrow and keep you updated.
Also, I am trying to run the bcbio development version with mutect + pindel. The run finished, but I did not see any results generated by pindel. I am not sure whether my YAML file is correct; please take a look at the YAML file below. ...
If it is not correct, please let me know. Thanks!
Michael
Hi Michael, I also had a failure with VarDict, but it was related to an error in R. Can you check that R starts properly when you launch it before running bcbio?
Pindel integration still seems to be at the pull-request stage, so it is not yet in the latest development version: https://github.com/chapmanb/bcbio-nextgen/pull/602
Hi Miika,
Yes, R runs very well on my machine.
Thanks, Guorong
Hi Brad,
When I ran bcbio on a single core (-n 1), I got the following error message:
[2014-09-24 22:55] Genotyping with VarDict: Inference
[2014-09-24 22:55] samtools: writing to standard output failed: Broken pipe
[2014-09-24 22:55] samtools: error closing standard output: -1
[2014-09-24 23:06] Use of uninitialized value $sample in concatenation (.) or string at /bcbio/bin/var2vcf_somatic.pl line 35.
[2014-09-24 23:06] /bin/bash: line 1: 25667 Killed /bcbio/bin/vardict -G /bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N syn3-tumor -b "/mnt/disk/WGSData/prealign/syn3-tumor/2_2014-09-08_mutectdata-sort.bam|/mnt/disk/WGSData/prealign/syn3-normal/1_2014-09-08_mutectdata-sort.bam" -z -F -c 1 -S 2 -E 3 -g 4 /mnt/disk/WGSData/vardict/1/syn3-1_5394707_7174953-raw-regions.bed
[2014-09-24 23:06] Uncaught exception occurred
Traceback (most recent call last):
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 22, in run
_do_run(cmd, checks, log_stdout)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 121, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /bcbio/bin/vardict -G /bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N syn3-tumor -b "/mnt/disk/WGSData/prealign/syn3-tumor/2_2014-09-08_mutectdata-sort.bam|/mnt/disk/WGSData/prealign/syn3-normal/1_2014-09-08_mutectdata-sort.bam" -z -F -c 1 -S 2 -E 3 -g 4 /mnt/disk/WGSData/vardict/1/syn3-1_5394707_7174953-raw-regions.bed | testsomatic.R | var2vcf_somatic.pl -N "syn3-tumor|syn3-normal" -f 0.1 | /bcbio/bin/vcfstreamsort | bgzip -c > /mnt/disk/WGSData/vardict/1/tx/tmpLvDQct/syn3-1_5394707_7174953-raw.vcf.gz
samtools: writing to standard output failed: Broken pipe
samtools: error closing standard output: -1
Use of uninitialized value $sample in concatenation (.) or string at /bcbio/bin/var2vcf_somatic.pl line 35.
/bin/bash: line 1: 25667 Killed /bcbio/bin/vardict -G /bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N syn3-tumor -b "/mnt/disk/WGSData/prealign/syn3-tumor/2_2014-09-08_mutectdata-sort.bam|/mnt/disk/WGSData/prealign/syn3-normal/1_2014-09-08_mutectdata-sort.bam" -z -F -c 1 -S 2 -E 3 -g 4 /mnt/disk/WGSData/vardict/1/syn3-1_5394707_7174953-raw-regions.bed
25668 Done | testsomatic.R
25669 Done | var2vcf_somatic.pl -N "syn3-tumor|syn3-normal" -f 0.1
25670 Done | /bcbio/bin/vcfstreamsort
25671 Done | bgzip -c > /mnt/disk/WGSData/vardict/1/tx/tmpLvDQct/syn3-1_5394707_7174953-raw.vcf.gz
' returned non-zero exit status 137
Traceback (most recent call last):
File "/bcbio/bin/bcbio_nextgen.py", line 216, in <module>
I hope this error message helps you isolate the cause of the problem!
Thanks a lot, Michael
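The exit status in this log carries useful information. Bash reports a process terminated by signal N as 128 + N, so 137 = 128 + 9 means SIGKILL. That agrees with the "25667 Killed" line for the vardict process. An externally delivered SIGKILL often comes from the Linux out-of-memory killer, though not always, and this is one reason to check memory use. Below is a small Python helper for decoding such statuses; the function name is hypothetical.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status such as 137.

    Shells report a process killed by signal N as 128 + N.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with status %d" % status


print(describe_exit_status(137))  # killed by signal 9 (SIGKILL)
```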
Hi Michael, it's difficult to tell what's going wrong, but let's try a few things:
... chr anywhere)?
... temp.txt and no error messages: /bcbio/bin/vardict -G /bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N syn3-tumor -b "/mnt/disk/WGSData/prealign/syn3-tumor/2_2014-09-08_mutectdata-sort.bam|/mnt/disk/WGSData/prealign/syn3-normal/1_2014-09-08_mutectdata-sort.bam" -z -F -c 1 -S 2 -E 3 -g 4 /mnt/disk/WGSData/vardict/1/syn3-1_5394707_7174953-raw-regions.bed > ./temp.txt
... top while running the above standalone vardict command and see if it uses all of the memory on your box?
Hi @michaelxu2014, closing this out because it's pretty stale and the issue might have resolved itself by now. Feel free to reopen if you're still having the problem. Thanks!
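To put a number on the suggestion to watch top while the standalone vardict command runs, here is a hedged Python sketch. It runs a command and reports the peak resident memory of the children it waited for. The function name is hypothetical, and the sketch is Unix-only because the resource module is not available on Windows.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(cmd):
    """Run `cmd` (a list of arguments) and return (exit_code, peak_rss_mb).

    getrusage(RUSAGE_CHILDREN).ru_maxrss is the largest resident set size among
    all terminated, waited-for descendants of this process, so an earlier, larger
    child would also be counted. Units: kilobytes on Linux, bytes on macOS.
    """
    exit_code = subprocess.call(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return exit_code, peak / divisor
```

If the reported peak approaches the machine's physical memory just before an exit status of 137, the out-of-memory killer becomes the likely explanation.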
Hi,
I am trying to add indelcaller to my YAML file, but I got this error message:
[2014-09-22 19:00] Checking sample YAML configuration: /home/WGSData/bcbio_sample.yaml
Traceback (most recent call last):
File "/bcbio/bin/bcbio_nextgen.py", line 216, in <module>
main(**kwargs)
File "/bcbio/bin/bcbio_nextgen.py", line 42, in main
run_main(**kwargs)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 35, in run_main
fc_dir, run_info_yaml)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 73, in _run_toplevel
samples = run_info.organize(dirs, config, run_info_yaml)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 34, in organize
run_details = _run_info_from_yaml(dirs["flowcell"], run_info_yaml, config)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 415, in _run_info_from_yaml
_check_sample_config(run_details, run_info_yaml)
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 304, in _check_sample_config
[_check_algorithm_keys(x) for x in items]
File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 196, in _check_algorithm_keys
% (problem_keys, url))
ValueError: Unexpected configuration keyword in 'algorithm' section: ['indelcaller']
See configuration documentation for supported options:
https://bcbio-nextgen.readthedocs.org/en/latest/contents/configuration.html#algorithm-parameters
I want to run mutect only for SNP calling and the other indel callers, pind and sid, for indel calling. I am not sure whether I added indelcaller in the wrong place in my YAML file. Please take a look, and thanks a lot!
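The ValueError above comes from a key check on the 'algorithm' section: any keyword the installed release does not recognize is reported. Per the reply at the top of the thread, indelcaller only exists from 0.8.2 on, so a 0.8.1 install rejects it. Below is a minimal Python sketch of that kind of check. The function name, the dict standing in for the parsed YAML and the tiny supported-key set are assumptions, not bcbio's actual implementation.

```python
# Hypothetical, deliberately tiny set of accepted keys; the real list is in the
# bcbio configuration documentation and depends on the installed version.
SUPPORTED_ALGORITHM_KEYS = {"aligner", "variantcaller", "coverage_interval"}


def check_algorithm_keys(algorithm, supported=SUPPORTED_ALGORITHM_KEYS):
    """Raise ValueError for 'algorithm' keys the release does not recognize."""
    problem_keys = sorted(key for key in algorithm if key not in supported)
    if problem_keys:
        raise ValueError("Unexpected configuration keyword in 'algorithm' section: %s"
                         % problem_keys)
```

Once the release knows the key (here, by adding "indelcaller" to the supported set), the same algorithm section passes the check unchanged.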