bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

subprocess.CalledProcessError: Command 'set -o pipefail; export LC_ALL=en_US.utf8 && export LANG=en_US.utf8 #3049

Closed biosurgeon closed 4 years ago

biosurgeon commented 4 years ago

Hi. I'm having trouble running bcbio_nextgen. The subjects have multiple tumors with one paired normal sample. It seems that ensemble calling completed, but there is a problem coming from either gemini or peddy. (I'm not sure about gemini, because there are multiple files in the ../work/gemini/ folders, including ensemble VCFs and conf/lua files.) I've tried updating gemini and the other tools and data, but the same error occurred. I've searched for similar errors (I have solved some problems that way before), but I have no idea what is wrong. Could you help me? The error is shown below. Thank you!!

========================================
[2019-12-27T20:08Z] multiprocessing: combine_calls
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_001_tu1: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_001_tu2: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_002_tu1: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_002_tu2: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_002_tu3: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_002_tu4: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_002_tu5: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_002_tu6: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_003_tu1: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_003_tu2: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_003_tu3: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_003_tu4: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_003_tu5: vardict,strelka2,mutect2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_001_CN_N-germline: freebayes,gatk-haplotype,strelka2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_002_CN_N-germline: freebayes,gatk-haplotype,strelka2
[2019-12-27T20:08Z] Ensemble consensus calls for GCMET_003_CN_N-germline: freebayes,gatk-haplotype,strelka2
[2019-12-27T20:08Z] Timing: validation summary
[2019-12-27T20:08Z] Timing: structural variation
[2019-12-27T20:08Z] multiprocessing: detect_sv
[2019-12-27T20:08Z] multiprocessing: finalize_sv
[2019-12-27T20:08Z] Timing: structural variation
[2019-12-27T20:08Z] multiprocessing: detect_sv
[2019-12-27T20:08Z] multiprocessing: finalize_sv
[2019-12-27T20:08Z] Timing: structural variation ensemble
[2019-12-27T20:08Z] Timing: structural variation validation
[2019-12-27T20:08Z] multiprocessing: validate_sv
[2019-12-27T20:08Z] Timing: heterogeneity
[2019-12-27T20:08Z] Timing: population database
[2019-12-27T20:08Z] multiprocessing: prep_gemini_db
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_001_CN_liver_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_001_CN_liver_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_001_CN_liver_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_001_CN_liver_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_001_CN_T_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_001_CN_T_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_001_CN_T_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_001_CN_T_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CN_LND1_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CN_LND1_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CN_LND1_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CN_LND1_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CN_T1_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CN_T1_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CN_T1_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CN_T1_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CN_T2_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CN_T2_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CN_T2_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CN_T2_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CT1_liver1_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CT1_liver1_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CT1_liver1_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CT1_liver1_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CT1_liver2_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CT1_liver2_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CT1_liver2_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CT1_liver2_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CT1_liver3_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CT1_liver3_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CT1_liver3_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CT1_liver3_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CN_LN4_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CN_LN4_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CN_LN4_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CN_LN4_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CN_LN6_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CN_LN6_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CN_LN6_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CN_LN6_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CN_T_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CN_T_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CN_T_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CN_T_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CT1_liver1_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CT1_liver1_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CT1_liver1_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CT1_liver1_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CT1_liver2_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CT1_liver2_tu
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CT1_liver2_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CT1_liver2_tu
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_001_CN_N
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_001_CN_N
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_001_CN_N
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_001_CN_N
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_002_CN_N
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CN_N
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CN_N
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_002_CN_N
[2019-12-27T20:08Z] Not running gemini, no samples with variants found: GCMET_003_CN_N
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CN_N
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CN_N
[2019-12-27T20:08Z] Not running gemini, not configured in tools_on: GCMET_003_CN_N
[2019-12-27T20:08Z] Timing: peddy check
[2019-12-27T20:08Z] multiprocessing: run_peddy
[2019-12-27T20:08Z] Running peddy on /scratch/laki98/GCMET/work/qc/GCMET_001_tu1/peddy/GCMET_001_tu1-effects-annotated-germline.vcf.gz against /scratch/laki98/GCMET/work/qc/GCMET_001_tu1/peddy/GCMET_001_tu1-effects-annotated-germline.ped.
[2019-12-27T20:08Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; export LC_ALL=en_US.utf8 && export LANG=en_US.utf8 && /home/laki98/nextgen/bin/peddy -p 1 --plot --prefix /scratch/laki98/GCMET/work/bcbiotx/tmpwdha8ail/GCMET_001_tu1 /scratch/laki98/GCMET/work/qc/GCMET_001_tu1/peddy/GCMET_001_tu1-effects-annotated-germline.vcf.gz /scratch/laki98/GCMET/work/qc/GCMET_001_tu1/peddy/GCMET_001_tu1-effects-annotated-germline.ped 2> /scratch/laki98/GCMET/work/bcbiotx/tmpwdha8ail/run-stderr.log ' returned non-zero exit status 1.
[2019-12-27T20:08Z] Traceback (most recent call last):
  File "/home/laki98/nextgen/bin/peddy", line 11, in <module>
    load_entry_point('peddy==0.4.3', 'console_scripts', 'peddy')()
  File "/home/laki98/nextgen/anaconda/envs/python2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/laki98/nextgen/anaconda/envs/python2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
    return ep.load()
  File "/home/laki98/nextgen/anaconda/envs/python2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2443, in load
    return self.resolve()
  File "/home/laki98/nextgen/anaconda/envs/python2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/laki98/nextgen/anaconda/envs/python2/lib/python2.7/site-packages/peddy/__main__.py", line 3, in <module>
    from .cli import peddy as cli
  File "/home/laki98/nextgen/anaconda/envs/python2/lib/python2.7/site-packages/peddy/cli.py", line 17, in <module>
    from cyvcf2 import VCF
  File "/home/laki98/nextgen/anaconda/envs/python2/lib/python2.7/site-packages/cyvcf2/__init__.py", line 1, in <module>
    from .cyvcf2 import (VCF, Variant, Writer, r_ as r_unphased, par_relatedness,
  File "cyvcf2/cyvcf2.pyx", line 19, in init cyvcf2.cyvcf2
  File "__init__.pxd", line 1038, in numpy.import_array
ImportError: numpy.core.multiarray failed to import

Traceback (most recent call last):
  File "/home/laki98/nextgen/anaconda/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/home/laki98/nextgen/anaconda/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 190, in variant2pipeline
    samples = peddy.run_peddy_parallel(samples, run_parallel)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/variation/peddy.py", line 32, in run_peddy_parallel
    samples = parallel_fn("run_peddy", [[x] for x in to_run])
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(x) for x in items):
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 1003, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 834, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 753, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 201, in apply_async
    result = ImmediateResult(func)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 582, in __init__
    self.results = batch()
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/utils.py", line 55, in wrapper
    return f(*args, **kwargs)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/distributed/multitasks.py", line 24, in run_peddy
    return peddy.run_peddy(*args)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/variation/peddy.py", line 103, in run_peddy
    do.run(cmd.format(**locals()), message.format(**locals()))
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/laki98/nextgen/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; export LC_ALL=en_US.utf8 && export LANG=en_US.utf8 && /home/laki98/nextgen/bin/peddy -p 1 --plot --prefix /scratch/laki98/GCMET/work/bcbiotx/tmpwdha8ail/GCMET_001_tu1 /scratch/laki98/GCMET/work/qc/GCMET_001_tu1/peddy/GCMET_001_tu1-effects-annotated-germline.vcf.gz /scratch/laki98/GCMET/work/qc/GCMET_001_tu1/peddy/GCMET_001_tu1-effects-annotated-germline.ped 2> /scratch/laki98/GCMET/work/bcbiotx/tmpwdha8ail/run-stderr.log ' returned non-zero exit status 1.
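
For readers hitting the same message: the CalledProcessError in the title is only a wrapper around a failed shell command, and the actual cause is whatever the command wrote to stderr (here, the numpy ImportError). The sketch below illustrates that mechanism in isolation. It is not bcbio's do.py; `run_checked` is a hypothetical helper, and it assumes `bash` is available.

```python
import subprocess


def run_checked(cmd):
    """Run `cmd` under bash with pipefail (hypothetical helper, not bcbio's do.run).

    Raises subprocess.CalledProcessError on a non-zero exit status and attaches
    the captured stderr, which is where the underlying cause usually appears.
    """
    proc = subprocess.run(
        ["bash", "-c", "set -o pipefail; " + cmd],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(
            proc.returncode, cmd, output=proc.stdout, stderr=proc.stderr
        )
    return proc.stdout


# Example: the exception itself only reports the exit status;
# the useful detail is in e.stderr.
# try:
#     run_checked("echo 'ImportError: something failed' >&2; exit 1")
# except subprocess.CalledProcessError as e:
#     print(e.returncode, e.stderr)
```

With `set -o pipefail`, a failure in any stage of a pipeline makes the whole command fail, so an error such as an import failure in one tool surfaces as "returned non-zero exit status 1".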

naumenko-sa commented 4 years ago

Hi @biosurgeon!

Sorry about the issues! Could you please post your yaml config file?

Sergey

biosurgeon commented 4 years ago

Hi @naumenko-sa, thanks for the reply. Please find the YAML config file information below.

details:

Could you help me?

Young

biosurgeon commented 4 years ago

For now I am skipping peddy, and the process is going well. (I just moved the peddy command file somewhere else.) I think the system used numpy from Python 3, not Python 2, so the problem seems to be due to the Python PATH. I tried to update numpy, but numpy was installed for Python 3 only. I also moved the updated numpy folder into the Python 2 environment, but then there was another error (ImportError: cannot import name _distributor_init).

Could anyone let me know how to solve this problem? Will the pipeline be OK without peddy?
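
One way to test the guess above is to ask a specific interpreter which numpy it actually imports, rather than copying folders between environments. This is a minimal sketch: `probe_module` is a hypothetical helper, and the python2 interpreter path in the usage comment is an assumption inferred from the site-packages path in the traceback.

```python
import subprocess
import sys

# One-liner executed inside the target interpreter; written to run on both
# Python 2.7 and Python 3.
PROBE = (
    "import importlib, sys; m = importlib.import_module(sys.argv[1]); "
    "print(getattr(m, '__version__', 'unknown')); print(m.__file__)"
)


def probe_module(interpreter=sys.executable, module="numpy"):
    """Return (ok, text) for importing `module` with `interpreter`.

    On success, text holds the module version and file path on two lines.
    On failure, text holds the interpreter's stderr (for example an ImportError).
    """
    proc = subprocess.run(
        [interpreter, "-c", PROBE, module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


# Usage (the interpreter path is an assumption, adjust to your install):
# ok, text = probe_module("/home/laki98/nextgen/anaconda/envs/python2/bin/python")
# print("OK" if ok else "FAILED")
# print(text)
```

If the reported path points outside the python2 environment, or the import fails, that suggests the numpy in that environment does not match the one cyvcf2 was built against.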

Thanks

naumenko-sa commented 4 years ago

Hi!

Sorry about the delay in responding!

Peddy is one of the quality control tools: https://github.com/brentp/peddy So it does not affect variant calling. However, peddy works fine in many other projects.

Gemini should not be a problem: since v1.1.1 we no longer create gemini databases by default.

What version of bcbio are you using? Many python3/python2 bugs were fixed in recent releases. What is the output of: bcbio_nextgen.py -v

Sergey

biosurgeon commented 4 years ago

Hi, Sergey,

Thanks for the reply.

The version of bcbio is 1.2.0a.
I updated bcbio (data and tools) several times, hoping it would solve the problem.

There are many Pythons on my system (a server), so I wonder whether that could be the reason.

Do you have any suggestion?

Thanks

Young

[laki98@master config]$ whereis python
python: /usr/bin/python2.7 /usr/bin/python2.7-config /usr/bin/python /usr/lib/python2.7 /usr/lib64/python2.7 /etc/python /usr/include/python2.7 /home/laki98/nextgen/anaconda/bin/python3.6-config /home/laki98/nextgen/anaconda/bin/python3.6m /home/laki98/nextgen/anaconda/bin/python /home/laki98/nextgen/anaconda/bin/python3.6 /home/laki98/nextgen/anaconda/bin/python3.6m-config / /usr/share/man/man1/python.1.gz

[laki98@master work]$ which python
~/nextgen/anaconda/bin/python

naumenko-sa commented 4 years ago

Hi Young!

I think it is typical to have many Pythons on a system these days. If you have your bcbio/tools/bin and bcbio/anaconda/bin in the PATH, you should be fine. Moreover, for most tools bcbio unsets PATH and makes sure it calls the tool it has on board.
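
A quick way to confirm the PATH advice above is to check which executable each tool name resolves to and where the bcbio directories sit in PATH. This is only a sketch: `resolve_tools` and `path_rank` are hypothetical helpers, and the directory in the usage comment is an example taken from this thread.

```python
import os
import shutil


def resolve_tools(tools, path=None):
    """Map each tool name to the first matching executable on PATH (or None)."""
    return {tool: shutil.which(tool, path=path) for tool in tools}


def path_rank(directory, path=None):
    """Return the position of `directory` among the non-empty PATH entries.

    Returns None if the directory is not on PATH. Lower numbers win when the
    same tool name exists in several directories.
    """
    raw = path if path is not None else os.environ.get("PATH", "")
    entries = [
        os.path.realpath(os.path.expanduser(e)) for e in raw.split(os.pathsep) if e
    ]
    target = os.path.realpath(os.path.expanduser(directory))
    return entries.index(target) if target in entries else None


# Usage (directory name is an example from this thread):
# print(resolve_tools(["python", "peddy", "bcbio_nextgen.py"]))
# print(path_rank("~/nextgen/anaconda/bin"))
```

If `peddy` resolves to a location outside the bcbio install, or the bcbio directories rank after system directories, that is worth fixing before rerunning.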

Looking at your yaml config, I see several potential issues:

Sergey

P.S. It would also be super helpful if you attached the log files if the simplified bcbio run fails as well.

biosurgeon commented 4 years ago

Hi Sergey,

Thanks for the comments and suggestions. I attach the log file below. My PATH includes ../nextgen/anaconda/bin and ../nextgen/bin (is this what you mean? There is no bcbio/tools/bin on my system; the galaxy, genomes, and anaconda folders all belong to ../nextgen/).

  1. I ran bcbio with multiple samples (a maximum of 4 tumors with one paired normal, so at most 4 tumors in one batch) in another environment and there was no error, so I thought it worked. But I'll try to decouple them and compare the outcome. (I'm not sure how long it will take, but I will post the result if possible.)
  2. For the other three issues (min_allele_fraction, strelka2, simplifying), I'll try your suggestions.

Thank you so much!!

Young

log_files.txt

naumenko-sa commented 4 years ago

Hi @biosurgeon!

Closing this issue for now. Please feel free to reopen it if you experience any further issues!

SN