bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

problem with run_tests speed=2 #435

Closed bosmont closed 10 years ago

bosmont commented 10 years ago

I installed gemini by doing: bcbio_nextgen.py upgrade --tools --toolplus data

but when running ./run_tests.sh speed=2, I got the following error:

[2014-05-31 09:13] Gemini cannot open this annotation file: /usr/local/share/bcbio-nextgen/gemini_data/hg19.gwas.bed.gz.
[2014-05-31 09:13] Have you installed the annotation files? If so, have they been moved or deleted? Exiting...
[2014-05-31 09:13]
[2014-05-31 09:13] For more details:
[2014-05-31 09:13] http://gemini.readthedocs.org/en/latest/content/#installation.html\#installing-annotation-files
[2014-05-31 09:13]
[2014-05-31 09:13] Uncaught exception occurred

chapmanb commented 10 years ago

It seems like you have a gemini installation without any data. If you do ls -lh /usr/local/share/bcbio-nextgen/gemini_data do you have any data files there? You should be able to update gemini data in place with: gemini update --data. Hope this fixes it.

bosmont commented 10 years ago

Thanks for your help; it did help. The tests ran further, but then hit another error:

/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/io/excel.py:626: UserWarning: Installed openpyxl is not supported at this time. Use >=1.6.1 and <2.0.0.
  .format(openpyxl_compat.start_ver, openpyxl_compat.stop_ver))
[2014-05-31 20:30] Generating summary files: ['', 'c-tumor2']
[Sat May 31 20:30:09 EDT 2014] net.sf.picard.sam.BamIndexStats INPUT=/home/mango/work/bcbio/bcbio-nextgen/tests/test_automated_output/align/c-tumor2/3_130728_tcancer-sort.bam VALIDATION_STRINGENCY=SILENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Sat May 31 20:30:09 EDT 2014] Executing as mango@mango-N56JR on Linux 3.13.0-27-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_55-b14; Picard version: 1.96(1510)
[Sat May 31 20:30:09 EDT 2014] net.sf.picard.sam.BamIndexStats done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=753926144
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 62, in <module>
    main(**kwargs)
  File "/usr/local/bin/bcbio_nextgen.py", line 40, in main
    run_main(**kwargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 39, in run_main
    fc_dir, run_info_yaml)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 82, in _run_toplevel
    for xs in pipeline.run(config, config_file, parallel, dirs, pipeline_items):
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 360, in run
    samples = qcsummary.generate_parallel(samples, run_parallel)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 35, in generate_parallel
    sum_samples = run_parallel("pipeline_summary", samples)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 84, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 644, in __call__
    self.dispatch(function, args, kwargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 391, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 129, in __init__
    self.results = func(*args, **kwargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 47, in wrapper
    return apply(f, *args, **kwargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 57, in pipeline_summary
    return qcsummary.pipeline_summary(*args)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 52, in pipeline_summary
    data["summary"] = _run_qc_tools(work_bam, data)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 95, in _run_qc_tools
    cur_metrics = qc_fn(bam_file, data, cur_qc_dir)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 266, in _run_fastqc
    if data.get("analysis", "").lower() not in ["standard"]
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/bam/__init__.py", line 67, in downsample
    ds_pct = get_downsample_pct(broad_runner, in_bam, target_counts)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/bam/__init__.py", line 57, in get_downsample_pct
    n_rgs = max(1, len(work_bam.header["RG"]))
  File "csamtools.pyx", line 1123, in csamtools.Samfile.header.__get__ (pysam/csamtools.c:11562)
ValueError: unknown field code 'PP' in record 'PG'
ERROR

ERROR: Test paired tumor-normal calling using multiple calling approaches: MuTect, VarScan, FreeBayes.

Traceback (most recent call last):
  File "/home/mango/work/bcbio/bcbio-nextgen/tests/test_automated_analysis.py", line 273, in test_7_cancer
    subprocess.check_call(cl)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['bcbio_nextgen.py', '/home/mango/work/bcbio/bcbio-nextgen/tests/test_automated_output/bcbio_system.yaml', '/home/mango/work/bcbio/bcbio-nextgen/tests/data/automated/run_info-cancer.yaml']' returned non-zero exit status 1


Ran 2 tests in 152.954s

FAILED (errors=1)

chapmanb commented 10 years ago

It looks like the specific issue is that you have an old version of pysam, since support for this was added in 0.7.1. /usr/local/share/bcbio-nextgen/anaconda/bin/conda list pysam will tell you the version. You can update manually with:

/usr/local/share/bcbio-nextgen/anaconda/bin/conda install -c https://conda.binstar.org/collections/chapmanb/bcbio  pysam

More generally it seems like something is wrong with your install since this should have gotten updated with a bcbio_nextgen.py upgrade and it's unexpected to have a missing gemini data directory. It might be worth a re-update if you run into other issues. Hope this helps.
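As a rough illustration of the version check described above, the following Python 3 sketch compares an installed version string against the 0.7.1 threshold mentioned in the comment. The helper names `version_tuple` and `at_least` are hypothetical, and reading `pysam.__version__` is an assumption about how the installed package exposes its version; `conda list pysam` remains the method suggested in the thread.

```python
def version_tuple(text, width=3):
    """Convert a dotted version such as '0.7.1' into a padded tuple of ints."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component with no leading digits
        parts.append(int(digits))
    # Pad so that '0.7' and '0.7.0' compare as equal.
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def at_least(installed, minimum):
    """True when the installed version is the minimum version or newer."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import pysam
        # Assumption: the package exposes __version__; fall back to None if not.
        installed = getattr(pysam, "__version__", None)
    except ImportError:
        installed = None
    if installed is None:
        print("pysam version could not be determined; try `conda list pysam`")
    else:
        print("pysam", installed, "new enough:", at_least(installed, "0.7.1"))
```

Run it with the Python interpreter from the bcbio anaconda directory so that it inspects the same pysam that bcbio uses.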

bosmont commented 10 years ago

You are the best. It works now. Thanks so much!

YiChingTang commented 9 years ago

Hi, it looks like I have the same error here, but the difference is that I checked the path and the annotation file GEMINI wants was there!

[yiching@bcbio work]$ ls -rlht /opt/bcbio/gemini_data/
(only part of the file list is shown)
-rw-r--r-- 1 root root 3.1G Jul 4 18:26 ExAC.r0.3.sites.vep.tidy.vcf.gz
-rw-r--r-- 1 root root 854K Jul 4 18:26 ExAC.r0.3.sites.vep.tidy.vcf.gz.tbi
-rw-r--r-- 1 root root 35G Jul 4 21:57 whole_genome_SNVs.tsv.compressed.gz
drwxr-xr-x 2 root root 4.0K Jul 4 21:57 tmpdownload

[yiching@bcbio work]$ tail bcbio.wholegenome.vairant.log
raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; /opt/bcbio/tools/bin/gemini load --passonly --skip-gerp-bp -v /trinity/home/yiching/project/testRun_bcbio/whole_genome_final/work/gemini/NA12878-gatk-decompose-effects.vcf.gz -t snpEff --cores 1 --tempdir /trinity/home/yiching/project/testRun_bcbio/whole_genome_final/work/gemini/tx/tmpnpC3An /trinity/home/yiching/project/testRun_bcbio/whole_genome_final/work/gemini/tx/tmpnpC3An/NA12878-gatk.db
CADD scores are being loaded (to skip use:--skip-cadd).
Gemini cannot open this annotation file: /opt/bcbio/gemini_data/whole_genome_SNVs.tsv.compressed.gz.
Have you installed the annotation files? If so, have they been moved or deleted? Exiting...

For more details: http://gemini.readthedocs.org/en/latest/content/#installation.html\#installing-annotation-files

' returned non-zero exit status 1

Anyway, I ran the update and the tests as suggested above. The GEMINI "cannot open this annotation file" error was not solved. Also, I have two failures in the test run:

  1. [2015-07-08T00:26Z] Using input YAML configuration: /trinity/home/yiching/bcbio-nextgen/tests/data/automated/run_info-bamclean.yaml Traceback (most recent call last): File "/opt/bcbio/tools/bin/bcbio_nextgen.py", line 226, in main(kwargs) File "/opt/bcbio/tools/bin/bcbio_nextgen.py", line 43, in main run_main(kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 37, in run_main fc_dir, run_info_yaml) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 80, in _run_toplevel for xs in pipeline.run(config, run_info_yaml, parallel, dirs, samples): File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 139, in run [x[0]["description"] for x in samples]]]) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel return run_multicore(fn, items, config, parallel=parallel) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items): File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 657, in call self.dispatch(function, args, kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 404, in dispatch job = ImmediateApply(func, args, kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 142, in init self.results = func(_args, _kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 49, in wrapper return apply(f, _args, _kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 199, in organize_samples return run_info.organize(*args) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 40, in organize run_details = _run_info_from_yaml(dirs, run_info_yaml, config, sample_names) File 
"/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 537, in _run_info_from_yaml upload["dir"] = _file_to_abs(upload["dir"], [dirs.get("work")], makedir=True) KeyError: 'dir' ERROR
  2. [2015-07-08T00:29Z] Uncaught exception occurred Traceback (most recent call last): File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run _do_run(cmd, checks, log_stdout) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run raise subprocess.CalledProcessError(exitcode, error_msg) CalledProcessError: Command 'set -o pipefail; /opt/bcbio/tools/bin/gemini load --passonly --skip-gene-tables --test-mode --skip-gerp-bp -v /trinity/home/yiching/bcbio-nextgen/tests/test_automated_output/gemini/PairedBatch-varscan-decompose-effects.vcf.gz -t snpEff --cores 1 --tempdir /trinity/home/yiching/bcbio-nextgen/tests/test_automated_output/gemini/tx/tmpaslXnC /trinity/home/yiching/bcbio-nextgen/tests/test_automated_output/gemini/tx/tmpaslXnC/PairedBatch-varscan.db CADD scores are being loaded (to skip use:--skip-cadd). Gemini cannot open this annotation file: /opt/bcbio/gemini_data/whole_genome_SNVs.tsv.compressed.gz. Have you installed the annotation files? If so, have they been moved or deleted? Exiting...

For more details: http://gemini.readthedocs.org/en/latest/content/#installation.html\#installing-annotation-files

' returned non-zero exit status 1 Traceback (most recent call last): File "/opt/bcbio/tools/bin/bcbio_nextgen.py", line 226, in main(kwargs) File "/opt/bcbio/tools/bin/bcbio_nextgen.py", line 43, in main run_main(kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 37, in run_main fc_dir, run_info_yaml) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 80, in _run_toplevel for xs in pipeline.run(config, run_info_yaml, parallel, dirs, samples): File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 190, in run samples = population.prep_db_parallel(samples, run_parallel) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/population.py", line 283, in prep_db_parallel output = parallel_fn("prep_gemini_db", to_process) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel return run_multicore(fn, items, config, parallel=parallel) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items): File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 657, in call self.dispatch(function, args, kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 404, in dispatch job = ImmediateApply(func, args, kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 142, in init self.results = func(_args, _kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 49, in wrapper return apply(f, _args, _kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 143, in prep_gemini_db return population.prep_gemini_db(*args) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/population.py", line 36, in prep_gemini_db 
gemini_db = create_gemini_db(gemini_vcf, data, gemini_db, ped_file) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/population.py", line 76, in create_gemini_db do.run(cmd, "Create gemini database for %s" % gemini_vcf, data) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run _do_run(cmd, checks, log_stdout) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run raise subprocess.CalledProcessError(exitcode, error_msg) subprocess.CalledProcessError: Command 'set -o pipefail; /opt/bcbio/tools/bin/gemini load --passonly --skip-gene-tables --test-mode --skip-gerp-bp -v /trinity/home/yiching/bcbio-nextgen/tests/test_automated_output/gemini/PairedBatch-varscan-decompose-effects.vcf.gz -t snpEff --cores 1 --tempdir /trinity/home/yiching/bcbio-nextgen/tests/test_automated_output/gemini/tx/tmpaslXnC /trinity/home/yiching/bcbio-nextgen/tests/test_automated_output/gemini/tx/tmpaslXnC/PairedBatch-varscan.db CADD scores are being loaded (to skip use:--skip-cadd). Gemini cannot open this annotation file: /opt/bcbio/gemini_data/whole_genome_SNVs.tsv.compressed.gz. Have you installed the annotation files? If so, have they been moved or deleted? Exiting...

For more details: http://gemini.readthedocs.org/en/latest/content/#installation.html\#installing-annotation-files

' returned non-zero exit status 1 ERROR

ERROR: Clean problem BAM input files that do not require alignment.

Traceback (most recent call last): File "/trinity/home/yiching/bcbio-nextgen/tests/test_automated_analysis.py", line 287, in test_6_bamclean subprocess.check_call(cl) File "/opt/bcbio/anaconda/lib/python2.7/subprocess.py", line 540, in check_call raise CalledProcessError(retcode, cmd) CalledProcessError: Command '['bcbio_nextgen.py', '/trinity/home/yiching/bcbio-nextgen/tests/test_automated_output/bcbio_system.yaml', '/trinity/home/yiching/bcbio-nextgen/tests/data/automated/../100326_FC6107FAAXX', '/trinity/home/yiching/bcbio-nextgen/tests/data/automated/run_info-bamclean.yaml']' returned non-zero exit status 1

ERROR: Test paired tumor-normal calling using multiple calling approaches: MuTect, VarScan, FreeBayes.

Traceback (most recent call last): File "/trinity/home/yiching/bcbio-nextgen/tests/test_automated_analysis.py", line 300, in test_7_cancer subprocess.check_call(cl) File "/opt/bcbio/anaconda/lib/python2.7/subprocess.py", line 540, in check_call raise CalledProcessError(retcode, cmd) CalledProcessError: Command '['bcbio_nextgen.py', '/trinity/home/yiching/bcbio-nextgen/tests/test_automated_output/bcbio_system.yaml', '/trinity/home/yiching/bcbio-nextgen/tests/data/automated/run_info-cancer.yaml']' returned non-zero exit status 1


Ran 2 tests in 225.191s

FAILED (errors=2)

chapmanb commented 9 years ago

Sorry about the issues with the tests. I fixed one small issue causing the error with setting up the run info YAML file for a minimal test with no directory in the upload, but the larger GEMINI issue does look like you're missing some files. whole_genome_SNVs.tsv.compressed.gz should also have an index (.tbi) file associated with it, which is missing. When you run gemini update --dataonly, does it finish cleanly and retrieve this file? Hope this fixes it for you.
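The check described above (a bgzipped data file should have a `.tbi` index next to it) can be sketched in Python 3 as follows. The function name `missing_tabix_indexes` is hypothetical, and treating every `.gz` file in the directory as one that needs an index is a simplifying assumption; some compressed files may legitimately have none.

```python
import os


def missing_tabix_indexes(data_dir, suffix=".gz"):
    """List files ending in `suffix` that have no sibling '<name>.tbi' file."""
    if not os.path.isdir(data_dir):
        return []
    missing = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        # Skip directories and anything that is not a compressed data file.
        if not os.path.isfile(path) or not name.endswith(suffix):
            continue
        if not os.path.exists(path + ".tbi"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Directory taken from the log in this thread; adjust it for your install.
    for name in missing_tabix_indexes("/opt/bcbio/gemini_data"):
        print("no .tbi index next to", name)
```

For the listing posted earlier in the thread, this would be expected to flag whole_genome_SNVs.tsv.compressed.gz, since only the ExAC file is shown with an index.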

YiChingTang commented 9 years ago

Thank you for the quick reply. No, there is still no .tbi file for whole_genome_SNVs.tsv.compressed.gz after updating.

chapmanb commented 9 years ago

Sorry, I'm a bit confused as to why that would be the case. If things work, it should always grab an index along with bgzipped files. Could you provide the output of:

gemini update --dataonly --extra cadd_score

and I can see if I can spot anything. Hope this helps.

YiChingTang commented 9 years ago

Thanks! The output is shown below:

Checking required dependencies... curl found

Gemini data files updated

chapmanb commented 9 years ago

Thanks, and after running this if you do:

ls -lh /opt/bcbio/gemini_data/whole_genome_SNVs*

do you still not see the associated .tbi file? Do you still get the same error when running the tests?

YiChingTang commented 9 years ago

the .tbi shows up after ls -lh /opt/gemini_data/whole_genome_SNVs*

No, I didn't get the same error when running the tests. Instead, I get an "index out of range" error (shown below):

[2015-07-11T09:32Z] Create gemini database for /opt/bcbio/bcbio-nextgen/tests/test_automated_output/gemini/PairedBatch-varscan-decompose-effects.vcf.gz : c-tumor [2015-07-11T09:32Z] CADD scores are being loaded (to skip use:--skip-cadd). [2015-07-11T09:32Z] Traceback (most recent call last): [2015-07-11T09:32Z] File "/opt/bcbio/tools/bin/gemini", line 6, in [2015-07-11T09:32Z] gemini.gemini_main.main() [2015-07-11T09:32Z] File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1121, in main [2015-07-11T09:32Z] args.func(parser, args) [2015-07-11T09:32Z] File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 196, in load_fn [2015-07-11T09:32Z] gemini_load.load(parser, args) [2015-07-11T09:32Z] File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 52, in load [2015-07-11T09:32Z] load_singlecore(args) [2015-07-11T09:32Z] File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 66, in load_singlecore [2015-07-11T09:32Z] gemini_loader.populate_from_vcf() [2015-07-11T09:32Z] File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 166, in populate_from_vcf [2015-07-11T09:32Z] database.insert_variation(self.c, self.var_buffer) [2015-07-11T09:32Z] File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/database.py", line 355, in insert_variation [2015-07-11T09:32Z] qs = ",".join(["?"] * len(buffer[0])) [2015-07-11T09:32Z] IndexError: list index out of range [2015-07-11T09:32Z] Uncaught exception occurred Traceback (most recent call last): File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run _do_run(cmd, checks, log_stdout) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run raise subprocess.CalledProcessError(exitcode, error_msg) CalledProcessError: Command 'set -o pipefail; /opt/bcbio/tools/bin/gemini load --passonly --skip-gene-tables 
--test-mode --skip-gerp-bp -v /opt/bcbio/bcbio-nextgen/tests/test_automated_output/gemini/PairedBatch-varscan-decompose-effects.vcf.gz -t snpEff --cores 1 --tempdir /opt/bcbio/bcbio-nextgen/tests/test_automated_output/gemini/tx/tmpfx2zil /opt/bcbio/bcbio-nextgen/tests/test_automated_output/gemini/tx/tmpfx2zil/PairedBatch-varscan.db CADD scores are being loaded (to skip use:--skip-cadd). Traceback (most recent call last): File "/opt/bcbio/tools/bin/gemini", line 6, in gemini.gemini_main.main() File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1121, in main args.func(parser, args) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 196, in load_fn gemini_load.load(parser, args) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 52, in load load_singlecore(args) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 66, in load_singlecore gemini_loader.populate_from_vcf() File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 166, in populate_from_vcf database.insert_variation(self.c, self.var_buffer) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/database.py", line 355, in insert_variation qs = ",".join(["?"] * len(buffer[0])) IndexError: list index out of range ' returned non-zero exit status 1 Traceback (most recent call last): File "/opt/bcbio/tools/bin/bcbio_nextgen.py", line 226, in main(kwargs) File "/opt/bcbio/tools/bin/bcbio_nextgen.py", line 43, in main run_main(kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 37, in run_main fc_dir, run_info_yaml) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 80, in _run_toplevel for xs in pipeline.run(config, run_info_yaml, parallel, dirs, samples): File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 190, in run samples = 
population.prep_db_parallel(samples, run_parallel) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/population.py", line 283, in prep_db_parallel output = parallel_fn("prep_gemini_db", to_process) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel return run_multicore(fn, items, config, parallel=parallel) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items): File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 657, in call self.dispatch(function, args, kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 404, in dispatch job = ImmediateApply(func, args, kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 142, in init self.results = func(_args, _kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 49, in wrapper return apply(f, _args, _kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 143, in prep_gemini_db return population.prep_gemini_db(args) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/population.py", line 36, in prep_gemini_db gemini_db = create_gemini_db(gemini_vcf, data, gemini_db, ped_file) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/population.py", line 76, in create_gemini_db do.run(cmd, "Create gemini database for %s" % gemini_vcf, data) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run _do_run(cmd, checks, log_stdout) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run raise subprocess.CalledProcessError(exitcode, error_msg) subprocess.CalledProcessError: Command 'set -o pipefail; /opt/bcbio/tools/bin/gemini load 
--passonly --skip-gene-tables --test-mode --skip-gerp-bp -v /opt/bcbio/bcbio-nextgen/tests/test_automated_output/gemini/PairedBatch-varscan-decompose-effects.vcf.gz -t snpEff --cores 1 --tempdir /opt/bcbio/bcbio-nextgen/tests/test_automated_output/gemini/tx/tmpfx2zil /opt/bcbio/bcbio-nextgen/tests/test_automated_output/gemini/tx/tmpfx2zil/PairedBatch-varscan.db CADD scores are being loaded (to skip use:--skip-cadd). Traceback (most recent call last): File "/opt/bcbio/tools/bin/gemini", line 6, in gemini.gemini_main.main() File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1121, in main args.func(parser, args) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 196, in load_fn gemini_load.load(parser, args) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 52, in load load_singlecore(args) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 66, in load_singlecore gemini_loader.populate_from_vcf() File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 166, in populate_from_vcf database.insert_variation(self.c, self.var_buffer) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/gemini/database.py", line 355, in insert_variation qs = ",".join(["?"] * len(buffer[0])) IndexError: list index out of range ' returned non-zero exit status 1 ERROR

ERROR: Clean problem BAM input files that do not require alignment.

Traceback (most recent call last): File "/opt/bcbio/bcbio-nextgen/tests/test_automated_analysis.py", line 287, in test_6_bamclean subprocess.check_call(cl) File "/opt/bcbio/anaconda/lib/python2.7/subprocess.py", line 540, in check_call raise CalledProcessError(retcode, cmd) CalledProcessError: Command '['bcbio_nextgen.py', '/opt/bcbio/bcbio-nextgen/tests/test_automated_output/bcbio_system.yaml', '/opt/bcbio/bcbio-nextgen/tests/data/automated/../100326_FC6107FAAXX', '/opt/bcbio/bcbio-nextgen/tests/data/automated/run_info-bamclean.yaml']' returned non-zero exit status 1

ERROR: Test paired tumor-normal calling using multiple calling approaches: MuTect, VarScan, FreeBayes.

Traceback (most recent call last): File "/opt/bcbio/bcbio-nextgen/tests/test_automated_analysis.py", line 300, in test_7_cancer subprocess.check_call(cl) File "/opt/bcbio/anaconda/lib/python2.7/subprocess.py", line 540, in check_call raise CalledProcessError(retcode, cmd) CalledProcessError: Command '['bcbio_nextgen.py', '/opt/bcbio/bcbio-nextgen/tests/test_automated_output/bcbio_system.yaml', '/opt/bcbio/bcbio-nextgen/tests/data/automated/run_info-cancer.yaml']' returned non-zero exit status 1


Ran 2 tests in 316.751s

FAILED (errors=2)

chapmanb commented 9 years ago

Glad that the update fixed the other issue, and sorry about the additional problems. This looks like a small problem in the latest GEMINI when handling empty variant files in which all calls are filtered. I pushed a fix, so if you update GEMINI to the latest development version with:
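For context, the traceback earlier in the thread fails on `len(buffer[0])`, which raises `IndexError` when the buffer holds no rows, for example when every call in the VCF was filtered out. The Python 3 sketch below reproduces that pattern with one possible guard. The function `insert_rows` and the `variants` table are hypothetical illustrations, not the actual GEMINI code or the fix that was pushed.

```python
import sqlite3


def insert_rows(cursor, table, rows):
    """Insert rows using one '?' placeholder per column; skip empty input."""
    if not rows:
        # Without this guard, rows[0] raises IndexError on an empty list.
        return 0
    placeholders = ",".join(["?"] * len(rows[0]))
    # The table name is interpolated directly, so it must be a trusted value.
    cursor.executemany(
        "INSERT INTO %s VALUES (%s)" % (table, placeholders), rows
    )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE variants (chrom TEXT, pos INTEGER)")
    print(insert_rows(cur, "variants", []))  # empty buffer is skipped
    print(insert_rows(cur, "variants", [("chr1", 100), ("chr1", 200)]))
```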

gemini update --devel

the tests should hopefully pass cleanly. Thanks again for all the reports and hope this fixes everything for you.

YiChingTang commented 9 years ago

Thanks again! I tried gemini update --devel and re-downloaded the bcbio-nextgen test suite. Only 1 failure remains. Oh, by the way, the "dir" error appears again.

Below are the messages returned from ./run_tests.sh speed=2 (only the problematic ones are pasted):

[2015-07-13T06:42Z] Resource requests: sambamba, samtools; memory: 2.00, 2.00; cores: 16, 16 [2015-07-13T06:42Z] Configuring 1 jobs to run, using 1 cores each with 2.00g of memory reserved for each job [2015-07-13T06:42Z] Timing: organize samples [2015-07-13T06:42Z] multiprocessing: organize_samples [2015-07-13T06:42Z] Using input YAML configuration: /opt/bcbio/bcbio-nextgen/tests/data/automated/run_info-bamclean.yaml Traceback (most recent call last): File "/opt/bcbio/tools/bin/bcbio_nextgen.py", line 226, in main(kwargs) File "/opt/bcbio/tools/bin/bcbio_nextgen.py", line 43, in main run_main(kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 37, in run_main fc_dir, run_info_yaml) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 80, in _run_toplevel for xs in pipeline.run(config, run_info_yaml, parallel, dirs, samples): File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 139, in run [x[0]["description"] for x in samples]]]) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel return run_multicore(fn, items, config, parallel=parallel) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items): File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 657, in call self.dispatch(function, args, kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 404, in dispatch job = ImmediateApply(func, args, kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 142, in init self.results = func(_args, _kwargs) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 49, in wrapper return apply(f, _args, _kwargs) File 
"/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 199, in organize_samples return run_info.organize(*args) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 40, in organize run_details = _run_info_from_yaml(dirs, run_info_yaml, config, sample_names) File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 537, in _run_info_from_yaml upload["dir"] = _file_to_abs(upload["dir"], [dirs.get("work")], makedir=True) KeyError: 'dir' ERROR Test paired tumor-normal calling using multiple calling approaches: MuTect, VarScan, FreeBayes. ... [2015-07-13T06:42Z] Resource requests: bwa, sambamba, samtools; memory: 2.00, 2.00; cores: 16, 16, 16

ERROR: Clean problem BAM input files that do not require alignment.

Traceback (most recent call last): File "/opt/bcbio/bcbio-nextgen/tests/test_automated_analysis.py", line 287, in test_6_bamclean subprocess.check_call(cl) File "/opt/bcbio/anaconda/lib/python2.7/subprocess.py", line 540, in check_call raise CalledProcessError(retcode, cmd) CalledProcessError: Command '['bcbio_nextgen.py', '/opt/bcbio/bcbio-nextgen/tests/test_automated_output/bcbio_system.yaml', '/opt/bcbio/bcbio-nextgen/tests/data/automated/../100326_FC6107FAAXX', '/opt/bcbio/bcbio-nextgen/tests/data/automated/run_info-bamclean.yaml']' returned non-zero exit status 1


Ran 2 tests in 295.586s

FAILED (errors=1)

chapmanb commented 9 years ago

Glad we're getting further along and thanks again for your patience. It looks like you don't have the latest development version of bcbio with the fix for the dir issue. If you do bcbio_nextgen.py upgrade -u development that should hopefully fix that. Hope that gets it working for you.
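The `KeyError: 'dir'` in the traceback above comes from indexing `upload["dir"]` when the run info YAML defines no upload directory. A minimal Python 3 sketch of tolerating that case is shown below; `resolve_upload_dir` is a hypothetical stand-in written for illustration and is not the change made in bcbio.

```python
import os


def resolve_upload_dir(upload, work_dir):
    """Return a copy of `upload` with 'dir' made absolute, if it is present."""
    resolved = dict(upload or {})
    # Only touch the key when it exists, so a minimal config does not fail.
    if "dir" in resolved:
        resolved["dir"] = os.path.normpath(
            os.path.join(work_dir, resolved["dir"])
        )
    return resolved


if __name__ == "__main__":
    print(resolve_upload_dir({}, "/work/run"))
    print(resolve_upload_dir({"dir": "../final"}, "/work/run"))
```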

YiChingTang commented 9 years ago

There's a reason that I don't use the -u development option: I always get an error like this:

DBG [config.py]: Using config file /opt/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/../contrib/flavor/ngs_pipeline_minimal/perl-libs.yaml
INFO: Reading /opt/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/../contrib/flavor/ngs_pipeline_minimal/perl-libs.yaml
DBG [shared.py]: Packages to install: Encode::Locale,Statistics::Descriptive,Archive::Extract,Archive::Zip,Archive::Tar,Compress::Raw::Zlib,DBI,LWP::Simple,LWP::Protocol::https,Time::HiRes,IPC::Cmd,IPC::System::Simple,Params::Check,Module::Load::Conditional,Archive::Tar,File::Fetch,File::ShareDir,File::ShareDir::Install,Bio::DB::Sam;--config lddlflags=-shared;SAMTOOLS={system_install}/share/samtools-0.1,Vcf==0.953==https://github.com/chapmanb/vcftools-cpan/archive/v0.953.tar.gz
[localhost] local: /opt/bcbio/tools/bin/cpanm -i --notest --local-lib=/opt/bcbio/tools 'Encode::Locale'
Encode::Locale is up to date. (1.05)
Expiring 443 work directories. This might take a while...
[localhost] local: /opt/bcbio/tools/bin/cpanm -i --notest --local-lib=/opt/bcbio/tools 'Statistics::Descriptive'
Statistics::Descriptive is up to date. (3.0609)
[localhost] local: /opt/bcbio/tools/bin/cpanm -i --notest --local-lib=/opt/bcbio/tools 'Archive::Extract'
--> Working on Archive::Extract
Fetching http://www.cpan.org/authors/id/B/BI/BINGOS/Archive-Extract-0.76.tar.gz ... FAIL
! Download http://www.cpan.org/authors/id/B/BI/BINGOS/Archive-Extract-0.76.tar.gz failed. Retrying ...
! Download http://www.cpan.org/authors/id/B/BI/BINGOS/Archive-Extract-0.76.tar.gz failed. Retrying ...
! Download http://www.cpan.org/authors/id/B/BI/BINGOS/Archive-Extract-0.76.tar.gz failed. Retrying ...
! Failed to download http://www.cpan.org/authors/id/B/BI/BINGOS/Archive-Extract-0.76.tar.gz
! Failed to fetch distribution Archive-Extract-0.76

Fatal error: local() encountered an error (return code 1) while executing '/opt/bcbio/tools/bin/cpanm -i --notest --local-lib=/opt/bcbio/tools 'Archive::Extract''

Aborting.

chapmanb commented 9 years ago

Sorry about the problem. That file downloads fine for me here so I'm not exactly sure what to suggest. Is it possible you have a proxy or firewall set up that blocks downloads from CPAN? Can you download the file manually outside of bcbio? Hope this helps some.
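One way to try the manual download outside of bcbio is a short Python check. This is only a sketch: `can_download` and its `opener` parameter are hypothetical names introduced here for illustration, and the URL in the usage comment is the one from the failing cpanm log above.

```python
import urllib.error
import urllib.request


def can_download(url, timeout=30, opener=urllib.request.urlopen):
    """Attempt one download and return (ok, detail).

    `opener` defaults to urllib's urlopen, which honours the usual
    http_proxy / https_proxy environment variables.
    """
    try:
        with opener(url, timeout=timeout) as response:
            data = response.read()
        return True, "%d bytes received" % len(data)
    except (urllib.error.URLError, OSError) as exc:
        # A blocked connection, DNS failure, or proxy refusal ends up here.
        return False, str(exc)


# Usage, run by hand on the machine that fails:
#   print(urllib.request.getproxies())   # proxy settings Python would use
#   print(can_download("http://www.cpan.org/authors/id/B/BI/BINGOS/"
#                      "Archive-Extract-0.76.tar.gz"))
```

If this fails in the same way as cpanm, the cause is more likely the network setup (proxy or firewall) than bcbio itself.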

YiChingTang commented 9 years ago

I ran the process again, and there was no CPAN issue this time. Instead, a '/opt/bcbio/tools/bin/brew update' error showed up. Is that also a firewall problem?

error: unknown option for 'stash save': --include-untracked
To provide a message, use git stash save -- '--include-untracked'
Usage: git stash list []
or: git stash show []
or: git stash drop [-q|--quiet] []
or: git stash ( pop | apply ) [--index] [-q|--quiet] []
or: git stash branch []
or: git stash [save [--patch] [-k|--[no-]keep-index] [-q|--quiet] []]
or: git stash clear
Error: Failure while executing: git stash save --include-untracked --quiet

Fatal error: local() encountered an error (return code 1) while executing '/opt/bcbio/tools/bin/brew update'

Aborting.
Upgrading bcbio-nextgen to latest development version
Upgrade of bcbio-nextgen development code complete.
Upgrading third party tools to latest versions
Setting up virtual machine
[localhost] local: echo $HOME
[localhost] local: uname -m

Sincerely, Yiching Tang

chapmanb commented 9 years ago

Yiching; It looks like you might have an older version of git on the machine you're working on. What does: git --version report? What type of machine are you installing bcbio on? If it's possible to update git to a more recent version hopefully that'll resolve the issue. Sorry about the problems and hope this fixes it for you.
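The version check can also be scripted. The sketch below parses the output of git --version and compares it against a minimum. The helper names and the minimum in the usage comment are illustrative assumptions, not a documented bcbio or Homebrew requirement.

```python
import re
import subprocess


def parse_git_version(version_output):
    """Extract a numeric tuple from text such as 'git version 2.39.2'."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("unrecognized git version output: %r" % version_output)
    return tuple(int(part) for part in match.group(1).split("."))


def installed_git_version(git="git"):
    """Run `git --version` and parse the result."""
    output = subprocess.check_output([git, "--version"])
    return parse_git_version(output.decode("utf-8", "replace"))


def git_is_at_least(version, minimum):
    """Compare version tuples, padding the shorter one with zeros."""
    width = max(len(version), len(minimum))

    def pad(parts):
        return tuple(parts) + (0,) * (width - len(parts))

    return pad(version) >= pad(minimum)


# Usage (the minimum below is a placeholder chosen for illustration):
#   git_is_at_least(installed_git_version(), (1, 8, 0))
```

Comparing tuples of integers avoids the string-comparison trap where "1.7.10" sorts before "1.7.2".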

YiChingTang commented 9 years ago

Which version do you suggest? Here's the info about our machine and git version.

[yiching@work ~]$ git --version
git version 1.7.1
[yiching@work ~]$ cat /proc/version
Linux version 2.6.32-431.3.1.el6.x86_64 (mockbuild@c6b10.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Jan 3 21:39:27 UTC 2014
[yiching@work ~]$ lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.6 (Final)
Release: 6.6
Codename: Final

chapmanb commented 9 years ago

Yiching; Thanks for the additional details, this helps a lot. We'd like to support the default git on CentOS 6.6 so I dug into the issue more. It looks like a recent change in Homebrew tries to stash and pop any local changes during brew update, using the --include-untracked flag that git 1.7.1 doesn't recognize. I pushed fixes to CloudBioLinux that avoid the flag. If you remove any cached cloudbiolinux (rm -rf tmpbcbio-install) and re-run, hopefully it'll work cleanly now. Sorry about the issues and hope this fixes it for you.

YiChingTang commented 9 years ago

By re-run, do you mean bcbio_nextgen.py upgrade -u development? (Apologies, I've gotten a bit lost here...) If so, I got some warnings and then a fatal error:

Warning: local() encountered an error (return code 1) while executing 'rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm'

Warning: local() encountered an error (return code 1) while executing 'rpm -Uvh http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm'

Warning: local() encountered an error (return code 1) while executing 'git diff --quiet'

error: unknown option `short'
usage: git symbolic-ref [options] name [ref]

-q, --quiet           be quiet
-m <reason>           reason of the update

Error: Failure while executing: git symbolic-ref --short HEAD

Fatal error: local() encountered an error (return code 1) while executing '/opt/bcbio/tools/bin/brew update'

Aborting.

chapmanb commented 9 years ago

Yiching; Sorry about the additional issues. It looks like this is a similar issue to before. The native version of git on CentOS is too old and doesn't have the options that homebrew expects. 1.7.1 is 5 years old at this point. Practically you can try to install a newer version with homebrew inside bcbio:

brew install git --env=inherit --ignore-dependencies

If you then have the bcbio tool directory on your PATH it'll pick up this version first. I also pushed fixes that will skip the update if it fails and keep going. So hopefully re-running one more time will fix it:

rm -rf tmpbcbio-install
bcbio_nextgen.py upgrade --tools

Hope this fixes everything for you and sorry again about all the build trouble.
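To confirm which git gets picked up once the bcbio tool directory is first on PATH, a small lookup like the following can help. It is a sketch only: `resolve_with_prefix` is a hypothetical helper, and /opt/bcbio/tools/bin is the tool directory taken from the logs in this thread, so adjust it to your install.

```python
import os
import shutil


def resolve_with_prefix(tool, prefix_dir, base_path=None):
    """Return the executable found for `tool` when prefix_dir is searched first.

    base_path defaults to the current PATH; returns None if nothing is found.
    """
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    if base_path:
        search_path = os.pathsep.join([prefix_dir, base_path])
    else:
        search_path = prefix_dir
    return shutil.which(tool, path=search_path)


# Usage:
#   resolve_with_prefix("git", "/opt/bcbio/tools/bin")
# If the result is not inside the bcbio tool directory, the system git is
# still the one being found first.
```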