bcbio / bcbio-nextgen

Validated, scalable, community-developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Multiprocessing exception with v0.9.9 #1480

Closed duxan closed 7 years ago

duxan commented 8 years ago

Hi,

I ran the same pipeline that worked previously with the new bcbio version and got a "multiprocessing error". The log file is job.err.log.txt and bcbio was run with -t local. Any idea why this happened?

Thanks!

chapmanb commented 8 years ago

Dušan; Sorry about the issue, this particular traceback doesn't have the causative error so it's hard for me to say much other than it died during variant calling. It can often be tricky to debug multi-threaded errors. If you re-run with a single core (-n 1) does it still error out and provide a more useful traceback? Thanks for the work debugging.
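For reference, a single-core re-run would look something like this (the project YAML path here is hypothetical; substitute your own config):

```shell
# Same invocation as before, but forced to one core with -n 1 so tasks
# run inline and the underlying error surfaces with a full traceback.
bcbio_nextgen.py ../config/project.yaml -t local -n 1
```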

lpantano commented 7 years ago

Hi

I am closing this because it seems to be an old issue. Come back if you find other issues or want to continue with this one.

cheers

phu5ion commented 7 years ago

Hi,

I have run into the same error. For me it fails at multiple steps: bwa-mem, and I think sambamba or GATK. I've noticed this error only occurs when I submit a job through the scheduler; everything works perfectly when run on the local machine. When running on one core to diagnose the issue, my traceback is:

Traceback (most recent call last):
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/bin/bcbio_nextgen.py", line 234, in <module>
    main(**kwargs)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/bin/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 42, in run_main
    fc_dir, run_info_yaml)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 86, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 133, in variant2pipeline
    samples = run_parallel("postprocess_alignment", samples)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1)(joblib.delayed(fn)(x) for x in items):
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 800, in __call__
    while self.dispatch_one_batch(iterator):
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 658, in dispatch_one_batch
    self._dispatch(tasks)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 180, in __init__
    self.results = batch()
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
    return apply(f, *args, **kwargs)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 92, in postprocess_alignment
    return sample.postprocess_alignment(*args)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/pipeline/sample.py", line 233, in postprocess_alignment
    data = coverage.assign_interval(data)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/bcbio/variation/coverage.py", line 40, in assign_interval
    callable_size = pybedtools.BedTool(vrs).total_coverage()
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/pybedtools/bedtool.py", line 2985, in total_coverage
    b = self.merge()
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/pybedtools/bedtool.py", line 775, in decorated
    result = method(self, *args, **kwargs)
  File "/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/lib/python2.7/site-packages/pybedtools/bedtool.py", line 204, in not_implemented_func
    raise NotImplementedError(help_str)
NotImplementedError: "mergeBed" does not appear to be installed or on the path, so this method is disabled. Please install a more recent version of BEDTools and re-import to use this method.

I believe that running jobs through the scheduler may leave bcbio unable to find some programs. Is there any way to get past this issue? Thanks!
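One quick way to confirm this theory is a throwaway diagnostic job that prints the environment the scheduler actually gives you (this snippet is a sketch; the check for mergeBed mirrors the tool named in the traceback):

```shell
#!/bin/bash
# Hypothetical diagnostic job script: print the PATH the scheduler hands
# to the job, and check whether bedtools' mergeBed binary resolves on it.
echo "PATH inside job: $PATH"
command -v mergeBed || echo "mergeBed not on PATH -- bcbio's bundled bedtools is not visible"
```

If mergeBed resolves on the login node but not inside the job, the scheduler is not propagating your interactive PATH.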

chapmanb commented 7 years ago

The error looks like the installed bcbio tools are not on the PATH inside your job submission script. If you're writing a script to submit the main bcbio_nextgen.py runner job, you should add

export PATH=/path/to/bcbio/local/bin:$PATH

at the top to ensure that gets passed through to the engines and controllers. Hope this helps.
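Putting that together, a minimal submission script might look like this (the install location is an assumption; use the bin/ directory of your own bcbio installation, and the YAML path and core count are placeholders):

```shell
#!/bin/bash
# Hypothetical scheduler submission script for the main bcbio runner job.
# Prepend bcbio's bundled tools so child processes inherit them.
export PATH=/path/to/bcbio/local/bin:$PATH

# ...then launch the runner as usual, for example:
# bcbio_nextgen.py ../config/project.yaml -t local -n 8
echo "first PATH entry: $(echo "$PATH" | cut -d: -f1)"
```

Because the export happens inside the script, every tool bcbio shells out to (bwa, sambamba, bedtools, GATK wrappers) sees the same PATH regardless of what the scheduler strips from your login environment.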

phu5ion commented 7 years ago

Hi Brad,

Thank you so much!