Closed mshadbolt closed 5 years ago
Hi,
Sorry about this. I updated mirtop without updating the code in bcbio, but it should be fixed in the latest bcbio devel.
Thanks for trying.
On Apr 25, 2018, at 2:17 PM, Marion notifications@github.com wrote:
Hi, I'm trying to run the bcbio smallRNA-seq pipeline with mirge and seqbuster, but I run into the following error running the dev version:
[2018-04-25T17:01Z] ['gff', '--sps', 'hsa', '--hairpin', '~/software/bcbio-nextgen/data/genomes/Hsapiens/hg38-noalt/srnaseq/hairpin.fa', '--gtf', '~/software/bcbio-nextgen/data/genomes/Hsapiens/hg38-noalt/srnaseq/mirbase.gff3', '--format', 'seqbuster', '-o', '~/work/bcbiotx/tmp0UFt0o', '~/work/mirbase/m05898_s_1_CGTGAT_trimmed/m05898_s_1_CGTGAT_trimmed.mirna']
Traceback (most recent call last):
File "~/software/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 241, in <module>
main(**kwargs)
File "~/software/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 46, in main
run_main(**kwargs)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 43, in run_main
fc_dir, run_info_yaml)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 323, in smallrnaseqpipeline
samples = run_parallel("srna_annotation", samples)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
return run_multicore(fn, items, config, parallel=parallel)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
for data in joblib.Parallel(parallel["num_jobs"], batch_size=1)(joblib.delayed(fn)(x) for x in items):
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 779, in __call__
while self.dispatch_one_batch(iterator):
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 625, in dispatch_one_batch
self._dispatch(tasks)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 588, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/_parallel_backends.py", line 111, in apply_async
result = ImmediateResult(func)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/_parallel_backends.py", line 332, in __init__
self.results = batch()
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 52, in wrapper
return apply(f, *args, **kwargs)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 151, in srna_annotation
return srna.sample_annotation(*args)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/srna/sample.py", line 123, in sample_annotation
data['config'])
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/srna/sample.py", line 261, in _mirtop
os.path.join(out_dir, out_fn))
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/shutil.py", line 316, in move
copy2(src, real_dst)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/shutil.py", line 144, in copy2
copyfile(src, dst)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/shutil.py", line 96, in copyfile
with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '~/work/bcbiotx/tmp0UFt0o/m05898_s_1_CGTGAT_trimmed.gff'
My yaml config looks like this:
resources:
  # default options, used if other items below are not present
  # avoids needing to configure/adjust for every program
  default:
    memory: 3.6G  # 3.6*32 ~= 115G
    cores: 32
    jvm_opts: ["-Xms800m", "-Xmx3600m"]
  gatk:
    jvm_opts: ["-Xms800m", "-Xmx3600m"]
  snpeff:
    jvm_opts: ["-Xms800m", "-Xmx3600m"]
  qualimap:
    memory: 4g
  express:
    memory: 8g
  dexseq:
    memory: 10g
  macs2:
    memory: 8g
  seqcluster:
    memory: 8g
  mirge:
    options: ["-lib ~/software/miRge2/miRge.Libs"]
details:
  - analysis: smallRNA-seq
    algorithm:
      trim_reads: false
      aligner: star
      expression_caller: [seqbuster, mirge]
    species: hsa
    genome_build: hg38-noalt
upload:
  dir: ../final
I then tried with the stable version to see if it did the same, and I get the exact same error.
Let me know if you need any further info or want me to try anything else.
great thanks, I will try it out and let you know how I go.
OK, I think it got a bit further, but now it is complaining about not finding libs:
[2018-04-25T22:27Z] ['gff', '--sps', 'hsa', '--hairpin', '~/software/bcbio-nextgen/data/genomes/Hsapiens/hg38-noalt/srnaseq/hairpin.fa', '--gtf', '~/software/bcbio-nextgen/data/genomes/Hsapiens/hg38-noalt/srnaseq/mirbase.gff3', '--format', 'seqbuster', '-o', '~/work/bcbiotx/tmptDj1z0', '~/work/mirbase/m06174_s_2_CGTGAT_trimmed/m06174_s_2_CGTGAT_trimmed.mirna']
[2018-04-25T22:27Z] Looking for mirdeep2 database for m06174_s_2_CGTGAT_trimmed
[2018-04-25T22:27Z] Resource requests: seqcluster; memory: 8.00; cores: 1
[2018-04-25T22:27Z] Configuring 1 jobs to run, using 1 cores each with 8.00g of memory reserved for each job
[2018-04-25T22:27Z] Timing: cluster
[2018-04-25T22:27Z] multiprocessing: seqcluster_cluster
Traceback (most recent call last):
File "~/software/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 241, in <module>
main(**kwargs)
File "~/software/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 46, in main
run_main(**kwargs)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 45, in run_main
fc_dir, run_info_yaml)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 89, in _run_toplevel
for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 336, in smallrnaseqpipeline
samples = run_parallel("seqcluster_cluster", [samples])
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
return run_multicore(fn, items, config, parallel=parallel)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
for data in joblib.Parallel(parallel["num_jobs"], batch_size=1)(joblib.delayed(fn)(x) for x in items):
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 779, in __call__
while self.dispatch_one_batch(iterator):
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 625, in dispatch_one_batch
self._dispatch(tasks)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 588, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/_parallel_backends.py", line 111, in apply_async
result = ImmediateResult(func)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/_parallel_backends.py", line 332, in __init__
self.results = batch()
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 52, in wrapper
return apply(f, *args, **kwargs)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 159, in seqcluster_cluster
return seqcluster.run_cluster(*args)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/srna/group.py", line 102, in run_cluster
sample["mirge"] = mirge.run(data)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/srna/mirge.py", line 25, in run
lib = _find_lib(sample)
File "~/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/srna/mirge.py", line 73, in _find_lib
if not libs:
NameError: global name 'libs' is not defined
Hi,
I pushed a fix today for this, but the short story is that you need to set up the lib parameter that gets plugged into mirge.
Right now, mirge has to be installed manually. I am still working on this; it will take a while because it has very restricted dependency versions, so I am trying to update that in the source package.
As well, you need to download the lib library and set it up as explained here:
https://bcbio-nextgen.readthedocs.io/en/latest/contents/pipelines.html#smallrna-seq
Let me know if you find more issues.
Thanks
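For reference, a minimal sketch of the lib setup described above, in the resources section of the YAML (the path is an example; point it at wherever you downloaded miRge.Libs):

```yaml
resources:
  mirge:
    # example path; adjust to your local miRge.Libs download
    options: ["-lib ~/software/miRge2/miRge.Libs"]
```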
Hi Lorena
Yes, I realised later that I hadn't set the -lib path correctly in my yaml file. I then ran into issues with installing miRge: initially I installed it locally, but I usually need to unset my local library to get bcbio to run. I was able to work around it by creating a virtualenv with mirge and its package dependencies installed, then running bcbio within that. I managed to get mirge to run successfully, BUT it only ran on one of my samples instead of all 3 that I had included in the .yaml file.
Perhaps the problem is in the file at ~/work/mirge/sample_file.txt? This file only contains the path to one of my samples, but I believe it should be how you specify multiple samples to mirge?
It ran the other parts of the pipeline, STAR and seqbuster, on all three samples.
Thanks again for your help :)
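For illustration only, a minimal sketch of what a fixed sample manifest builder might look like: one input path per line in sample_file.txt, covering every sample rather than just the first. The function and sample names here are hypothetical, not bcbio's actual internals.

```python
import os
import tempfile

def write_sample_manifest(sample_paths, out_file):
    """Write one trimmed-read path per line, so miRge sees every sample."""
    with open(out_file, "w") as handle:
        for path in sample_paths:
            handle.write(path + "\n")
    return out_file

# Hypothetical trimmed inputs for the three samples in the yaml.
samples = [
    "work/trimmed/m05898_s_1_CGTGAT_trimmed.fastq",
    "work/trimmed/m06174_s_2_CGTGAT_trimmed.fastq",
    "work/trimmed/m06175_s_3_CGTGAT_trimmed.fastq",
]
manifest = write_sample_manifest(
    samples, os.path.join(tempfile.mkdtemp(), "sample_file.txt"))
with open(manifest) as handle:
    lines = handle.read().splitlines()
print(len(lines))  # one line per sample: 3
```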
Hi,
Oh, sorry. Yes, you are right. There was a bug there. I just pushed a fix for that (hopefully).
Sorry for the pain of installing miRge; I am trying to get that fixed as well.
thanks!
Awesome, thanks. I am now running on a bigger set of samples, so I will let you know if I run into any more issues.
Hi again. Not sure if you can help with this one or if it is more a problem with miRge, but I found that I ran out of memory when trying to run miRge on a large number of samples. I have set up the bcbio system resource config settings so that they stay within my system's limits, but perhaps because miRge is installed separately it doesn't pay attention to these settings? I couldn't find any settings in their documentation to control memory usage. I guess it's because it processes everything as one batch and holds it all in memory instead of saving any intermediary files to disk.
FYI, I was trying to run it with 372 samples, but it failed when it reached 100. I was trying to keep it within 64GB as it's a shared system, but in total we have 128GB.
I might try just running in batches with miRge as standalone, or maybe one by one then merge together at the end. Might not be an issue for people who have lots of memory or not many samples but thought it would be good to let you know so you're aware of its memory-hogging ways ;)
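The batching workaround mentioned above could be sketched like this: split the sample list into fixed-size chunks and run miRge standalone on each, then merge the counts at the end. The batch size is an arbitrary assumption, not a tested miRge setting, and the merge step is left out.

```python
def batches(items, size):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 372 placeholder sample names, batched 50 at a time to bound peak memory.
samples = ["sample_%03d.fastq" % i for i in range(372)]
chunks = list(batches(samples, 50))
print(len(chunks))       # 8 batches
print(len(chunks[-1]))   # 22 samples in the final, partial batch
```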
Hi,
The only thing I can do is add mirge to the tools for which you can set up memory. If you were already running in local mode with 64GB, then I cannot do much more, and I would suggest opening an issue with them. Maybe they are interested in debugging this.
I pushed the fix, and now if you set up mirge in the bcbio_system.yaml file with the memory you want, it should allocate that to that specific job; until now it was 8G, following the seqcluster recommendation when running in ipython mode.
I will try to reproduce as well, I think I have one project with similar number of samples.
Thanks!
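With that fix, the memory override described above would presumably look something like this in bcbio_system.yaml (16g is an arbitrary example value, not a recommendation):

```yaml
resources:
  mirge:
    memory: 16g  # example; overrides the previous 8G seqcluster default
```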
Thanks, looks like Lorena fixed this issue and there hasn't been any action, so closing.