mortunco closed this issue 8 years ago
Sorry about the issues. As you suspect, I believe the problem is that you don't have enough space available on the root filesystem where Docker puts containers. The bcbio container is ~6Gb unpacked and you have ~2Gb, so my guess is the error comes from that; Docker then rolls back the partially downloaded image to free space, which is why some space appears available if you look afterwards.
If you start your Amazon instance with additional space on the root volume this should hopefully fix the issue. The bcbio AWS integration uses 20Gb for this:
https://github.com/chapmanb/bcbio-nextgen-vm/blob/master/elasticluster/config#L41
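If it helps, a quick way to confirm the space situation before pulling (a sketch; the Docker root directory can vary by setup):

```shell
# Free space on the root filesystem, where Docker keeps images by default
df -h /

# Where this Docker install actually stores its data (path varies by setup)
docker info 2>/dev/null | grep -i 'root dir' || echo "docker not available here"
```

If df shows much less than ~6Gb free on that filesystem, the pull will fail the same way.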
Hope this helps.
Dear Brad,
I started a new instance with 30 GB of storage. However, this time I got an error about Java running out of memory. Is there a way to allocate more memory? I made the suspicious text bold so you can tell me if I am wrong. Could you recommend an instance type so that I can start building on that one?
Thank you for your help,
Best,
Tunc.
ubuntu@ip-172-31-50-19:~$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
8387d9ff0016: Already exists
3b52deaaf0ed: Already exists
4bd501fad6de: Already exists
a3ed95caeb02: Already exists
150b376cadde: Already exists
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
Stopping docker container
Traceback (most recent call last):
File "/usr/local/bin/bcbio_vm.py", line 4, in <module>
__import__('pkg_resources').run_script('bcbio-nextgen-vm==0.1.0a0', 'bcbio_vm.py')
File "/home/ubuntu/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.2.2-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
File "/home/ubuntu/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.2.2-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script
File "/home/ubuntu/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 307, in <module>
File "/home/ubuntu/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 36, in cmd_install
File "build/bdist.linux-x86_64/egg/bcbiovm/docker/install.py", line 40, in full
File "build/bdist.linux-x86_64/egg/bcbiovm/docker/manage.py", line 47, in run_bcbio_cmd
subprocess.CalledProcessError: Command 'docker attach --no-stdin 77262f8ce6cb13361ed9ed71d662027d23e6155c16d0c67a4e93779768cbd614
3650K .......... .......... .......... .......... .......... 73% 34.7M 0s
[... wget progress lines trimmed ...]
5000K .......... .......... ......... 100% 49.3K=0.8s
2016-03-17 17:14:09 (6.25 MB/s) - written to stdout [5150442/5150442]
INFO: <cloudbio.flavor.Flavor instance at 0x7f5c819a0c68>
INFO: This is a ngs_pipeline_minimal flavor
INFO: Reading default fabricrc.txt
DBG [config.py]: Using config file /home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/../config/fabricrc.txt
INFO: Distribution __auto__
INFO: Get local environment
INFO: Ubuntu setup
DBG [distribution.py]: Debian-shared setup
DBG [distribution.py]: Source=trusty
DBG [distribution.py]: NixPkgs: Ignored
INFO: Now, testing connection to host...
INFO: Connection to host appears to work!
DBG [utils.py]: Expand paths
INFO: List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'annotations': ['GA4GH_problem_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', '1000g_snps', 'mills_indels', 'cosmic', 'ancestral', 'qsignature', 'transcripts', 'RADAR', 'mirbase'], 'validation': ['giab-NA12878', 'dream-syn3', 'dream-syn4'], 'name': 'Human (GRCh37)', 'dbkey': 'GRCh37'}], 'install_uniref': False}'): Human (GRCh37)
INFO: Genome preparation method ggd failed, trying next
INFO: Downloading genome from s3: GRCh37 rtg
--2016-03-17 17:14:10-- https://s3.amazonaws.com/biodata/genomes/GRCh37-rtg.tar.xz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.81.236
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.81.236|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-03-17 17:14:10 ERROR 404: Not Found.
Warning: local() encountered an error (return code 8) while executing 'wget --continue --no-check-certificate -O GRCh37-rtg.tar.xz 'https://s3.amazonaws.com/biodata/genomes/GRCh37-rtg.tar.xz''
**INFO: Genome preparation method s3 failed, trying next
INFO: Preparing genome GRCh37 with index rtg
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007a04a0000, 715849728, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 715849728 bytes for committing reserved memory.
# An error report file with more information is saved as:
# ./hs_err_pid114.log
The operating system did not make requested memory available to the JVM. Try removing other jobs on this machine, adjusting allocated memory appropriate to currently available memory, or adjusting command parameters to reduce memory requirements. More information is contained in the file: **./hs_err_pid114.log
Upgrading bcbio-nextgen data files
Setting up virtual machine
[localhost] local: cat /etc/*release | grep DISTRIB_CODENAME | cut -f 2 -d =
[localhost] local: echo $HOME
[localhost] local: uname -m
[localhost] local: pwd
[localhost] local: echo $HOME
[localhost] local: mkdir -p '/home/ubuntu/tmp/cloudbiolinux/84b60da4-dfde-36aa-801b-7011d94bae4c'
[localhost] local: wget --continue --no-check-certificate -O GRCh37-rtg.tar.xz 'https://s3.amazonaws.com/biodata/genomes/GRCh37-rtg.tar.xz'
[localhost] local: export RTG_JAVA_OPTS='-Xms1g' export RTG_MEM=2g && rtg format -o rtg/GRCh37.sdf /usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa
Fatal error: local() encountered an error (return code 1) while executing 'export RTG_JAVA_OPTS='-Xms1g' export RTG_MEM=2g && rtg format -o rtg/GRCh37.sdf /usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa'
Aborting.
INFO: Genome preparation method raw failed, trying next
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 207, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 91, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 267, in upgrade_bcbio_data
cbl_deploy.deploy(s)
File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
_setup_vm(options, vm_launcher, actions)
File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
configure_instance(options, actions)
File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
setup_biodata(options)
File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
install_proc(options["genomes"], ["ggd", "s3", "raw"])
File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 345, in install_data
_prep_genomes(env, genomes, genome_indexes, ready_approaches)
File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 486, in _prep_genomes
raise IOError("Could not prepare index {0} for {1} by any method".format(idx, gid))
IOError: Could not prepare index rtg for GRCh37 by any method
' returned non-zero exit status 1
ubuntu@ip-172-31-50-19:~$ df
Filesystem 1K-blocks Used Available Use% Mounted on
udev 503212 12 503200 1% /dev
tmpfs 101636 340 101296 1% /run
/dev/xvda1 30822592 17702576 11762416 61% /
none 4 0 4 0% /sys/fs/cgroup
none 5120 0 5120 0% /run/lock
none 508160 0 508160 0% /run/shm
none 102400 0 102400 0% /run/user
/dev/xvdf 515930552 296782596 192917172 61% /home/ubuntu/dendi
Sorry about the memory issues -- what instance type are you running with? There is some index building and other work in the preparation, so you probably want to go with t2.large or better to have a reasonable amount of compute and memory. The very small instances don't have much memory, which might be causing the issue here. Hope this helps.
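For reference, the failed allocation in the log above was 715849728 bytes (~700Mb). A quick way to see what the instance actually has, using the standard Linux /proc interface:

```shell
# Report total and free memory in MB from the kernel's meminfo
awk '/^MemTotal|^MemFree/ {printf "%s %.0f MB\n", $1, $2/1024}' /proc/meminfo
```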
I am running a t2.micro instance :) I am aware that it is small; I am using it because it is free.
Edit: I launched an m4.large instance. I will report back if it works.
Dear @chapmanb,
Bad news: apparently I ran into another storage problem. I also posted the df output below to show my root volume; it is 30 GB, but even that was not enough. Also, since I am only going to use bcbio for calling mutations in tumor/normal samples, I don't think I need to download the RNA-seq data, and I don't think I need aligners either. As I understood from the tutorial, I can use the --datatarget variation option to make my bcbio_vm.py installation more specific. However, when I removed --aligners from the command line, it gave the error below:
[ec2-user@ip-172-31-62-129 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools --genomes GRCh37
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
Data not installed, no aligners provided with `--aligners` flag
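For the record, the narrowed-down command I was trying would look something like this (I have not verified these flags beyond the tutorial, and as the error above shows, --aligners apparently still has to be provided):

```shell
# Hypothetical narrowed install based on the tutorial's --datatarget option;
# --aligners bwa is kept because the install refuses to run without it
bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools \
  --genomes GRCh37 --aligners bwa --datatarget variation
```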
This is my system's df output.
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 30830568 30698748 31572 100% /
devtmpfs 4080360 100 4080260 1% /dev
tmpfs 4089368 0 4089368 0% /dev/shm
/dev/xvdf 515930552 296782596 192917172 61% /home/ec2-user/dendi
This is the error I got while following the regular installation:
[ec2-user@ip-172-31-62-129 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools \
> --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
Stopping docker container
Traceback (most recent call last):
File "/usr/local/bin/bcbio_vm.py", line 4, in <module>
__import__('pkg_resources').run_script('bcbio-nextgen-vm==0.1.0a0', 'bcbio_vm.py')
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 307, in <module>
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 36, in cmd_install
File "build/bdist.linux-x86_64/egg/bcbiovm/docker/install.py", line 40, in full
File "build/bdist.linux-x86_64/egg/bcbiovm/docker/manage.py", line 47, in run_bcbio_cmd
subprocess.CalledProcessError: Command 'docker attach --no-stdin f98cba899c2ff1293cc9c7ca1915c778c39ed8fc944a807a55cd8d630862a810
3200K .......... .......... .......... .......... .......... 64% 43.7M 0s
[... wget progress lines trimmed ...]
5000K .......... .......... ......... 100% 9.74M=0.2s
2016-03-18 09:30:17 (28.9 MB/s) - written to stdout [5150448/5150448]
INFO: <cloudbio.flavor.Flavor instance at 0x7f19a7af0998>
INFO: This is a ngs_pipeline_minimal flavor
INFO: Reading default fabricrc.txt
DBG [config.py]: Using config file /home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/../config/fabricrc.txt
INFO: Distribution __auto__
INFO: Get local environment
INFO: Ubuntu setup
DBG [distribution.py]: Debian-shared setup
DBG [distribution.py]: Source=trusty
DBG [distribution.py]: NixPkgs: Ignored
INFO: Now, testing connection to host...
INFO: Connection to host appears to work!
DBG [utils.py]: Expand paths
INFO: List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'annotations': ['GA4GH_problem_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', '1000g_snps', 'mills_indels', 'cosmic', 'ancestral', 'qsignature', 'transcripts', 'RADAR', 'mirbase'], 'validation': ['giab-NA12878', 'dream-syn3', 'dream-syn4'], 'name': 'Human (GRCh37)', 'dbkey': 'GRCh37'}], 'install_uniref': False}'): Human (GRCh37)
--2016-03-18 09:30:17-- https://s3.amazonaws.com/biodata/annotation/GRCh37-rnaseq-2015-12-01.tar.xz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.17.232
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.17.232|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
The file is already fully retrieved; nothing to do.
tar: GRCh37/rnaseq-2015-12-01/tophat/GRCh37_transcriptome.4.bt2: Wrote only 8192 of 10240 bytes
tar: GRCh37/rnaseq-2015-12-01/kallisto/GRCh37: Wrote only 4096 of 10240 bytes
tar: GRCh37/rnaseq-2015-12-01/ref-transcripts-splicesites.txt: Cannot write: No space left on device
tar: GRCh37/rnaseq-2015-12-01/ref-transcripts-mask.gtf: Cannot write: No space left on device
tar: Exiting with failure status due to previous errors
Upgrading bcbio-nextgen data files
Setting up virtual machine
[localhost] local: cat /etc/*release | grep DISTRIB_CODENAME | cut -f 2 -d =
[localhost] local: echo $HOME
[localhost] local: uname -m
Running GGD recipe: transcripts
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 207, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 91, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 267, in upgrade_bcbio_data
cbl_deploy.deploy(s)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
_setup_vm(options, vm_launcher, actions)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
configure_instance(options, actions)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
setup_biodata(options)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
install_proc(options["genomes"], ["ggd", "s3", "raw"])
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 345, in install_data
_prep_genomes(env, genomes, genome_indexes, ready_approaches)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 474, in _prep_genomes
retrieve_fn(env, manager, gid, idx)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 796, in _install_with_ggd
ggd.install_recipe(env.cwd, recipe_file)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
recipe["recipe"]["full"]["recipe_type"])
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
subprocess.check_output(["bash", run_file])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 573, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '/usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 2
' returned non-zero exit status 1
To sum up, I will try with 50 GB of storage and see if that works. I suspect there may be a problem where a specific file is downloaded over and over until it fills up the disk; can that happen?
I really need to get this program working.
Thank you for your help,
Best,
Tunc.
Dear @chapmanb,
I tried the installation with 50 GB of root space and this is the current output. This time I did not get an error about space. The error was about the file 1000G_phase1.snps.high_confidence.vcf.gz; you may see the error below.
Thank you for your help,
Tunc.
[ec2-user@ip-172-31-59-88 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools \
> --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
Stopping docker container
Traceback (most recent call last):
File "/usr/local/bin/bcbio_vm.py", line 4, in <module>
__import__('pkg_resources').run_script('bcbio-nextgen-vm==0.1.0a0', 'bcbio_vm.py')
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 307, in <module>
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 36, in cmd_install
File "build/bdist.linux-x86_64/egg/bcbiovm/docker/install.py", line 40, in full
File "build/bdist.linux-x86_64/egg/bcbiovm/docker/manage.py", line 47, in run_bcbio_cmd
subprocess.CalledProcessError: Command 'docker attach --no-stdin 6fccb7b5b19747613ad7cbbad234c95b567139cf8a3aa0500f77f747e1c6f83a
1753150K .......... .......... .......... .......... .......... 99% 306K 2s
[... wget progress lines trimmed ...]
1756150K ..... 100% 10414G=16m52s
2016-03-18 12:14:29 (1.69 MB/s) - written to stdout [1798303191]
Not a BGZF file: 1000G_phase1.snps.high_confidence.vcf.gz
tbx_index_build failed: 1000G_phase1.snps.high_confidence.vcf.gz
Upgrading bcbio-nextgen data files
Setting up virtual machine
[localhost] local: cat /etc/*release | grep DISTRIB_CODENAME | cut -f 2 -d =
[localhost] local: echo $HOME
[localhost] local: uname -m
Running GGD recipe: 1000g_snps
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 207, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 91, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 267, in upgrade_bcbio_data
cbl_deploy.deploy(s)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
_setup_vm(options, vm_launcher, actions)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
configure_instance(options, actions)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
setup_biodata(options)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
install_proc(options["genomes"], ["ggd", "s3", "raw"])
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 345, in install_data
_prep_genomes(env, genomes, genome_indexes, ready_approaches)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 474, in _prep_genomes
retrieve_fn(env, manager, gid, idx)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 796, in _install_with_ggd
ggd.install_recipe(env.cwd, recipe_file)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
recipe["recipe"]["full"]["recipe_type"])
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
subprocess.check_output(["bash", run_file])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 573, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '/usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 1
' returned non-zero exit status 1
Tunc;
Sorry about the continued problems. With your latest error, if you check df -h, what does disk space usage look like? 50Gb total is going to be tight and I'm worried you're getting a truncated file or something else here. The script that prepares this downloads and pipes directly into a bgzipped file, so you shouldn't see this error unless something is going wrong:
https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/GRCh37/1000g_snps.yaml#L13
What type of analysis are you trying to run? I'm worried with small image sizes and trying hard to minimize disk space you might not have enough resources to run your actual analysis. EBS storage is cheap compared to instance costs, so I'd suggest providing a bunch of disk to avoid any issues.
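One quick sanity check, since BGZF is just a gzip variant: the first two bytes of a valid download should be the gzip magic 1f 8b, while a truncated transfer or an S3 error page will not have them. A rough sketch, assuming the file is in the current directory:

```shell
# Rough integrity check for a .vcf.gz download: BGZF is a gzip variant,
# so the first two bytes should be the gzip magic 1f 8b.
f=1000G_phase1.snps.high_confidence.vcf.gz
if head -c 2 "$f" 2>/dev/null | od -An -tx1 | grep -q '1f 8b'; then
    echo "$f: looks like gzip/BGZF data"
else
    echo "$f: not gzip -- likely truncated or an error page"
fi
```

Running gzip -t on the file should be a stronger check if you have the time, since it decompresses everything.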
Hope this explains the issue.
Brad;
Tell me any size that will run this command and I will make it happen :) I will have ~230 BAM files from tumor and normal patients, and I am going to call mutations on them. I won't align them; I only need the variant callers to call my mutations (indels and SNVs/SNPs), preferentially. That's why I won't need the aligners, RNA-seq data, CNV callers, etc.
My current instance is m4.large.
[ec2-user@ip-172-31-59-88 cancer-dream-syn3]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 50G 37G 13G 75% /
devtmpfs 3.9G 120K 3.9G 1% /dev
tmpfs 3.9G 48K 3.9G 1% /dev/shm
/dev/xvdf 493G 284G 184G 61% /home/ec2-user/dendi
Right now, the installation does not respond at all. I believe the servers are angry with me because I am asking for the data too insistently. Anyway, here is the latest output from the command. I stopped it because it had been like this for ~15 minutes. I also checked the df output during the process; it did not change at all.
[ec2-user@ip-172-31-59-88 cancer-dream-syn3]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
^Z
[2]+ Stopped bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools --genomes GRCh37 --aligners bwa
I am ready to do anything just to call these mutations, regardless of the instance size and the amount of storage space. As soon as I get everything installed, I might need more of your help with running bcbio. I am aware that I am writing too much here, but believe me, my PI is about to kill me.
Thank you very much,
Tunc.
Tunc; Sorry about the time pressure. My recommendation would be to step back and think about your resource requirements for running this job. 230 tumor/normal pairs (exome or whole genome?) is going to require multiple machines on AWS and be a significant run time. Depending on how the BAMs were initially aligned and prepared, you might also need to realign to ensure read groups and genome builds are correct.
So, an m4.large instance is not going to get you far in terms of running these. Do you have a local cluster you could run these on? bcbio works cleanly on local infrastructure with most schedulers. If you need to run on Amazon, you'll need a cluster setup there as well, and it might be worth looking at using bcbio_vm to bootstrap a cluster with a shared filesystem:
https://bcbio-nextgen.readthedocs.org/en/latest/contents/cloud.html
Sorry to not have a quick answer for you but I do think working through your resource requirements and needs will be a faster solution in the long term.
Dear Brad;
I have whole genome BAMs, but I am thinking about calling one patient at a time (one tumour and one normal). I would like to start in the most basic sense, which is just being able to run one of the examples. After that, I will increase the size of the work one step at a time. So for now, I am treating Amazon as one strong computer that can do mutation calling.
So, let me rephrase my question: what should I do to install bcbio_vm.py correctly, so it is at least ready to call a single pair of samples (or to reproduce a tutorial from the documentation)? In this context, is m4.large enough, or should I go bigger? Right now, my installation has literally frozen.
Thank you for your time,
Best,
Tunc.
Tunc; Sorry about the freezing issues. I'm not sure what would cause that and wouldn't expect bcbio to cause it on an m4.large during installation. For the installation itself this should be sufficient.
Regarding running whole genome BAMs (what coverage do you have?), I haven't tried to estimate the smallest machine that can run these, so I don't have a great guess on runtimes. You'd need at least 16 cores, so an m4.4xlarge or better. You can probably expect a day-plus runtime for a single tumor/normal pair, although it depends on the variant caller you're using. Hope this helps.
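To make the scale of the full project concrete, here is a rough back-of-envelope sketch. The ~24h-per-pair figure on a single 16-core machine is an assumption taken from the reply above, not a measured benchmark:

```shell
# Rough wall-clock estimate for ~230 tumor/normal WGS pairs,
# assuming ~24h per pair on one 16-core machine (m4.4xlarge).
pairs=230
hours_per_pair=24
for machines in 1 10; do
  batches=$(( (pairs + machines - 1) / machines ))   # ceiling division
  echo "${machines} machine(s): $(( batches * hours_per_pair / 24 )) days wall clock"
done
```

which illustrates why a single instance is impractical for the whole cohort: one machine works through the pairs serially, while ten machines divide the batches roughly tenfold.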
My coverage is about 50-70x, which means my BAMs are almost 150-200 GB each. Right now my biggest problem is installation; could we focus on that? To make sure we are on the same page, I am following the guide here, because in the documentation I couldn't find anything about installing the actual tools other than installing bcbio_vm.py (as a wrapper for initiating the process). Also, to be safe on RAM and storage, I launched an r3.large instance with 100 GB of root space running Amazon Linux AMI 2015.09.2 (HVM), SSD Volume Type. Still, my installation somehow freezes.
[ec2-user@ip-172-31-57-166 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools \
> --genomes GRCh37 --aligners bwa
/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
bf61d14f65db: Pull complete
3ea15286bc1a: Pull complete
515285067dbf: Pull complete
5e89b839ecfa: Pull complete
8476146c257f: Pull complete
c13d1b0ae7ed: Pull complete
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Downloaded newer image for bcbio/bcbio:latest
Tunc; If your plan is to run on a single machine, I recommend using the standard installation and bcbio_nextgen.py:
https://bcbio-nextgen.readthedocs.org/en/latest/contents/installation.html#automated
This is better tested and will hopefully work more cleanly for you. It does not need Docker and will download and install all the tools locally. Sorry if the bcbio_vm/AWS instructions are confusing and leading you down the wrong path -- they're meant to provide a better experience if you follow those instructions and use them to boot up a cluster.
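For reference, the automated install from that link boils down to fetching one script and running it. The directory choices below are the documented defaults and can be changed; this sketch only echoes the installer invocation rather than running it (the real install downloads many gigabytes of tools and data):

```shell
# Sketch of the standard (non-Docker) bcbio install from the linked docs.
# Step 1 (not run here): fetch the installer script, e.g.
#   wget https://raw.github.com/chapmanb/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py
# Step 2: run it with an install prefix and tool directory of your choice.
BCBIO_DIR=/usr/local/share/bcbio   # where bcbio code and data will live
TOOL_DIR=/usr/local                # where third-party tools get linked
echo "python bcbio_nextgen_install.py $BCBIO_DIR --tooldir=$TOOL_DIR --genomes GRCh37 --aligners bwa"
```

The `--genomes` and `--aligners` flags mirror the ones already used with bcbio_vm.py in this thread.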
Sorry I'm not able to help more with your current issue but if you want to keep going down that path, I'm happy to try and help more if you can identify any error messages that might provide a clue.
Dear @chapmanb,
This is my most recent error. I will try that path, but please do not close this issue, in case somebody else finds a solution to it. Meanwhile I will try the build from the link; in the meantime, could you please try to identify my problem?
Thank you for your help and especially for your patience. Even though we have not solved this problem yet, at least you have not given up on helping me. If I get this program to work, I might need your address to send you a champagne!
Thanks,
Tunc!
[ec2-user@ip-172-31-57-166 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools \
> --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
Stopping docker container
Traceback (most recent call last):
File "/usr/local/bin/bcbio_vm.py", line 4, in <module>
__import__('pkg_resources').run_script('bcbio-nextgen-vm==0.1.0a0', 'bcbio_vm.py')
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 307, in <module>
File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 36, in cmd_install
File "build/bdist.linux-x86_64/egg/bcbiovm/docker/install.py", line 40, in full
File "build/bdist.linux-x86_64/egg/bcbiovm/docker/manage.py", line 47, in run_bcbio_cmd
subprocess.CalledProcessError: Command 'docker attach --no-stdin 9e63bbe9c5ebb0754ba0d8b462f90c1b4199907e2c5bd779edf6e42a805bca6e
2700K .......... .......... .......... .......... .......... 114M
[... wget progress output trimmed ...]
5000K .......... .......... ......... 12.2M=0.2s
2016-03-18 17:49:56 (30.1 MB/s) - written to stdout [5150448]
INFO: <cloudbio.flavor.Flavor instance at 0x7fec04027fc8>
INFO: This is a ngs_pipeline_minimal flavor
INFO: Reading default fabricrc.txt
DBG [config.py]: Using config file /home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/../config/fabricrc.txt
INFO: Distribution __auto__
INFO: Get local environment
INFO: Ubuntu setup
DBG [distribution.py]: Debian-shared setup
DBG [distribution.py]: Source=trusty
DBG [distribution.py]: NixPkgs: Ignored
INFO: Now, testing connection to host...
INFO: Connection to host appears to work!
DBG [utils.py]: Expand paths
INFO: List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'annotations': ['GA4GH_problem_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', '1000g_snps', 'mills_indels', 'cosmic', 'ancestral', 'qsignature', 'transcripts', 'RADAR', 'mirbase'], 'validation': ['giab-NA12878', 'dream-syn3', 'dream-syn4'], 'name': 'Human (GRCh37)', 'dbkey': 'GRCh37'}], 'install_uniref': False}'): Human (GRCh37)
gzip: rmsk.txt.gz: invalid compressed data--format violated
Upgrading bcbio-nextgen data files
Setting up virtual machine
[localhost] local: cat /etc/*release | grep DISTRIB_CODENAME | cut -f 2 -d =
[localhost] local: echo $HOME
[localhost] local: uname -m
Running GGD recipe: srnaseq
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 207, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 91, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 267, in upgrade_bcbio_data
cbl_deploy.deploy(s)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
_setup_vm(options, vm_launcher, actions)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
configure_instance(options, actions)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
setup_biodata(options)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
install_proc(options["genomes"], ["ggd", "s3", "raw"])
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 345, in install_data
_prep_genomes(env, genomes, genome_indexes, ready_approaches)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 474, in _prep_genomes
retrieve_fn(env, manager, gid, idx)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 796, in _install_with_ggd
ggd.install_recipe(env.cwd, recipe_file)
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
recipe["recipe"]["full"]["recipe_type"])
File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
subprocess.check_output(["bash", run_file])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 573, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '/usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 1
' returned non-zero exit status 1
[ec2-user@ip-172-31-57-166 ~]$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 103079180 31276708 71702224 31% /
devtmpfs 15700668 100 15700568 1% /dev
tmpfs 15709676 0 15709676 0% /dev/shm
Tunc; Sorry about the continued problems. This looks like some kind of download error that you can fix by re-running the same install command on the machine after cleaning up the temporary directory:
rm -rf ~/install/bcbio-vm/data/genomes/Hsapiens/GRCh37/txtmp
bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --genomes GRCh37 --aligners bwa
It should pick up where it left off and hopefully work cleanly on this second pass. You're nearly at the end stage of the install, so I hope this gets everything working for you.
Dear Brad;
I followed what you just said, and as described on the bcbio_vm GitHub page, I ran the test analyses:
./run_test.sh docker_ipython
./run_test.sh docker
From both of these tests, I got:
Ran 1 test in <time>s
OK
I believe my bcbio_vm is ready to run. I am aware that the answer to this question might be a little intuitive, but could you guide me on which bcbio to run? Like I stated above, I will have ~250 BAM files (70x coverage), and my aim is to call mutations in the most efficient way.
bcbio-nextgen has a parallelization option of its own; on the other hand, bcbio_vm is built to manage this parallelization across machines. Could you enlighten me about both methods?
Most probably I will have a single instance, unlike the multi-instance setups you mention bcbio_vm is good at managing. However, if the efficiency difference is large, I can learn how to distribute the tasks across instances, hopefully with your help :)
Thank you for your help and patience.
Best,
Tunc.
Tunc; Great work, glad the install finished cleanly. Thanks for all your patience. For running I'd recommend a template configuration like this with a single caller:
details:
  - analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: vardict
that you can use to create a sample YAML file with bcbio_vm.py template, then run in parallel with:
bcbio_vm.py run your_samples.yaml -n <cores on machine>
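To connect that template to the tumor/normal use case, bcbio pairs samples through metadata batch and phenotype entries in the project file. The sketch below is hypothetical: the sample names and file paths are placeholders, not values from this thread:

```yaml
# Hypothetical project file for one tumor/normal pair; the shared
# "batch" value links the two samples, "phenotype" marks which is which.
upload:
  dir: ../final
details:
  - analysis: variant2
    genome_build: GRCh37
    description: patient1-tumor
    metadata:
      batch: patient1
      phenotype: tumor
    algorithm:
      aligner: bwa
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: vardict
    files: [/path/to/patient1-tumor.bam]
  - analysis: variant2
    genome_build: GRCh37
    description: patient1-normal
    metadata:
      batch: patient1
      phenotype: normal
    algorithm:
      aligner: bwa
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: vardict
    files: [/path/to/patient1-normal.bam]
```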
Hope this gets you running.
Hi,
Thank you for the fast response to my first issue; it helped me understand my problem. I followed the instructions for installing the Docker image of bcbio_vm.py. However, during installation I ran into this error.
Here is my docker info, in case it is needed.
I am sharing this info because at first I thought I might have run out of memory.