bcbio / bcbio-nextgen-vm

Run bcbio-nextgen genomic sequencing analyses using isolated containers and virtual machines

bcbio_vm.py Docker installation error #141

Closed mortunco closed 8 years ago

mortunco commented 8 years ago

Hi,

Thank you for the fast response to my first issue. It helped me understand my problem. I followed the instructions for installing the Docker image for bcbio_vm.py. However, during installation I ran into this error.

[ec2-user@ip-172-31-51-202 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools   --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
c13d1b0ae7ed: Download complete 
Pulling repository docker.io/bcbio/bcbio
23268c58abdc: Error pulling image (latest) from docker.io/bcbio/bcbio, Driver devicemapper failed to create image rootfs 03898921cd614d9d8ced45159174a79856fa99e2f6385d4fac1a1ab51988500f: Error running deviceCreate (createSnapDevice) dm_task_run failed
Error pulling image (latest) from docker.io/bcbio/bcbio, Driver devicemapper failed to create image rootfs 03898921cd614d9d8ced45159174a79856fa99e2f6385d4fac1a1ab51988500f: Error running deviceCreate (createSnapDevice) dm_task_run failed
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_vm.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen-vm==0.1.0a0', 'bcbio_vm.py')
  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.2.2-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.2.2-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 303, in <module>

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 35, in cmd_install

  File "build/bdist.linux-x86_64/egg/bcbiovm/docker/install.py", line 26, in full
  File "build/bdist.linux-x86_64/egg/bcbiovm/docker/install.py", line 74, in pull
  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'pull', 'bcbio/bcbio']' returned non-zero exit status 1

Here is my docker info output, in case it is needed.

[ec2-user@ip-172-31-51-202 ~]$ docker info
Containers: 0
Images: 5
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker-202:1-285055-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: 
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 1.936 GB
 Data Space Total: 107.4 GB
 Data Space Available: 2.126 GB
 Metadata Space Used: 1.704 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.126 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.17-22.30.amzn1.x86_64
Operating System: Amazon Linux AMI 2015.09
CPUs: 1
Total Memory: 995.6 MiB
Name: ip-172-31-51-202
ID: ORJJ:Y5S4:KSQD:ZPEX:GM7C:NMGY:J6WK:IWG3:AZF7:VW4W:HACJ:VAEY

I am sharing this information because at first I thought I might have run out of space.

[ec2-user@ip-172-31-51-202 ~]$ df
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/xvda1       8123812   6047188   1976376  76% /
devtmpfs          500712        96    500616   1% /dev
tmpfs             509720         0    509720   0% /dev/shm
/dev/xvdf      515930552 296782596 192917172  61% /home/ec2-user/dendi

chapmanb commented 8 years ago

Sorry about the issues. As you suspect, I believe the problem is that you don't have enough space available on the root filesystem, where Docker stores its images and containers. The bcbio container is ~6GB unpacked and you have ~2GB free, so my guess is that the pull fails for that reason. Docker then rolls back the partially downloaded image to clean up, which is why some space still appears to be free when you look afterwards.

If you start your Amazon instance with additional space on the root volume, this should hopefully fix the issue. The bcbio AWS integration uses 20GB for this:

https://github.com/chapmanb/bcbio-nextgen-vm/blob/master/elasticluster/config#L41

Hope this helps.
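
As a rough sketch of the check described above, the following Python snippet compares the free space on a filesystem with the ~6GB needed for the unpacked image. The threshold comes from the estimate above. The function name and the choice of "/" as the path to check are assumptions for illustration and are not part of bcbio-vm.

```python
import shutil

# Assumption: ~6 GB unpacked image size, taken from the estimate in this thread.
BCBIO_IMAGE_BYTES = 6 * 1024**3


def has_enough_space(path, required_bytes=BCBIO_IMAGE_BYTES):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes


if __name__ == "__main__":
    # Assumption: Docker's storage lives on the root filesystem, as in the
    # `docker info` output above (/var/lib/docker/...), so "/" is checked here.
    usage = shutil.disk_usage("/")
    print("free on /: %.1f GiB" % (usage.free / 1024**3))
    print("enough for the bcbio image:", has_enough_space("/"))
```

Running this before `docker pull` gives an early warning instead of a devicemapper error partway through the download.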

mortunco commented 8 years ago

Dear Brad,

I started a new instance with 30 GB of storage. However, this time I got an error about Java running out of memory. Is there a way to allocate more memory? I put the suspicious text in bold so that you can tell me if I am wrong. Could you recommend an instance type so that I can start building on that one?

Thank you for your help,

Best,

Tunc.

ubuntu@ip-172-31-50-19:~$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools   --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
8387d9ff0016: Already exists 
3b52deaaf0ed: Already exists 
4bd501fad6de: Already exists 
a3ed95caeb02: Already exists 
150b376cadde: Already exists 
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
Stopping docker container
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_vm.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen-vm==0.1.0a0', 'bcbio_vm.py')
  File "/home/ubuntu/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.2.2-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script

  File "/home/ubuntu/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.2.2-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script

  File "/home/ubuntu/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 307, in <module>

  File "/home/ubuntu/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 36, in cmd_install

  File "build/bdist.linux-x86_64/egg/bcbiovm/docker/install.py", line 40, in full
  File "build/bdist.linux-x86_64/egg/bcbiovm/docker/manage.py", line 47, in run_bcbio_cmd
subprocess.CalledProcessError: Command 'docker attach --no-stdin 77262f8ce6cb13361ed9ed71d662027d23e6155c16d0c67a4e93779768cbd614
  3650K .......... .......... .......... .......... .......... 73% 34.7M 0s
  3700K .......... .......... .......... .......... .......... 74% 46.5M 0s
  3750K .......... .......... .......... .......... .......... 75% 28.2M 0s
  3800K .......... .......... .......... .......... .......... 76% 49.1M 0s
  3850K .......... .......... .......... .......... .......... 77% 28.0M 0s
  3900K .......... .......... .......... .......... .......... 78% 28.5M 0s
  3950K .......... .......... .......... .......... .......... 79%  134M 0s
  4000K .......... .......... .......... .......... .......... 80% 27.6M 0s
  4050K .......... .......... .......... .......... .......... 81% 41.2M 0s
  4100K .......... .......... .......... .......... .......... 82% 29.0M 0s
  4150K .......... .......... .......... .......... .......... 83% 44.7M 0s
  4200K .......... .......... .......... .......... .......... 84% 30.7M 0s
  4250K .......... .......... .......... .......... .......... 85% 41.8M 0s
  4300K .......... .......... .......... .......... .......... 86% 26.0M 0s
  4350K .......... .......... .......... .......... .......... 87% 31.7M 0s
  4400K .......... .......... .......... .......... .......... 88% 38.1M 0s
  4450K .......... .......... .......... .......... .......... 89% 39.7M 0s
  4500K .......... .......... .......... .......... .......... 90% 30.4M 0s
  4550K .......... .......... .......... .......... .......... 91% 29.8M 0s
  4600K .......... .......... .......... .......... .......... 92% 47.3M 0s
  4650K .......... .......... .......... .......... .......... 93% 31.8M 0s
  4700K .......... .......... .......... .......... .......... 94% 40.5M 0s
  4750K .......... .......... .......... .......... .......... 95% 29.7M 0s
  4800K .......... .......... .......... .......... .......... 96% 30.9M 0s
  4850K .......... .......... .......... .......... .......... 97% 50.5M 0s
  4900K .......... .......... .......... .......... .......... 98% 51.3M 0s
  4950K .......... .......... .......... .......... .......... 99% 7.72M 0s
  5000K .......... .......... .........                       100% 49.3K=0.8s

2016-03-17 17:14:09 (6.25 MB/s) - written to stdout [5150442/5150442]

INFO: <cloudbio.flavor.Flavor instance at 0x7f5c819a0c68>
INFO: This is a ngs_pipeline_minimal flavor
INFO: Reading default fabricrc.txt
DBG [config.py]: Using config file /home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/../config/fabricrc.txt
INFO: Distribution __auto__
INFO: Get local environment
INFO: Ubuntu setup
DBG [distribution.py]: Debian-shared setup
DBG [distribution.py]: Source=trusty
DBG [distribution.py]: NixPkgs: Ignored
INFO: Now, testing connection to host...
INFO: Connection to host appears to work!
DBG [utils.py]: Expand paths
INFO: List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'annotations': ['GA4GH_problem_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', '1000g_snps', 'mills_indels', 'cosmic', 'ancestral', 'qsignature', 'transcripts', 'RADAR', 'mirbase'], 'validation': ['giab-NA12878', 'dream-syn3', 'dream-syn4'], 'name': 'Human (GRCh37)', 'dbkey': 'GRCh37'}], 'install_uniref': False}'): Human (GRCh37)
INFO: Genome preparation method ggd failed, trying next
INFO: Downloading genome from s3: GRCh37 rtg
--2016-03-17 17:14:10--  https://s3.amazonaws.com/biodata/genomes/GRCh37-rtg.tar.xz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.81.236
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.81.236|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-03-17 17:14:10 ERROR 404: Not Found.

Warning: local() encountered an error (return code 8) while executing 'wget --continue --no-check-certificate -O GRCh37-rtg.tar.xz 'https://s3.amazonaws.com/biodata/genomes/GRCh37-rtg.tar.xz''

**INFO: Genome preparation method s3 failed, trying next
INFO: Preparing genome GRCh37 with index rtg
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007a04a0000, 715849728, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 715849728 bytes for committing reserved memory.
# An error report file with more information is saved as:
# ./hs_err_pid114.log
The operating system did not make requested memory available to the JVM.  Try removing other jobs on this machine, adjusting allocated memory appropriate to currently available memory, or adjusting command parameters to reduce memory requirements. More information is contained in the file: **./hs_err_pid114.log
Upgrading bcbio-nextgen data files
Setting up virtual machine
[localhost] local: cat /etc/*release | grep DISTRIB_CODENAME | cut -f 2 -d =
[localhost] local: echo $HOME
[localhost] local: uname -m
[localhost] local: pwd
[localhost] local: echo $HOME
[localhost] local: mkdir -p '/home/ubuntu/tmp/cloudbiolinux/84b60da4-dfde-36aa-801b-7011d94bae4c'
[localhost] local: wget --continue --no-check-certificate -O GRCh37-rtg.tar.xz 'https://s3.amazonaws.com/biodata/genomes/GRCh37-rtg.tar.xz'
[localhost] local: export RTG_JAVA_OPTS='-Xms1g' export RTG_MEM=2g && rtg format -o rtg/GRCh37.sdf /usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa

Fatal error: local() encountered an error (return code 1) while executing 'export RTG_JAVA_OPTS='-Xms1g' export RTG_MEM=2g && rtg format -o rtg/GRCh37.sdf /usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa'

Aborting.
INFO: Genome preparation method raw failed, trying next
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 207, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 91, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 267, in upgrade_bcbio_data
    cbl_deploy.deploy(s)
  File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
    _setup_vm(options, vm_launcher, actions)
  File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
    configure_instance(options, actions)
  File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
    setup_biodata(options)
  File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
    install_proc(options["genomes"], ["ggd", "s3", "raw"])
  File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 345, in install_data
    _prep_genomes(env, genomes, genome_indexes, ready_approaches)
  File "/home/ubuntu/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 486, in _prep_genomes
    raise IOError("Could not prepare index {0} for {1} by any method".format(idx, gid))
IOError: Could not prepare index rtg for GRCh37 by any method
' returned non-zero exit status 1
ubuntu@ip-172-31-50-19:~$ df
Filesystem     1K-blocks      Used Available Use% Mounted on
udev              503212        12    503200   1% /dev
tmpfs             101636       340    101296   1% /run
/dev/xvda1      30822592  17702576  11762416  61% /
none                   4         0         4   0% /sys/fs/cgroup
none                5120         0      5120   0% /run/lock
none              508160         0    508160   0% /run/shm
none              102400         0    102400   0% /run/user
/dev/xvdf      515930552 296782596 192917172  61% /home/ubuntu/dendi

chapmanb commented 8 years ago

Sorry about the memory issues. What instance type are you running? There is some index building and other work in the preparation, so you probably want to go with t2.large or better to have a reasonable amount of compute and memory. The very small instances don't have much memory, which might be causing the issue here. Hope this helps.
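
To make the memory requirement concrete, here is a minimal Python sketch that reads MemTotal from /proc/meminfo-style text and compares it with the 2 GiB that the failing command requests via RTG_MEM=2g. Treating RTG_MEM as the minimum total memory is an assumption for illustration, since the real requirement also depends on what else is running. The function names are hypothetical.

```python
import re

# Assumption: 2 GiB, matching RTG_MEM=2g in the failing `rtg format` command.
REQUIRED_BYTES = 2 * 1024**3


def mem_total_bytes(meminfo_text):
    """Extract MemTotal (reported in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1)) * 1024


def enough_memory(meminfo_text, required_bytes=REQUIRED_BYTES):
    """Return True if total memory is at least `required_bytes`."""
    return mem_total_bytes(meminfo_text) >= required_bytes


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            text = handle.read()
    except OSError:
        # Hypothetical sample, roughly the 995.6 MiB reported by `docker info` above.
        text = "MemTotal:        1019494 kB\n"
    print("total MiB: %.1f" % (mem_total_bytes(text) / 1024**2))
    print("enough for RTG_MEM=2g:", enough_memory(text))
```

On an instance with about 1 GiB of memory, as in the first `docker info` output, this check reports that the machine is too small.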

mortunco commented 8 years ago

I am running a t2.micro instance :) I am aware that it is small, by the way. I am using it because it is free, though.

Edit: I started an m4.large instance. I will report back on whether it works.

mortunco commented 8 years ago

Dear @chapmanb ,

Bad news: I ran into another space problem. I also posted the df output to show the space on my root volume. It is 30 GB; however, even that was not enough. Also, since I am only going to use bcbio for calling mutations in tumor/normal samples, I don't think I need to download the RNA-seq data. I don't think I need the aligners either. As I understood from the tutorial, I can use the --datatarget variation option to make my bcbio_vm.py installation more specific. However, when I removed --aligners from the command line, it gave the message below:

[ec2-user@ip-172-31-62-129 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools   --genomes GRCh37
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
Data not installed, no aligners provided with `--aligners` flag

This is my system's df output.

Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/xvda1      30830568  30698748     31572 100% /
devtmpfs         4080360       100   4080260   1% /dev
tmpfs            4089368         0   4089368   0% /dev/shm
/dev/xvdf      515930552 296782596 192917172  61% /home/ec2-user/dendi
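
For reference, a small Python sketch that parses df output like the table above and lists the mount points that are nearly full. The 95% threshold and the function names are assumptions chosen for illustration.

```python
def parse_df(text):
    """Parse `df` output (1K-blocks) into a list of dicts, one per filesystem."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        rows.append({
            "filesystem": fields[0],
            "size_kb": int(fields[1]),
            "used_kb": int(fields[2]),
            "available_kb": int(fields[3]),
            "use_percent": int(fields[4].rstrip("%")),
            "mount": " ".join(fields[5:]),
        })
    return rows


def nearly_full(rows, threshold_percent=95):
    """Return mount points whose usage is at or above the threshold."""
    return [row["mount"] for row in rows if row["use_percent"] >= threshold_percent]


# The df output posted in this comment.
DF_OUTPUT = """\
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/xvda1      30830568  30698748     31572 100% /
devtmpfs         4080360       100   4080260   1% /dev
tmpfs            4089368         0   4089368   0% /dev/shm
/dev/xvdf      515930552 296782596 192917172  61% /home/ec2-user/dendi
"""

if __name__ == "__main__":
    print(nearly_full(parse_df(DF_OUTPUT)))
```

With the table above, only the root filesystem is flagged, while the 500 GB volume mounted under the home directory still has plenty of room.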

This is the error I got when following the regular installation.

[ec2-user@ip-172-31-62-129 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools \
>   --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
Stopping docker container
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_vm.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen-vm==0.1.0a0', 'bcbio_vm.py')
  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 307, in <module>

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 36, in cmd_install

  File "build/bdist.linux-x86_64/egg/bcbiovm/docker/install.py", line 40, in full
  File "build/bdist.linux-x86_64/egg/bcbiovm/docker/manage.py", line 47, in run_bcbio_cmd
subprocess.CalledProcessError: Command 'docker attach --no-stdin f98cba899c2ff1293cc9c7ca1915c778c39ed8fc944a807a55cd8d630862a810
  3200K .......... .......... .......... .......... .......... 64% 43.7M 0s
  3250K .......... .......... .......... .......... .......... 65% 39.6M 0s
  3300K .......... .......... .......... .......... .......... 66% 77.8M 0s
  3350K .......... .......... .......... .......... .......... 67% 77.6M 0s
  3400K .......... .......... .......... .......... .......... 68% 47.2M 0s
  3450K .......... .......... .......... .......... .......... 69% 32.5M 0s
  3500K .......... .......... .......... .......... .......... 70% 19.2M 0s
  3550K .......... .......... .......... .......... .......... 71%  127M 0s
  3600K .......... .......... .......... .......... .......... 72% 91.6M 0s
  3650K .......... .......... .......... .......... .......... 73% 82.3M 0s
  3700K .......... .......... .......... .......... .......... 74% 74.6M 0s
  3750K .......... .......... .......... .......... .......... 75% 52.8M 0s
  3800K .......... .......... .......... .......... .......... 76% 62.4M 0s
  3850K .......... .......... .......... .......... .......... 77% 45.8M 0s
  3900K .......... .......... .......... .......... .......... 78% 22.6M 0s
  3950K .......... .......... .......... .......... .......... 79% 32.3M 0s
  4000K .......... .......... .......... .......... .......... 80% 29.9M 0s
  4050K .......... .......... .......... .......... .......... 81% 30.2M 0s
  4100K .......... .......... .......... .......... .......... 82% 81.9M 0s
  4150K .......... .......... .......... .......... .......... 83% 45.7M 0s
  4200K .......... .......... .......... .......... .......... 84% 73.0M 0s
  4250K .......... .......... .......... .......... .......... 85% 47.5M 0s
  4300K .......... .......... .......... .......... .......... 86% 68.3M 0s
  4350K .......... .......... .......... .......... .......... 87% 52.0M 0s
  4400K .......... .......... .......... .......... .......... 88% 65.3M 0s
  4450K .......... .......... .......... .......... .......... 89% 43.1M 0s
  4500K .......... .......... .......... .......... .......... 90% 61.4M 0s
  4550K .......... .......... .......... .......... .......... 91% 54.5M 0s
  4600K .......... .......... .......... .......... .......... 92% 60.9M 0s
  4650K .......... .......... .......... .......... .......... 93% 47.6M 0s
  4700K .......... .......... .......... .......... .......... 94% 57.0M 0s
  4750K .......... .......... .......... .......... .......... 95% 72.0M 0s
  4800K .......... .......... .......... .......... .......... 96% 36.8M 0s
  4850K .......... .......... .......... .......... .......... 97% 23.9M 0s
  4900K .......... .......... .......... .......... .......... 98% 14.6M 0s
  4950K .......... .......... .......... .......... .......... 99% 18.5M 0s
  5000K .......... .......... .........                       100% 9.74M=0.2s

2016-03-18 09:30:17 (28.9 MB/s) - written to stdout [5150448/5150448]

INFO: <cloudbio.flavor.Flavor instance at 0x7f19a7af0998>
INFO: This is a ngs_pipeline_minimal flavor
INFO: Reading default fabricrc.txt
DBG [config.py]: Using config file /home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/../config/fabricrc.txt
INFO: Distribution __auto__
INFO: Get local environment
INFO: Ubuntu setup
DBG [distribution.py]: Debian-shared setup
DBG [distribution.py]: Source=trusty
DBG [distribution.py]: NixPkgs: Ignored
INFO: Now, testing connection to host...
INFO: Connection to host appears to work!
DBG [utils.py]: Expand paths
INFO: List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'annotations': ['GA4GH_problem_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', '1000g_snps', 'mills_indels', 'cosmic', 'ancestral', 'qsignature', 'transcripts', 'RADAR', 'mirbase'], 'validation': ['giab-NA12878', 'dream-syn3', 'dream-syn4'], 'name': 'Human (GRCh37)', 'dbkey': 'GRCh37'}], 'install_uniref': False}'): Human (GRCh37)
--2016-03-18 09:30:17--  https://s3.amazonaws.com/biodata/annotation/GRCh37-rnaseq-2015-12-01.tar.xz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.17.232
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.17.232|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable

    The file is already fully retrieved; nothing to do.

tar: GRCh37/rnaseq-2015-12-01/tophat/GRCh37_transcriptome.4.bt2: Wrote only 8192 of 10240 bytes
tar: GRCh37/rnaseq-2015-12-01/kallisto/GRCh37: Wrote only 4096 of 10240 bytes
tar: GRCh37/rnaseq-2015-12-01/ref-transcripts-splicesites.txt: Cannot write: No space left on device
tar: GRCh37/rnaseq-2015-12-01/ref-transcripts-mask.gtf: Cannot write: No space left on device
tar: Exiting with failure status due to previous errors
Upgrading bcbio-nextgen data files
Setting up virtual machine
[localhost] local: cat /etc/*release | grep DISTRIB_CODENAME | cut -f 2 -d =
[localhost] local: echo $HOME
[localhost] local: uname -m
Running GGD recipe: transcripts
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 207, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 91, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 267, in upgrade_bcbio_data
    cbl_deploy.deploy(s)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
    _setup_vm(options, vm_launcher, actions)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
    configure_instance(options, actions)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
    setup_biodata(options)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
    install_proc(options["genomes"], ["ggd", "s3", "raw"])
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 345, in install_data
    _prep_genomes(env, genomes, genome_indexes, ready_approaches)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 474, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 796, in _install_with_ggd
    ggd.install_recipe(env.cwd, recipe_file)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"])
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '/usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 2
' returned non-zero exit status 1

To sum up, I will try an instance with 50 GB of space and see if that works. I wonder whether there is a problem that causes a specific file to be installed repeatedly until it fills up the disk. Can this happen?

I really need to get this program working.

Thank you for your help,

Best,

Tunc.

mortunco commented 8 years ago

Dear @chapmanb, I tried the installation with 50 GB of root space, and this is the current output. This time I did not get an error about space. The error was about the file 1000G_phase1.snps.high_confidence.vcf.gz; you can see it below.

Thank you for your help,

Tunc.
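
The "Not a BGZF file" message in the output below comes from the tabix indexing step, which needs a file compressed with bgzip (BGZF) rather than plain gzip. A minimal Python sketch for checking this by hand follows. It looks only at the first block header, as laid out in the SAM/BGZF specification, and the function name is hypothetical.

```python
import gzip
import os
import struct
import tempfile


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        if len(header) < 12:
            return False
        # gzip magic bytes, DEFLATE method, and the FEXTRA flag must be present.
        if header[0:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    if len(extra) < xlen:
        return False
    # Walk the extra subfields looking for 'B', 'C' with a 2-byte payload.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if (si1, si2, slen) == (66, 67, 2):
            return True
        pos += 4 + slen
    return False


if __name__ == "__main__":
    # Demo on a plain gzip file written to a temporary directory.
    tmp_dir = tempfile.mkdtemp()
    plain = os.path.join(tmp_dir, "plain.vcf.gz")
    with gzip.open(plain, "wb") as handle:
        handle.write(b"##fileformat=VCFv4.1\n")
    print("plain gzip is BGZF:", is_bgzf(plain))
```

This only inspects the header of the first block, so it can tell plain gzip from BGZF but cannot confirm that a download is complete.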

[ec2-user@ip-172-31-59-88 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools \
>   --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio

Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest

Stopping docker container
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_vm.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen-vm==0.1.0a0', 'bcbio_vm.py')
  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 307, in <module>

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 36, in cmd_install

  File "build/bdist.linux-x86_64/egg/bcbiovm/docker/install.py", line 40, in full
  File "build/bdist.linux-x86_64/egg/bcbiovm/docker/manage.py", line 47, in run_bcbio_cmd
subprocess.CalledProcessError: Command 'docker attach --no-stdin 6fccb7b5b19747613ad7cbbad234c95b567139cf8a3aa0500f77f747e1c6f83a
1753150K .......... .......... .......... .......... .......... 99%  306K 2s
1753200K .......... .......... .......... .......... .......... 99% 3.61M 2s
1753250K .......... .......... .......... .......... .......... 99% 2.77M 2s
1753300K .......... .......... .......... .......... .......... 99% 2.75M 2s
1753350K .......... .......... .......... .......... .......... 99% 4.73M 2s
1753400K .......... .......... .......... .......... .......... 99% 1.04M 2s
1753450K .......... .......... .......... .......... .......... 99% 1.69M 2s
1753500K .......... .......... .......... .......... .......... 99% 1.70M 2s
1753550K .......... .......... .......... .......... .......... 99% 1.72M 1s
1753600K .......... .......... .......... .......... .......... 99% 1.71M 1s
1753650K .......... .......... .......... .......... .......... 99% 1.66M 1s
1753700K .......... .......... .......... .......... .......... 99% 1.69M 1s
1753750K .......... .......... .......... .......... .......... 99% 1.69M 1s
1753800K .......... .......... .......... .......... .......... 99% 1.62M 1s
1753850K .......... .......... .......... .......... .......... 99% 1.73M 1s
1753900K .......... .......... .......... .......... .......... 99% 1.68M 1s
1753950K .......... .......... .......... .......... .......... 99% 1.71M 1s
1754000K .......... .......... .......... .......... .......... 99% 1.68M 1s
1754050K .......... .......... .......... .......... .......... 99% 1.73M 1s
1754100K .......... .......... .......... .......... .......... 99% 1.69M 1s
1754150K .......... .......... .......... .......... .......... 99% 1.72M 1s
1754200K .......... .......... .......... .......... .......... 99% 1.68M 1s
1754250K .......... .......... .......... .......... .......... 99% 1.69M 1s
1754300K .......... .......... .......... .......... .......... 99% 1.67M 1s
1754350K .......... .......... .......... .......... .......... 99% 1.72M 1s
1754400K .......... .......... .......... .......... .......... 99% 1.69M 1s
1754450K .......... .......... .......... .......... .......... 99% 1.70M 1s
1754500K .......... .......... .......... .......... .......... 99% 1.70M 1s
1754550K .......... .......... .......... .......... .......... 99% 1.68M 1s
1754600K .......... .......... .......... .......... .......... 99% 1.71M 1s
1754650K .......... .......... .......... .......... .......... 99% 1.72M 1s
1754700K .......... .......... .......... .......... .......... 99% 1.66M 1s
1754750K .......... .......... .......... .......... .......... 99% 1.75M 1s
1754800K .......... .......... .......... .......... .......... 99% 1.68M 1s
1754850K .......... .......... .......... .......... .......... 99% 1.69M 1s
1754900K .......... .......... .......... .......... .......... 99% 1.75M 1s
1754950K .......... .......... .......... .......... .......... 99% 1.70M 1s
1755000K .......... .......... .......... .......... .......... 99% 1.65M 1s
1755050K .......... .......... .......... .......... .......... 99% 1.75M 1s
1755100K .......... .......... .......... .......... .......... 99% 1.71M 1s
1755150K .......... .......... .......... .......... .......... 99% 1.67M 1s
1755200K .......... .......... .......... .......... .......... 99% 20.0M 1s
1755250K .......... .......... .......... .......... .......... 99% 1.71M 0s
1755300K .......... .......... .......... .......... .......... 99% 1.76M 0s
1755350K .......... .......... .......... .......... .......... 99% 1.64M 0s
1755400K .......... .......... .......... .......... .......... 99% 1.71M 0s
1755450K .......... .......... .......... .......... .......... 99% 1.77M 0s
1755500K .......... .......... .......... .......... .......... 99% 1.64M 0s
1755550K .......... .......... .......... .......... .......... 99% 1.76M 0s
1755600K .......... .......... .......... .......... .......... 99% 1.71M 0s
1755650K .......... .......... .......... .......... .......... 99% 1.64M 0s
1755700K .......... .......... .......... .......... .......... 99% 1.71M 0s
1755750K .......... .......... .......... .......... .......... 99% 1.70M 0s
1755800K .......... .......... .......... .......... .......... 99% 1.67M 0s
1755850K .......... .......... .......... .......... .......... 99% 1.74M 0s
1755900K .......... .......... .......... .......... .......... 99% 37.5M 0s
1755950K .......... .......... .......... .......... .......... 99% 1.75M 0s
1756000K .......... .......... .......... .......... .......... 99% 1.67M 0s
1756050K .......... .......... .......... .......... .......... 99% 1.73M 0s
1756100K .......... .......... .......... .......... .......... 99% 1.71M 0s
1756150K .....                                                 100% 10414G=16m52s

2016-03-18 12:14:29 (1.69 MB/s) - written to stdout [1798303191]

Not a BGZF file: 1000G_phase1.snps.high_confidence.vcf.gz
tbx_index_build failed: 1000G_phase1.snps.high_confidence.vcf.gz
Upgrading bcbio-nextgen data files
Setting up virtual machine
[localhost] local: cat /etc/*release | grep DISTRIB_CODENAME | cut -f 2 -d =
[localhost] local: echo $HOME
[localhost] local: uname -m
Running GGD recipe: 1000g_snps
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 207, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 91, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 267, in upgrade_bcbio_data
    cbl_deploy.deploy(s)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
    _setup_vm(options, vm_launcher, actions)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
    configure_instance(options, actions)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
    setup_biodata(options)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
    install_proc(options["genomes"], ["ggd", "s3", "raw"])
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 345, in install_data
    _prep_genomes(env, genomes, genome_indexes, ready_approaches)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 474, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 796, in _install_with_ggd
    ggd.install_recipe(env.cwd, recipe_file)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"])
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '/usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 1
' returned non-zero exit status 1
chapmanb commented 8 years ago

Tunc; Sorry about the continued problems. For your latest error, what does disk usage look like when you run df -h? 50GB total is going to be tight, and I'm worried you're getting a truncated file or something similar. The script that prepares this file downloads and pipes directly into a bgzipped file, so you shouldn't see this error unless something is going wrong:

https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/GRCh37/1000g_snps.yaml#L13

What type of analysis are you trying to run? With small image sizes and an effort to minimize disk space, I'm worried you might not have enough resources to run your actual analysis. EBS storage is cheap compared to instance costs, so I'd suggest provisioning plenty of disk to avoid any issues.

Hope this explains the issue.
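
The "Not a BGZF file" message above is consistent with a truncated or corrupted download. A minimal Python sketch, not part of bcbio, of how one might check free disk space and test whether a file starts with a BGZF header and ends with the standard 28-byte BGZF end-of-file block. The function names free_gb and bgzf_status are hypothetical, and the check assumes the 'BC' subfield is the first gzip extra field, as standard BGZF writers produce:

```python
import os
import shutil

# Empty BGZF block that bgzip-style writers append as an end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "42430200"
    "1b00" "0300" "00000000" "00000000"
)

def free_gb(path="."):
    """Free space in GB on the filesystem holding `path` (compare with df -h)."""
    return shutil.disk_usage(path).free / 1e9

def bgzf_status(path):
    """Return 'ok', 'not-bgzf', or 'truncated' for a compressed file."""
    with open(path, "rb") as handle:
        header = handle.read(16)
        # gzip magic, deflate, FEXTRA flag set, then the 'BC' extra subfield.
        if header[:4] != b"\x1f\x8b\x08\x04" or header[12:14] != b"BC":
            return "not-bgzf"
        if os.path.getsize(path) < len(BGZF_EOF):
            return "truncated"
        # A complete BGZF file ends with the empty end-of-file block.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return "ok" if handle.read() == BGZF_EOF else "truncated"
```

Running bgzf_status on 1000G_phase1.snps.high_confidence.vcf.gz in the genome directory would separate a plain-gzip or corrupted file ('not-bgzf') from one cut off mid-download ('truncated').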

mortunco commented 8 years ago

Brad;

Tell me any size that will run this command and I will make it happen :). I will have ~230 BAM files from tumor and normal patient samples, and I am going to call mutations on them. I won't align them. I only need the variant callers to call my mutations (preferably indels and SNVs/SNPs). That's why I won't need aligners, rnaseq, snv callers, cnv callers, etc.

My current instance is m4.large.

[ec2-user@ip-172-31-59-88 cancer-dream-syn3]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       50G   37G   13G  75% /
devtmpfs        3.9G  120K  3.9G   1% /dev
tmpfs           3.9G   48K  3.9G   1% /dev/shm
/dev/xvdf       493G  284G  184G  61% /home/ec2-user/dendi

Right now, the installation is not responding at all. I believe the servers are angry with me because I keep requesting the data. Anyway, here is the latest output from the command. I stopped it because it had been stuck like this for ~15 minutes. I also checked df during the process, and the disk usage did not change at all.

[ec2-user@ip-172-31-59-88 cancer-dream-syn3]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools   --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
^Z
[2]+  Stopped                 bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools --genomes GRCh37 --aligners bwa

I am ready to do anything to call these mutations, regardless of instance size or the amount of storage. Once the installation works, I might need more of your help with running bcbio. I am aware that I am writing too much here, but believe me, my PI is about to kill me.

Thank you very much,

Tunc.

chapmanb commented 8 years ago

Tunc; Sorry about the time pressure. My recommendation would be to step back and think about your resource requirements for running this job. 230 tumor/normal pairs (exome or whole genome?) is going to require multiple machines on AWS and be a significant run time. Depending on how the BAMs were initially aligned and prepared, you might also need to realign to ensure read groups and genome builds are correct.

So, an m4.large instance is not going to get you far in terms of running these. Do you have a local cluster you could run these on? bcbio works cleanly on local infrastructure with most schedulers. If you need to run on Amazon, then you'll need a cluster setup there as well, and it might be worth looking at using bcbio_vm to bootstrap a cluster with a shared filesystem:

https://bcbio-nextgen.readthedocs.org/en/latest/contents/cloud.html

Sorry to not have a quick answer for you but I do think working through your resource requirements and needs will be a faster solution in the long term.

mortunco commented 8 years ago

Dear Brad;

I have whole genome BAMs. But I am thinking about calling one patient at a time (one tumour and one normal each). I would like to start with the most basic step, which is just being able to run one of the examples. After that, I will increase the size of the work step by step. Therefore, to begin with I am treating Amazon as one powerful computer that can do mutation calling.

So, let me rephrase my question: what should I do to install bcbio_vm.py correctly and be ready to call mutations on at least a single pair (or reproduce a tutorial from the documentation)? In this context, is an m4.large enough, or should I use a larger instance? Right now, my installation is literally frozen.

Thank you for your time,

Best,

Tunc.

chapmanb commented 8 years ago

Tunc; Sorry about the freezing issues. I'm not sure what would cause that and wouldn't expect bcbio to cause it on an m4.large during installation. For the installation itself this should be sufficient.

Regarding running whole genome BAMs (what coverage do you have?), I haven't tried to estimate the smallest machine to run these, so I don't have a great guess on runtimes. You'd need at least 16 cores, so you would need an m4.4xlarge or better. You should probably expect a day+ runtime for a single tumor/normal run with these, although it depends on the variant caller you're using. Hope this helps.
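
As a rough back-of-the-envelope sketch of what that runtime means for the whole project, the following Python snippet multiplies it out. It assumes exactly one day per tumor/normal pair (the "day+" estimate rounded down), the 230 pairs mentioned earlier in this thread, and one pair running per machine at a time; all three are simplifying assumptions, and serial_days is a hypothetical helper:

```python
import math

def serial_days(pairs, days_per_pair=1.0, machines=1):
    """Rough wall-clock days if each machine processes one pair at a time."""
    # Each machine works through its share of the pairs one after another.
    return math.ceil(pairs / machines) * days_per_pair

print(serial_days(230))               # a single machine
print(serial_days(230, machines=10))  # ten machines working in parallel
```

Under these assumptions a single machine needs roughly 230 days and ten machines roughly 23, which is why a cluster matters for the full sample set.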

mortunco commented 8 years ago

My coverage is about 50-70x, which means my BAMs are about 150-200 GB each. Right now my biggest problem is installation. Could we focus on that? To make sure we are on the same page, I am following the guide here . In the documentation I couldn't find how to install the actual tools, other than installing bcbio_vm.py (as a remote for initiating the process). Also, to be sure about RAM and storage, I started an r3.large instance, which has 100 GB of root space and runs Amazon Linux AMI 2015.09.2 (HVM), SSD Volume Type. The installation still freezes for some reason.

[ec2-user@ip-172-31-57-166 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools \
>   --genomes GRCh37 --aligners bwa
/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
bf61d14f65db: Pull complete 
3ea15286bc1a: Pull complete 
515285067dbf: Pull complete 
5e89b839ecfa: Pull complete 
8476146c257f: Pull complete 
c13d1b0ae7ed: Pull complete 
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Downloaded newer image for bcbio/bcbio:latest
chapmanb commented 8 years ago

Tunc; If your plan is to run on a single machine, I recommend using the standard installation and bcbio_nextgen.py:

https://bcbio-nextgen.readthedocs.org/en/latest/contents/installation.html#automated

This is better tested and will hopefully work more cleanly for you. It does not need Docker and will download and install all the tools locally. Sorry if the bcbio_vm/AWS instructions are confusing and leading you down the wrong path -- that's meant to provide a better experience if you follow those instructions and use it to boot up a cluster.

Sorry I'm not able to help more with your current issue but if you want to keep going down that path, I'm happy to try and help more if you can identify any error messages that might provide a clue.

mortunco commented 8 years ago

Dear @chapmanb,

This is my most recent error. I will try that path. But please do not close this issue, in case somebody finds a solution to it. In the meantime I will try the installation from the link; could you please try to identify my problem?

Thank you for your help and especially for your patience. Even though we have not solved this problem yet, at least you have not given up on helping me. If I get this program working, I might need your address to send you champagne!

Thanks,

Tunc!

[ec2-user@ip-172-31-57-166 ~]$ bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --tools \
>   --genomes GRCh37 --aligners bwa
Retrieving bcbio-nextgen docker image with code and tools
Using default tag: latest
latest: Pulling from bcbio/bcbio
Digest: sha256:eb6690f984b44a47389a161240b930b848e62f5e946202a021122e15f8da189b
Status: Image is up to date for bcbio/bcbio:latest
Stopping docker container
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_vm.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen-vm==0.1.0a0', 'bcbio_vm.py')
  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 307, in <module>

  File "/home/ec2-user/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio_nextgen_vm-0.1.0a0-py2.7.egg/EGG-INFO/scripts/bcbio_vm.py", line 36, in cmd_install

  File "build/bdist.linux-x86_64/egg/bcbiovm/docker/install.py", line 40, in full
  File "build/bdist.linux-x86_64/egg/bcbiovm/docker/manage.py", line 47, in run_bcbio_cmd
subprocess.CalledProcessError: Command 'docker attach --no-stdin 9e63bbe9c5ebb0754ba0d8b462f90c1b4199907e2c5bd779edf6e42a805bca6e
  2700K .......... .......... .......... .......... ..........  114M
  2750K .......... .......... .......... .......... .......... 59.0M
  2800K .......... .......... .......... .......... .......... 34.3M
  2850K .......... .......... .......... .......... .......... 60.0M
  2900K .......... .......... .......... .......... .......... 31.5M
  2950K .......... .......... .......... .......... ..........  319M
  3000K .......... .......... .......... .......... .......... 55.6M
  3050K .......... .......... .......... .......... .......... 33.6M
  3100K .......... .......... .......... .......... .......... 52.1M
  3150K .......... .......... .......... .......... .......... 47.4M
  3200K .......... .......... .......... .......... .......... 53.1M
  3250K .......... .......... .......... .......... .......... 78.2M
  3300K .......... .......... .......... .......... .......... 40.9M
  3350K .......... .......... .......... .......... .......... 45.0M
  3400K .......... .......... .......... .......... .......... 49.1M
  3450K .......... .......... .......... .......... .......... 68.0M
  3500K .......... .......... .......... .......... .......... 64.9M
  3550K .......... .......... .......... .......... .......... 30.7M
  3600K .......... .......... .......... .......... .......... 64.1M
  3650K .......... .......... .......... .......... .......... 38.0M
  3700K .......... .......... .......... .......... .......... 40.3M
  3750K .......... .......... .......... .......... .......... 37.0M
  3800K .......... .......... .......... .......... .......... 71.8M
  3850K .......... .......... .......... .......... .......... 59.0M
  3900K .......... .......... .......... .......... .......... 49.5M
  3950K .......... .......... .......... .......... .......... 46.7M
  4000K .......... .......... .......... .......... .......... 51.4M
  4050K .......... .......... .......... .......... .......... 11.3M
  4100K .......... .......... .......... .......... .......... 44.5M
  4150K .......... .......... .......... .......... .......... 30.6M
  4200K .......... .......... .......... .......... .......... 41.9M
  4250K .......... .......... .......... .......... .......... 84.8M
  4300K .......... .......... .......... .......... ..........  101M
  4350K .......... .......... .......... .......... .......... 34.0M
  4400K .......... .......... .......... .......... .......... 95.3M
  4450K .......... .......... .......... .......... ..........  173M
  4500K .......... .......... .......... .......... ..........  198M
  4550K .......... .......... .......... .......... .......... 29.1M
  4600K .......... .......... .......... .......... .......... 83.1M
  4650K .......... .......... .......... .......... .......... 35.9M
  4700K .......... .......... .......... .......... .......... 80.1M
  4750K .......... .......... .......... .......... .......... 39.9M
  4800K .......... .......... .......... .......... .......... 86.0M
  4850K .......... .......... .......... .......... .......... 56.2M
  4900K .......... .......... .......... .......... .......... 36.6M
  4950K .......... .......... .......... .......... .......... 26.5M
  5000K .......... .......... .........                        12.2M=0.2s

2016-03-18 17:49:56 (30.1 MB/s) - written to stdout [5150448]

INFO: <cloudbio.flavor.Flavor instance at 0x7fec04027fc8>
INFO: This is a ngs_pipeline_minimal flavor
INFO: Reading default fabricrc.txt
DBG [config.py]: Using config file /home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/../config/fabricrc.txt
INFO: Distribution __auto__
INFO: Get local environment
INFO: Ubuntu setup
DBG [distribution.py]: Debian-shared setup
DBG [distribution.py]: Source=trusty
DBG [distribution.py]: NixPkgs: Ignored
INFO: Now, testing connection to host...
INFO: Connection to host appears to work!
DBG [utils.py]: Expand paths
INFO: List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'rtg'], 'genomes': [{'annotations': ['GA4GH_problem_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', '1000g_snps', 'mills_indels', 'cosmic', 'ancestral', 'qsignature', 'transcripts', 'RADAR', 'mirbase'], 'validation': ['giab-NA12878', 'dream-syn3', 'dream-syn4'], 'name': 'Human (GRCh37)', 'dbkey': 'GRCh37'}], 'install_uniref': False}'): Human (GRCh37)

gzip: rmsk.txt.gz: invalid compressed data--format violated
Upgrading bcbio-nextgen data files
Setting up virtual machine
[localhost] local: cat /etc/*release | grep DISTRIB_CODENAME | cut -f 2 -d =
[localhost] local: echo $HOME
[localhost] local: uname -m
Running GGD recipe: srnaseq
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 207, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 91, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 267, in upgrade_bcbio_data
    cbl_deploy.deploy(s)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
    _setup_vm(options, vm_launcher, actions)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
    configure_instance(options, actions)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
    setup_biodata(options)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
    install_proc(options["genomes"], ["ggd", "s3", "raw"])
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 345, in install_data
    _prep_genomes(env, genomes, genome_indexes, ready_approaches)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 474, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 796, in _install_with_ggd
    ggd.install_recipe(env.cwd, recipe_file)
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"])
  File "/home/ec2-user/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '/usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 1
' returned non-zero exit status 1

[ec2-user@ip-172-31-57-166 ~]$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/xvda1     103079180 31276708  71702224  31% /
devtmpfs        15700668      100  15700568   1% /dev
tmpfs           15709676        0  15709676   0% /dev/shm
chapmanb commented 8 years ago

Tunc; Sorry about the continued problems. This looks like some kind of download error that you could fix by re-running the same install command on the machine after cleaning up the temporary directory:

rm -rf ~/install/bcbio-vm/data/genomes/Hsapiens/GRCh37/txtmp
bcbio_vm.py --datadir=~/install/bcbio-vm/data install --data --genomes GRCh37 --aligners bwa

It should pick up where it left off with the install and hopefully work cleanly on this second pass. You're nearly at the end stage of the install, so hope this gets everything working cleanly for you.
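
The clean-up-and-retry step above can also be scripted. This is a minimal Python sketch rather than anything bcbio provides: rerun_install is a hypothetical helper, and the attempt count of 3 is an arbitrary assumption. It removes the temporary directory before each attempt and stops at the first successful exit:

```python
import shutil
import subprocess

def rerun_install(cmd, tmp_dir, attempts=3):
    """Remove tmp_dir and run cmd until it exits 0; return the attempt number."""
    for attempt in range(1, attempts + 1):
        # Same effect as rm -rf: drop partial downloads from the last run.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        if subprocess.call(cmd) == 0:
            return attempt
    raise RuntimeError("install still failing after %d attempts" % attempts)

# Example call with the command from this thread. subprocess does not expand
# "~", so expand the paths first with os.path.expanduser:
#
#   data = os.path.expanduser("~/install/bcbio-vm/data")
#   rerun_install(
#       ["bcbio_vm.py", "--datadir=" + data, "install", "--data",
#        "--genomes", "GRCh37", "--aligners", "bwa"],
#       os.path.join(data, "genomes/Hsapiens/GRCh37/txtmp"))
```

Because the installer picks up where it left off, each retry only repeats the step that failed.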

mortunco commented 8 years ago

Dear Brad;

I followed what you said. As described on the bcbio_vm GitHub page, I ran the test analyses.

./run_test.sh docker_ipython
./run_test.sh docker

From both of these tests, I got

Ran 1 test in $time (in seconds) and OK  

I believe my bcbio_vm is ready to run. I am aware that the answer to this question might be obvious, but could you guide me on which bcbio to run? As I stated above, I will have ~250 BAM files (70x coverage), and my aim is to call mutations in the most efficient way.

bcbio-nextgen also has a parallelization option; on the other hand, bcbio_vm is built to manage this parallelization. Could you explain the difference between the two methods?

Most probably I will have a single instance, whereas you mentioned that bcbio_vm is good at managing multiple instances. However, if the efficiency difference is large, I can learn how to split the tasks across instances, hopefully with your help :)

Thank you for your help and patience.

Best,

Tunc.

chapmanb commented 8 years ago

Tunc; Great work, glad the install finished cleanly. Thanks for all your patience. For running I'd recommend a template configuration like this with a single caller:

details:
  - analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: vardict

that you can use to create a sample YAML file with bcbio_vm.py template:

https://bcbio-nextgen.readthedocs.org/en/latest/contents/configuration.html#automated-sample-configuration

then running in parallel with:

bcbio_vm.py run your_samples.yaml -n <cores on machine>

Hope this gets you running.
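
For the core count in the run command above, one option is to let Python report what the machine has. A minimal sketch, where run_command is a hypothetical helper and your_samples.yaml is the placeholder file name from the command above; it only builds the argument list and does not launch anything:

```python
import multiprocessing

def run_command(sample_yaml, cores=None):
    """Build the bcbio_vm.py run command as an argument list."""
    if cores is None:
        # Default to every core the operating system reports.
        cores = multiprocessing.cpu_count()
    return ["bcbio_vm.py", "run", sample_yaml, "-n", str(cores)]

print(" ".join(run_command("your_samples.yaml")))
```

Passing the resulting list to subprocess.call would start the run; a smaller explicit cores value leaves headroom for other work on the instance.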