bioboxes / rfc

Request for comments on interchangeable bioinformatics containers
http://bioboxes.org
MIT License

bioboxes/megahit should fail with error message if the number of CPUs is less than 2. #190

Closed tomsitter closed 7 years ago

tomsitter commented 8 years ago

I'm following the tutorial and successfully ran the bioboxes/velvet example with the test dataset. When I tried the last step (switching to bioboxes/megahit), I encountered this error.

> biobox run short_read_assembler bioboxes/megahit --input reads.fq.gz --output contigs.fa

Traceback (most recent call last):
  File "/usr/local/bin/biobox", line 9, in <module>
    biobox.run()
  File "/usr/local/lib/python2.7/dist-packages/biobox_cli/main.py", line 30, in run
    util.select_module("command", opts["<command>"]).run(args)
  File "/usr/local/lib/python2.7/dist-packages/biobox_cli/command/run.py", line 24, in run
    ctnr = bbx.run(argv)
  File "/usr/local/lib/python2.7/dist-packages/biobox_cli/biobox.py", line 29, in run
    self.after_run(output, host_dst_dir)
  File "/usr/local/lib/python2.7/dist-packages/biobox_cli/biobox_type/short_read_assembler.py", line 48, in after_ru
    biobox_output = fle.parse(host_dst_dir)
  File "/usr/local/lib/python2.7/dist-packages/biobox_cli/biobox_file.py", line 9, in parse
    with open(os.path.join(dir_, 'biobox.yaml'), 'r') as f:
IOError: [Errno 2] No such file or directory: '/tmp/tmpLowG4F/biobox.yaml'
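
The traceback above ends in a bare IOError because the CLI tries to open biobox.yaml in the output directory without first checking that the container wrote it. A minimal sketch of a friendlier check follows. The names read_biobox_output and MissingBioboxOutput are hypothetical and are not part of biobox_cli; the real parse function presumably also parses the YAML, which is left out here to keep the example free of third-party imports.

```python
import os


class MissingBioboxOutput(IOError):
    """Hypothetical error: the container left no biobox.yaml behind."""


def read_biobox_output(dir_):
    """Return the text of dir_/biobox.yaml, or raise a descriptive error."""
    path = os.path.join(dir_, "biobox.yaml")
    if not os.path.isfile(path):
        # Explain the likely cause instead of surfacing a raw Errno 2.
        raise MissingBioboxOutput(
            "No biobox.yaml found in {}. The container probably exited "
            "before writing any output; check its logs.".format(dir_)
        )
    with open(path, "r") as f:
        return f.read()
```

With a check like this, a container that exits early (for example because of too few CPUs) would produce a message pointing at the container rather than at a temporary path.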

Below are my installation and test commands (that were all successful)

On fresh Debian GNU/Linux 8 (jessie) OS, Kernel version 3.16.0-4-amd64 (from DigitalOcean)

Install Docker

apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
apt-get install apt-transport-https
apt-get update 
apt-get install docker-engine

Test Docker

service docker start
docker run hello-world   #Works!

Install Biobox

python -V     #python 2.7.9
apt-get install python-pip
pip install biobox_cli    #biobox-cli-0.3.0.tar.gz

Test Biobox

wget   --output-document reads.fq.gz   'https://www.dropbox.com/s/uxgn6cqngctqv74/reads.fq.gz?dl=1'
docker pull bioboxes/velvet #Assembler
biobox run \
    short_read_assembler \
    bioboxes/velvet \
    --input reads.fq.gz \
    --output contigs.fa
#contigs.fa produced successfully

tomsitter commented 8 years ago

I found this command in the biobox cli

>biobox verify short_read_assembler bioboxes/megahit
Error "bioboxes/megahit" is not a valid short_read_assembler biobox.
Should create a contigs file when given a valid biobox.yml and FASTQ data.

pbelmann commented 8 years ago

Hi Tom,

could you try to run the following steps:

mkdir input output
wget   -O input/reads.fq.gz  \
  --quiet \
  'https://www.dropbox.com/s/j1z91gr9ovboekm/genome_reads.fq.gz?dl=1'
cat << EOF > input/biobox.yaml      
version: 0.9.0                      
arguments:                          
- fastq:                            
  - id: fastq                       
    type: paired                    
    value: /fastq/input.fq.gz       
EOF
docker run -v "${PWD}/input/biobox.yaml:/bbx/input/biobox.yaml:ro" -v "${PWD}/input/reads.fq.gz:/fastq/input.fq.gz:ro" -v "${PWD}/output:/bbx/mnt/output:rw"  bioboxes/megahit default

I just want to make sure that it is not a bug in the biobox.

tomsitter commented 8 years ago

Hi, the box looks like it ran successfully using this code but I have no files in output/.

mkdir input output
wget   -O input/reads.fq.gz  \
  --quiet \
  'https://www.dropbox.com/s/j1z91gr9ovboekm/genome_reads.fq.gz?dl=1'
cat << EOF > input/biobox.yaml      
version: 0.9.0                      
arguments:                          
- fastq:                            
  - id: fastq                       
    type: paired                    
    value: /fastq/input.fq.gz       
EOF
docker run -v "${PWD}/input/biobox.yaml:/bbx/input/biobox.yaml:ro" -v "${PWD}/input/reads.fq.gz:/fastq/input.fq.gz:ro" -v "${PWD}/output:/bbx/mnt/output:rw"  bioboxes/megahit default
MEGAHIT v0.2.1
[Thu Jan 21 17:52:10 2016] Start assembly. Number of CPU threads 2.
[Thu Jan 21 17:52:10 2016] Extracting solid (k+1)-mers for k = 21
[Thu Jan 21 17:52:14 2016] Building graph for k = 21
[Thu Jan 21 17:52:21 2016] Assembling contigs from SdBG for k = 21
[Thu Jan 21 17:52:29 2016] Extracting iterative edges from k = 21 to 31
[Thu Jan 21 17:52:31 2016] Building graph for k = 31
[Thu Jan 21 17:52:33 2016] Assembling contigs from SdBG for k = 31
[Thu Jan 21 17:52:35 2016] Extracting iterative edges from k = 31 to 41
[Thu Jan 21 17:52:35 2016] Building graph for k = 41
[Thu Jan 21 17:52:37 2016] Assembling contigs from SdBG for k = 41
[Thu Jan 21 17:52:38 2016] Extracting iterative edges from k = 41 to 51
[Thu Jan 21 17:52:38 2016] Building graph for k = 51
[Thu Jan 21 17:52:39 2016] Assembling contigs from SdBG for k = 51
[Thu Jan 21 17:52:40 2016] Extracting iterative edges from k = 51 to 61
[Thu Jan 21 17:52:40 2016] Building graph for k = 61
[Thu Jan 21 17:52:40 2016] Assembling contigs from SdBG for k = 61
[Thu Jan 21 17:52:41 2016] Extracting iterative edges from k = 61 to 71
[Thu Jan 21 17:52:41 2016] Building graph for k = 71
[Thu Jan 21 17:52:41 2016] Assembling contigs from SdBG for k = 71
[Thu Jan 21 17:52:42 2016] Extracting iterative edges from k = 71 to 81
[Thu Jan 21 17:52:42 2016] Building graph for k = 81
[Thu Jan 21 17:52:42 2016] Assembling contigs from SdBG for k = 81
[Thu Jan 21 17:52:42 2016] Extracting iterative edges from k = 81 to 91
[Thu Jan 21 17:52:42 2016] Building graph for k = 91
[Thu Jan 21 17:52:43 2016] Assembling contigs from SdBG for k = 91
[Thu Jan 21 17:52:43 2016] Extracting iterative edges from k = 91 to 99
[Thu Jan 21 17:52:43 2016] Building graph for k = 99
[Thu Jan 21 17:52:43 2016] Assembling contigs from SdBG for k = 99
[Thu Jan 21 17:52:44 2016] Merging to output final contigs.
[Thu Jan 21 17:52:44 2016] ALL DONE.

One point to note: I was using a test instance with only 1 CPU, so megahit wouldn't run even with this workaround until I resized my instance. Before resizing, I instead got the message "Number of CPU threads should be at least 2!".

It may be useful to use a different box in the tutorial because I suspect other users may be using a similar low-powered test setup.
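
The behaviour this issue asks for, failing early with a clear message when fewer than 2 CPUs are available, could be sketched as below. This is only an illustration: require_cpus and InsufficientCPUs are hypothetical names, and the minimum of 2 is taken from the MEGAHIT message quoted above rather than from any bioboxes specification.

```python
import multiprocessing


class InsufficientCPUs(RuntimeError):
    """Hypothetical error: the host has fewer CPUs than the tool needs."""


def require_cpus(minimum=2, available=None):
    """Return the CPU count, or raise if it is below `minimum`."""
    if available is None:
        # CPUs visible to this Python process.
        available = multiprocessing.cpu_count()
    if available < minimum:
        raise InsufficientCPUs(
            "At least {} CPUs are required but only {} are "
            "available.".format(minimum, available)
        )
    return available
```

Calling require_cpus() before starting the assembler would turn a silent failure into an explicit error; the available argument exists only so the check can be exercised without resizing a machine.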

tomsitter commented 8 years ago

Any update on this? I didn't receive any output despite the biobox running. Thanks!

michaelbarton commented 8 years ago

Using the 0.3.0 release.

"Number of CPU threads should be at least 2!"

I got the same error message. I'm using a MacBook Air with a "1.6 GHz Intel Core i5", which I believe has two cores. I think this error could be due to boot2docker creating a virtual machine that has fewer than the required number of cores. Either way, including bioboxes/megahit in the tutorial might not be appropriate if this error is likely to occur often. This is also likely the reason the verify command failed, as it fails for me too.
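
One way to test the virtual machine hypothesis is to ask the Docker daemon how many CPUs it sees, which can differ from the number of cores on the laptop. The sketch below assumes a Docker version whose `docker info` command supports the `--format` option with the `NCPU` field; docker_cpu_count is a hypothetical helper, and the run parameter is there only so the parsing can be tried without a Docker daemon.

```python
import subprocess


def docker_cpu_count(run=subprocess.check_output):
    """Return the number of CPUs reported by the Docker daemon."""
    # `run` is injectable so the function can be tested without Docker.
    out = run(["docker", "info", "--format", "{{.NCPU}}"])
    if isinstance(out, bytes):
        out = out.decode("utf-8")
    return int(out.strip())
```

If this returns 1 on a machine with two physical cores, the limit comes from the virtual machine that hosts Docker rather than from the hardware.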

michaelbarton commented 8 years ago

@tomsitter please try changing:

"${PWD}/output:/bbx/mnt/output:rw"

To the following:

"${PWD}/output:/bbx/output:rw"

And see if this works.
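
For reference, the full invocation with the suggested mount point can be assembled as in the sketch below. It reuses the paths from the earlier docker run command in this thread; megahit_docker_command is a hypothetical helper, and whether /bbx/output is the right mount point was never confirmed in this issue.

```python
import os


def megahit_docker_command(workdir, output_mount="/bbx/output"):
    """Build the docker run argument list used in this thread."""
    input_dir = os.path.join(workdir, "input")
    volumes = [
        (os.path.join(input_dir, "biobox.yaml"), "/bbx/input/biobox.yaml", "ro"),
        (os.path.join(input_dir, "reads.fq.gz"), "/fastq/input.fq.gz", "ro"),
        # The suggested change: mount output at /bbx/output, not /bbx/mnt/output.
        (os.path.join(workdir, "output"), output_mount, "rw"),
    ]
    cmd = ["docker", "run"]
    for host, container, mode in volumes:
        cmd += ["-v", "{}:{}:{}".format(host, container, mode)]
    return cmd + ["bioboxes/megahit", "default"]
```

Passing the result to subprocess.call from the directory that contains input/ and output/ would run the same container as the shell command, with only the output mount changed.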

michaelbarton commented 7 years ago

@tomsitter did you have a chance to rerun this with the command above?

@pbelmann I'm going to change the name of this issue so that it is more specific to the problem that Tom appears to be describing.

tomsitter commented 7 years ago

@michaelbarton I was testing this for a short contract last year. I no longer have the same environment set up for testing.

pbelmann commented 7 years ago

@michaelbarton, @tomsitter I tried to reproduce the error but it did not appear. However, megahit and the bioboxes command-line interface have both been updated since then. I will close this issue. Please reopen it if you encounter it again.

michaelbarton commented 7 years ago

@pbelmann when you tried to reproduce the issue, were you using a single CPU?

pbelmann commented 7 years ago

I think I used the biobox cli --cpuset parameter. I will check that and report.

michaelbarton commented 7 years ago

Thanks Peter

pbelmann commented 7 years ago

I can confirm that it produces a FASTA file with the --cpuset="1" parameter.
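
Assuming the biobox CLI forwards --cpuset to Docker's --cpuset-cpus option (an assumption, not something confirmed in this thread), the value is a list of zero-indexed CPU ids such as "0-2,4" rather than a CPU count. The sketch below, using the hypothetical helper cpus_in_cpuset, counts how many CPUs such a value grants; it shows that "1" pins the container to a single CPU (the one with index 1).

```python
def cpus_in_cpuset(spec):
    """Count the distinct CPUs named by a cpuset string like "0-2,4"."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # An inclusive range of CPU ids, e.g. "0-2" means 0, 1 and 2.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return len(cpus)
```

Under that assumption, cpus_in_cpuset("1") is 1, so the reproduction attempt did run with a single CPU available to the container.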

michaelbarton commented 7 years ago

Thanks Peter, we can revisit this issue if it appears again.