alesssia / YAMP

YAMP: Yet Another Metagenomic Pipeline
GNU General Public License v3.0

docker run fail for demo data #7

Closed. wangdatou2009 closed this issue 6 years ago

wangdatou2009 commented 6 years ago

Hi Alessia, I tried to run YAMP locally with Docker on Linux 16.04, following the steps below. What is causing the error? The Docker image was pulled completely, but the system is still unable to find the image 'yampdocker:latest' locally.

  1. No tools installed and no resources downloaded
  2. git clone https://github.com/alesssia/YAMP.git
  3. Install Nextflow
  4. Download ERR011089_1.fastq.gz and ERR011089_2.fastq.gz
  5. docker pull alesssia/yampdocker
  6. Annotate executor and queue in nextflow.config
  7. nextflow run YAMP.nf --reads1 ./data/ERR011089_1.fastq.gz --reads2 ./data/ERR011089_2.fastq.gz --prefix Meta_HIT_ERR011089 --outdir ./data --mode complete -with-docker yampdocker
  8. Error:

     N E X T F L O W ~ version 0.28.0
     Launching YAMP.nf [angry_shockley] - revision: 8ed2c9d795
     [warm up] executor > local
     [ff/30bff9] Submitted process > dedup
     [78/3e0203] Submitted process > qualityAssessment (1)
     [51/e88aa9] Submitted process > qualityAssessment (2)
     ERROR ~ Error executing process > 'dedup'

Caused by: Process dedup terminated with an error exit status (125)

Command executed:

    # Measures execution time
    sysdate=$(date)
    starttime=$(date +%s.%N)
    echo "Performing Quality Control. STEP 1 [De-duplication] at $sysdate" > .log.2
    echo " " >> .log.2

    # Sets the maximum memory to the value requested in the config file
    maxmem=$(echo "32 GB" | sed 's/ //g' | sed 's/B//g')

    # Defines command for de-duplication
    if [ "paired" = "paired" ]; then
        CMD="clumpify.sh -Xmx"$maxmem" in1=ERR011089_1.fastq.gz in2=ERR011089_2.fastq.gz out1=Meta_HIT_ERR011089_dedupe_R1.fq.gz out2=Meta_HIT_ERR011089_dedupe_R2.fq.gz qin=33 dedupe subs=0 threads=4"
    else
        CMD="clumpify.sh -Xmx"$maxmem" in=ERR011089_1.fastq.gz out=Meta_HIT_ERR011089_dedupe.fq.gz qin=33 dedupe subs=0 threads=4"
    fi

    # Logs version of the software and executed command (BBmap prints on stderr)
    version=$(clumpify.sh --version 2>&1 >/dev/null | grep "BBMap version")
    echo "Using clumpify.sh in $version " >> .log.2
    echo "Executing command: $CMD " >> .log.2
    echo " " >> .log.2

    # De-duplicates
    exec $CMD 2>&1 | tee tmp.log

    # Logs some figures about sequences passing de-duplication
    echo "Clumpify's de-duplication stats: " >> .log.2
    echo " " >> .log.2
    sed -n '/Reads In:/,/Duplicates Found:/p' tmp.log >> .log.2
    echo " " >> .log.2
    totR=$(grep "Reads In:" tmp.log | cut -f 1 | cut -d: -f 2 | sed 's/ //g')
    remR=$(grep "Duplicates Found:" tmp.log | cut -f 1 | cut -d: -f 2 | sed 's/ //g')
    survivedR=$(($totR-$remR))
    percentage=$(echo $survivedR $totR | awk '{print $1/$2*100}' )
    echo "$survivedR out of $totR paired reads survived de-duplication ($percentage%, $remR reads removed)" >> .log.2
    echo " " >> .log.2

    # Measures and logs execution time
    endtime=$(date +%s.%N)
    exectime=$(echo "$endtime $starttime" | awk '{print $1-$2}')
    sysdate=$(date)
    echo "STEP 1 (Quality control) terminated at $sysdate ($exectime seconds)" >> .log.2
    echo " " >> .log.2
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++" >> .log.2
    echo " " >> .log.2

Command exit status: 125

Command output: (empty)

Command error:

    Unable to find image 'yampdocker:latest' locally
    docker: Error response from daemon: repository yampdocker not found: does not exist or no pull access.
    See 'docker run --help'.

Work dir: .....................YAMP/work/ff/30bff9a2c9c06f891f279d055ca0b3

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details
WARN: Killing pending tasks (2)
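A note on this first error: `docker pull alesssia/yampdocker` stores the image locally under the name `alesssia/yampdocker`, so `-with-docker yampdocker` asks Docker for a different repository that does not exist. Exit status 125 is the code Docker documents for a failure of `docker run` itself rather than of the command inside the container, which is consistent with the task script never starting. The sketch below is not part of YAMP; `image_in_list` is a hypothetical helper, and it assumes references without a registry port (a colon in the name is treated as a tag separator).

```shell
# Succeeds if the image reference given as $1 appears in the list read from
# stdin (one "repository:tag" entry per line). A reference without a tag is
# treated as ":latest", which is what Docker assumes by default.
image_in_list() {
    ref="$1"
    case "$ref" in
        *:*) ;;                  # tag already present
        *)   ref="$ref:latest" ;;
    esac
    grep -qxF -- "$ref"
}

# Usage against a real Docker daemon (requires Docker to be installed):
#   docker images --format '{{.Repository}}:{{.Tag}}' | image_in_list alesssia/yampdocker
```

With the image from step 5 pulled, the check would be expected to succeed for `alesssia/yampdocker` and fail for `yampdocker`, matching the "Unable to find image 'yampdocker:latest' locally" message.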

wangdatou2009 commented 6 years ago

If I instead try

    nextflow run YAMP.nf --reads1 ./data/ERR011089_1.fastq.gz --reads2 ./data/ERR011089_2.fastq.gz --prefix Meta_HIT_ERR011089 --outdir ./data --mode complete -with-docker docker://alesssia/yampdocker

the result is basically the same:

    N E X T F L O W ~ version 0.28.0
    Launching YAMP.nf [cocky_lovelace] - revision: 8ed2c9d795
    [warm up] executor > local
    [5d/aef143] Submitted process > dedup
    [6c/5db1ee] Submitted process > qualityAssessment (1)
    [51/90f83d] Submitted process > qualityAssessment (2)
    ERROR ~ Error executing process > 'dedup'

Caused by: Process dedup terminated with an error exit status (125)

Command executed:

    # Measures execution time
    sysdate=$(date)
    starttime=$(date +%s.%N)
    echo "Performing Quality Control. STEP 1 [De-duplication] at $sysdate" > .log.2
    echo " " >> .log.2

    # Sets the maximum memory to the value requested in the config file
    maxmem=$(echo "32 GB" | sed 's/ //g' | sed 's/B//g')

    # Defines command for de-duplication
    if [ "paired" = "paired" ]; then
        CMD="clumpify.sh -Xmx"$maxmem" in1=ERR011089_1.fastq.gz in2=ERR011089_2.fastq.gz out1=Meta_HIT_ERR011089_dedupe_R1.fq.gz out2=Meta_HIT_ERR011089_dedupe_R2.fq.gz qin=33 dedupe subs=0 threads=4"
    else
        CMD="clumpify.sh -Xmx"$maxmem" in=ERR011089_1.fastq.gz out=Meta_HIT_ERR011089_dedupe.fq.gz qin=33 dedupe subs=0 threads=4"
    fi

    # Logs version of the software and executed command (BBmap prints on stderr)
    version=$(clumpify.sh --version 2>&1 >/dev/null | grep "BBMap version")
    echo "Using clumpify.sh in $version " >> .log.2
    echo "Executing command: $CMD " >> .log.2
    echo " " >> .log.2

    # De-duplicates
    exec $CMD 2>&1 | tee tmp.log

    # Logs some figures about sequences passing de-duplication
    echo "Clumpify's de-duplication stats: " >> .log.2
    echo " " >> .log.2
    sed -n '/Reads In:/,/Duplicates Found:/p' tmp.log >> .log.2
    echo " " >> .log.2
    totR=$(grep "Reads In:" tmp.log | cut -f 1 | cut -d: -f 2 | sed 's/ //g')
    remR=$(grep "Duplicates Found:" tmp.log | cut -f 1 | cut -d: -f 2 | sed 's/ //g')
    survivedR=$(($totR-$remR))
    percentage=$(echo $survivedR $totR | awk '{print $1/$2*100}' )
    echo "$survivedR out of $totR paired reads survived de-duplication ($percentage%, $remR reads removed)" >> .log.2
    echo " " >> .log.2

    # Measures and logs execution time
    endtime=$(date +%s.%N)
    exectime=$(echo "$endtime $starttime" | awk '{print $1-$2}')
    sysdate=$(date)
    echo "STEP 1 (Quality control) terminated at $sysdate ($exectime seconds)" >> .log.2
    echo " " >> .log.2
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++" >> .log.2
    echo " " >> .log.2

Command exit status: 125

Command output: (empty)

Command error:

    docker: Error parsing reference: "docker://alesssia/yampdocker" is not a valid repository/tag: invalid reference format.
    See 'docker run --help'.

Work dir: ......................YAMP/work/5d/aef143354299874734a019ccc6a418

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option -resume

-- Check '.nextflow.log' file for details
WARN: Killing pending tasks (1)
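A note on this second error: the `docker://` prefix is the form Singularity uses when it pulls from Docker Hub. The Docker CLI expects a bare `repository[:tag]` reference, which is why it reports an invalid reference format. A minimal sketch of stripping the prefix before handing the name to `-with-docker`; `docker_ref` is a hypothetical helper, not something YAMP or Nextflow provides.

```shell
# Prints the image reference with a leading "docker://" removed, if present.
# References without the prefix are printed unchanged.
docker_ref() {
    printf '%s\n' "${1#docker://}"
}

# Example: prints "alesssia/yampdocker"
docker_ref "docker://alesssia/yampdocker"
```

The result can then be passed as `-with-docker "$(docker_ref docker://alesssia/yampdocker)"`, which is equivalent to writing `-with-docker alesssia/yampdocker` directly.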

wangdatou2009 commented 6 years ago

Then I tried

    nextflow run YAMP.nf --reads1 ./data/ERR011089_1.fastq.gz --reads2 ./data/ERR011089_2.fastq.gz --prefix Meta_HIT_ERR011089 --outdir ./data --mode QC -with-docker alesssia/yampdocker

and the error is now:

    N E X T F L O W ~ version 0.28.0
    Launching YAMP.nf [suspicious_hodgkin] - revision: 8ed2c9d795
    [warm up] executor > local
    [3d/e2c7b0] Submitted process > dedup
    [00/2cf546] Submitted process > qualityAssessment (2)
    [0d/e89764] Submitted process > qualityAssessment (1)
    ERROR ~ Error executing process > 'qualityAssessment (2)'

Caused by: Process qualityAssessment (2) terminated with an error exit status (1)

Command executed:

    # Measures execution time
    sysdate=$(date)
    starttime=$(date +%s.%N)
    echo "Performing Quality Control. [Assessment of read quality] at $sysdate" > .log.1_R2
    echo "File being analysed: ERR011089_2.fastq.gz" >> .log.1_R2
    echo " " >> .log.1_R2

    # Logs version of the software and executed command
    version=$(fastqc --version)
    CMD="fastqc --quiet --noextract --format fastq --outdir=. --threads 4 ERR011089_2.fastq.gz"

    echo "Using $version " >> .log.1_R2
    echo "Executing command $CMD " >> .log.1_R2
    echo " " >> .log.1_R2

    # Does QC, extracts relevant information, and removes temporary files
    bash fastQC.sh ERR011089_2.fastq.gz Meta_HIT_ERR011089_rawreads_R2 4 ERR011089_2.fastq.gz

    # Logging QC statistics (number of sequences, Pass/warning/fail, basic statistics, duplication level, kmers)
    base=$(basename ERR011089_2.fastq.gz)
    bash logQC.sh $base Meta_HIT_ERR011089_rawreads_R2_fastqc_data.txt .log.1_R2

    # Measures and log execution time
    endtime=$(date +%s.%N)
    exectime=$(echo "$endtime $starttime" | awk '{print $1-$2}')
    sysdate=$(date)
    echo "Quality assessment on ERR011089_2.fastq.gz terminated at $sysdate ($exectime seconds)" >> .log.1_R2
    echo " " >> .log.1_R2
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++" >> .log.1_R2
    echo " " >> .log.1_R2

Command exit status: 1

Command output: (empty)

Command error:

    WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
    touch: cannot touch ‘.command.trace’: Permission denied

Work dir: .......................YAMP/work/00/2cf54638379a3c17505901c72d838c

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details
WARN: Killing pending tasks (1)

wangdatou2009 commented 6 years ago

Fixed by adding docker.runOptions = '-u $(id -u):$(id -g)' to nextflow.config.
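For context, the third attempt failed with `touch: cannot touch ‘.command.trace’: Permission denied`, which suggests the user inside the container could not write to the Nextflow work directory mounted from the host. The option above passes `-u` to `docker run` so that the container runs with the host user's UID and GID. The sketch below appends the setting only when it is missing; `add_docker_user_option` is a hypothetical helper name, and the default path `nextflow.config` assumes the command runs in the YAMP checkout.

```shell
# Appends the docker.runOptions line to a Nextflow config file unless a
# docker.runOptions entry is already present. $1 is the config path
# (defaults to ./nextflow.config).
add_docker_user_option() {
    config="${1:-nextflow.config}"
    if ! grep -q 'docker.runOptions' "$config" 2>/dev/null; then
        # \$ keeps $(id -u) and $(id -g) literal in the config file, so they
        # are evaluated when the docker command line is run, not here.
        printf '%s\n' "docker.runOptions = '-u \$(id -u):\$(id -g)'" >> "$config"
    fi
}
```

Calling `add_docker_user_option` once in the directory that holds `nextflow.config` is enough; repeated calls leave the file unchanged.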

alesssia commented 6 years ago

Thanks for letting me know. I will add this to the troubleshooting page. Hope everything is all right now?

wangdatou2009 commented 6 years ago

Thanks for your reply. The only remaining issue is that the folder structure for uniref90 does not match the one in the config. Another question: even as Docker users, do we still need to arrange the resource folder ourselves? The image you created does not include any resources, right?

alesssia commented 6 years ago

I will have a look at the config file.

Correct. The image does not contain any resources but you can download them either from Zenodo (https://zenodo.org/record/1068229#.Wh7a3rTQqL4), or by using the following command:

wget https://zenodo.org/record/1068229/files/YAMP_resources_20171128.tar.gz

If you use this data file, please note that, before running YAMP, the FASTA file describing the human (contaminating) genome should be indexed with the following command:

bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz

Hope this helps!
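The download and indexing steps above can be sketched as a small script. This is only an illustration: `maybe_run` and `fetch_yamp_resources` are hypothetical names, the `RUN` variable is an assumption introduced here for a dry run, and the layout of the extracted archive is not described in this thread, so the indexing command may need to be run from whichever directory holds the FASTA file.

```shell
# Prints a command instead of running it unless RUN=1 is set, so the plan
# can be inspected before downloading a large archive.
maybe_run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

fetch_yamp_resources() {
    url="https://zenodo.org/record/1068229/files/YAMP_resources_20171128.tar.gz"
    archive="${url##*/}"   # file name part of the URL

    maybe_run wget "$url"
    maybe_run tar -xzf "$archive"
    # Index the human (contaminating) genome as described above. Run this
    # where the FASTA file ends up after extraction.
    maybe_run bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
}
```

Calling `fetch_yamp_resources` prints the three commands; setting `RUN=1` before the call executes them, which requires `wget`, `tar`, and BBMap to be available.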

alesssia commented 6 years ago

Added Troubleshooting here: https://github.com/alesssia/YAMP/wiki/Troubleshooting and here: https://github.com/alesssia/YAMP/wiki/How-to-use-Docker. Thanks again!