deng-lab / viroprofiler

A containerized bioinformatics pipeline for viral metagenomic data analysis
https://deng-lab.github.io/viroprofiler
MIT License

Process `VIROPROFILER:SETUP:DB_CHECKV` terminated with an error exit status (1) #21

Open Aciole-David opened 6 months ago

Aciole-David commented 6 months ago

Description of the bug

Error:

The CheckV step of setup mode (`VIROPROFILER:SETUP:DB_CHECKV`) ends with exit status 1:

- checkv-db-v1.5.tar.gz downloads only ~1.1 GB (the official file at https://portal.nersc.gov/CheckV/ is ~1.7 GB)
- the process continues to the extraction step anyway
- extraction fails with `EOFError: Compressed file ended before the end-of-stream marker was reached`
- retried several times, same result
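The size mismatch described above could be checked before extraction. The following is a minimal sketch, assuming the server reports a `Content-Length` header for the archive (not verified here for portal.nersc.gov); `remote_size` and `download_is_complete` are hypothetical helper names, not part of CheckV or ViroProfiler.

```python
import os
import urllib.request
from typing import Optional


def remote_size(url: str) -> Optional[int]:
    """Return the size the server advertises for `url`, or None if it sends no Content-Length."""
    # A HEAD request asks for headers only, so the archive itself is not fetched.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def download_is_complete(local_path: str, url: str) -> bool:
    """True only when the local file is exactly as large as the server says it should be."""
    expected = remote_size(url)
    if expected is None:
        # Assumption violated: without an advertised size this check cannot decide.
        raise ValueError("server did not advertise a Content-Length")
    return os.path.getsize(local_path) == expected
```

For the case in this report, `download_is_complete("checkv-db-v1.5.tar.gz", "https://portal.nersc.gov/CheckV/checkv-db-v1.5.tar.gz")` would be expected to return `False` for the ~1.1 GB file, provided the header is present.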

Solved (?):

1. Manually download https://portal.nersc.gov/CheckV/checkv-db-v1.5.tar.gz:
   wget https://portal.nersc.gov/CheckV/checkv-db-v1.5.tar.gz
2. Manually extract it and move the content to ~/viroprofiler/checkv/:
   tar -xzvf checkv-db-v1.5.tar.gz
3. Resume the pipeline run:
   nextflow run deng-lab/viroprofiler -r main -profile docker --mode "setup" --max_cpus 12 --max_memory 16.GB --max_time 100.h -resume

executor >  local (1)
[ee/e26b48] process > VIROPROFILER:SETUP:DB_CHECKV     [100%] 1 of 1 ✔
[6f/71ca91] process > VIROPROFILER:SETUP:DB_VIBRANT    [100%] 1 of 1, cached: 1 ✔
[c3/581629] process > VIROPROFILER:SETUP:DB_VIRSORTER2 [100%] 1 of 1, cached: 1 ✔
[81/d419c1] process > VIROPROFILER:SETUP:DB_DRAM       [100%] 1 of 1, cached: 1 ✔
[9f/b85c6a] process > VIROPROFILER:SETUP:DB_IPHOP      [100%] 1 of 1, cached: 1 ✔
[01/8e1c61] process > VIROPROFILER:SETUP:DB_VREFSEQ    [100%] 1 of 1, cached: 1 ✔
-[ViroProfiler] Pipeline completed successfully-
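The extract-and-move part of the workaround can also be scripted. This is a rough sketch rather than the pipeline's own code: `install_checkv_db` is a hypothetical helper, and it assumes the archive unpacks into a single top-level `checkv-db-v*` directory, as the `mv` command in the failing process suggests.

```python
import glob
import os
import shutil


def install_checkv_db(archive: str, db_root: str) -> str:
    """Unpack `archive` under `db_root` and rename the `checkv-db-v*` directory to `checkv`."""
    target = os.path.join(db_root, "checkv")
    if os.path.isdir(target):
        # Same guard as the pipeline step: an existing database is left alone.
        return target
    # Same standard-library call that appears in the CheckV traceback.
    shutil.unpack_archive(archive, db_root, "gztar")
    # Keep directories only, so a tarball stored next to them is not matched.
    pattern = os.path.join(db_root, "checkv-db-v*")
    extracted = [p for p in glob.glob(pattern) if os.path.isdir(p)]
    if len(extracted) != 1:
        raise RuntimeError(f"expected one checkv-db-v* directory, found {len(extracted)}")
    os.rename(extracted[0], target)
    return target
```

With the paths from this report, the call would be `install_checkv_db("checkv-db-v1.5.tar.gz", os.path.expanduser("~/viroprofiler"))`, after which the setup run can be resumed with `-resume`.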

Command used and terminal output

nextflow run deng-lab/viroprofiler -r main -profile docker --mode "setup" --max_cpus 12 --max_memory 16.GB --max_time 100.h -resume
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/deng-lab/viroprofiler` [mzen_wilson] DSL2 - revision: c2a1f1871b [main]

----------------------------------------------------------------------------------------------------------------
                                               __                                                                
 oooooo     oooo  o8o                        88  88                        .o88o.  o8o  oooo                     
  `888.     .8'   `"'                       88 ss 88                       888 `"  `"'  `888                     
   `888.   .8'   oooo  oooo d8b  .ooooo.     88__88    oooo d8b  .ooooo.  o888oo  oooo   888   .ooooo.  oooo d8b 
    `888. .8'    `888  `888""8P d88' `88b      ||      `888""8P d88' `88b  888    `888   888  d88' `88b `888""8P 
     `888.8'      888   888     888   888     _||_      888     888   888  888     888   888  888ooo888  888     
      `888'       888   888     888   888   // || \\    888     888   888  888     888   888  888    .o  888     
       `8'       o888o d888b    `Y8bod8P'  //      \\  d888b    `Y8bod8P' o888o   o888o o888o `Y8bod8P' d888b    
  ViroProfiler v0.2.4
----------------------------------------------------------------------------------------------------------------
Core Nextflow options
  revision       : main
  runName        : mzen_wilson
  containerEngine: docker
  container      : [withLabel:viroprofiler_base:denglab/viroprofiler-base:v0.2, withLabel:viroprofiler_abundance:denglab/viroprofiler-abundance:v0.2, withLabel:viroprofiler_bracken:denglab/viroprofiler-bracken:v0.2, withLabel:viroprofiler_vibrant:denglab/viroprofiler-vibrant:v0.2, withLabel:viroprofiler_binning:denglab/viroprofiler-binning:v0.2, withLabel:viroprofiler_geneannot:denglab/viroprofiler-geneannot:v0.2, withLabel:viroprofiler_host:denglab/viroprofiler-host:v0.1, withLabel:viroprofiler_replicyc:denglab/viroprofiler-replicyc:v0.1, withLabel:viroprofiler_taxa:denglab/viroprofiler-taxa:v0.1, withLabel:viroprofiler_virsorter2:denglab/viroprofiler-virsorter2:v0.2.5, withLabel:viroprofiler_vpfkit:denglab/viroprofiler-viewer]
  launchDir      : /home/workstation_8/metagenomics/viroprofiler
  workDir        : /home/workstation_8/metagenomics/viroprofiler/work
  projectDir     : /home/workstation_8/.nextflow/assets/deng-lab/viroprofiler
  userName       : workstation_8
  profile        : docker
  configFiles    : /home/workstation_8/.nextflow/assets/deng-lab/viroprofiler/nextflow.config

Input/output options
  mode           : setup
  db             : /home/workstation_8/viroprofiler
  outdir         : output

QC
  contamref_idx  : /home/workstation_8/viroprofiler/contamination_refs/hg19/ref

Contig library parameters
  assemblies     : scaffolds

Others
  use_iphop      : true
  use_dram       : true

Max job request options
  max_cpus       : 12
  max_memory     : 16.GB

!! Only displaying parameters that differ from the pipeline defaults !!
----------------------------------------------------------------------------------------------------------------
If you use ViroProfiler for your analysis please cite:

* The ViroProfiler pipeline
 Ru, Jinlong, et al. "ViroProfiler: a containerized bioinformatics pipeline for viral metagenomic data analysis."
 Gut Microbes 15.1 (2023): 2192522. https://doi.org/10.1080/19490976.2023.2192522

* The nf-core framework
 Ewels, Philip A., et al. "The nf-core framework for community-curated bioinformatics pipelines."
 Nature biotechnology 38.3 (2020): 276-278. https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/deng-lab/viroprofiler/blob/main/CITATIONS.md
----------------------------------------------------------------------------------------------------------------
executor >  local (1)
[ee/e26b48] process > VIROPROFILER:SETUP:DB_CHECKV     [  0%] 0 of 1
[6f/71ca91] process > VIROPROFILER:SETUP:DB_VIBRANT    [100%] 1 of 1, cached: 1 ✔
[c3/581629] process > VIROPROFILER:SETUP:DB_VIRSORTER2 [100%] 1 of 1, cached: 1 ✔
[81/d419c1] process > VIROPROFILER:SETUP:DB_DRAM       [100%] 1 of 1, cached: 1 ✔
[9f/b85c6a] process > VIROPROFILER:SETUP:DB_IPHOP      [100%] 1 of 1, cached: 1 ✔
[01/8e1c61] process > VIROPROFILER:SETUP:DB_VREFSEQ    [100%] 1 of 1, cached: 1 ✔
ERROR ~ Error executing process > 'VIROPROFILER:SETUP:DB_CHECKV'

Caused by:
  Process `VIROPROFILER:SETUP:DB_CHECKV` terminated with an error exit status (1)

Command executed:

  if [ ! -d /home/workstation_8/viroprofiler/checkv ]; then
      checkv download_database /home/workstation_8/viroprofiler
      mv /home/workstation_8/viroprofiler/checkv-db-v* /home/workstation_8/viroprofiler/checkv
  else
      echo "CheckV database already exists"
  fi

Command exit status:
  1

Command output:
  (empty)

Command error:

  CheckV v1.0.1: download_database
  [1/4] Checking latest version of CheckV's database...
  [2/4] Downloading 'checkv-db-v1.5'...
  [3/4] Extracting 'checkv-db-v1.5'...
executor >  local (1)
[ee/e26b48] process > VIROPROFILER:SETUP:DB_CHECKV     [100%] 1 of 1, failed: 1 ✘
[6f/71ca91] process > VIROPROFILER:SETUP:DB_VIBRANT    [100%] 1 of 1, cached: 1 ✔
[c3/581629] process > VIROPROFILER:SETUP:DB_VIRSORTER2 [100%] 1 of 1, cached: 1 ✔
[81/d419c1] process > VIROPROFILER:SETUP:DB_DRAM       [100%] 1 of 1, cached: 1 ✔
[9f/b85c6a] process > VIROPROFILER:SETUP:DB_IPHOP      [100%] 1 of 1, cached: 1 ✔
[01/8e1c61] process > VIROPROFILER:SETUP:DB_VREFSEQ    [100%] 1 of 1, cached: 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[ViroProfiler] Pipeline completed with errors-
ERROR ~ Error executing process > 'VIROPROFILER:SETUP:DB_CHECKV'

Caused by:
  Process `VIROPROFILER:SETUP:DB_CHECKV` terminated with an error exit status (1)

Command executed:

  if [ ! -d /home/workstation_8/viroprofiler/checkv ]; then
      checkv download_database /home/workstation_8/viroprofiler
      mv /home/workstation_8/viroprofiler/checkv-db-v* /home/workstation_8/viroprofiler/checkv
  else
      echo "CheckV database already exists"
  fi

Command exit status:
  1

Command output:
  (empty)

Command error:

  CheckV v1.0.1: download_database
  [1/4] Checking latest version of CheckV's database...
  [2/4] Downloading 'checkv-db-v1.5'...
  [3/4] Extracting 'checkv-db-v1.5'...
  Traceback (most recent call last):
    File "/opt/conda/envs/viroprofiler-checkv/bin/checkv", line 10, in <module>
      sys.exit(cli())
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/site-packages/checkv/cli.py", line 117, in cli
      args["func"](args)
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/site-packages/checkv/modules/download_database.py", line 83, in main
      db.extract()
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/site-packages/checkv/modules/download_database.py", line 31, in extract
      shutil.unpack_archive(self.output_file, self.destination, "gztar")
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/shutil.py", line 1298, in unpack_archive
      func(filename, extract_dir, **dict(format_info[2]))
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/shutil.py", line 1235, in _unpack_tarfile
      tarobj.extractall(extract_dir)
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/tarfile.py", line 2059, in extractall
      self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/tarfile.py", line 2100, in extract
      self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/tarfile.py", line 2173, in _extract_member
      self.makefile(tarinfo, targetpath)
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/tarfile.py", line 2222, in makefile
      copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/tarfile.py", line 248, in copyfileobj
      buf = src.read(bufsize)
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/gzip.py", line 301, in read
      return self._buffer.read(size)
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/_compression.py", line 68, in readinto
      data = self.read(len(byte_view))
    File "/opt/conda/envs/viroprofiler-checkv/lib/python3.10/gzip.py", line 507, in read
      raise EOFError("Compressed file ended before the "
  EOFError: Compressed file ended before the end-of-stream marker was reached

Work dir:
  /home/workstation_8/metagenomics/viroprofiler/work/ee/e26b489e2378ddded652d679ba6589

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
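The traceback shows the failure surfacing only partway through extraction, when `gzip` runs out of input. A small check like the one below could detect a truncated archive before anything is unpacked. It is a sketch under the assumption that reading the whole stream once is acceptable for a file of this size; `gzip_is_complete` is a hypothetical name, not a CheckV function.

```python
import gzip
import zlib


def gzip_is_complete(path: str, chunk_size: int = 1 << 20) -> bool:
    """Decompress the whole stream and discard it; return False if it is truncated or corrupt."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read to the end so the gzip trailer (CRC and length) is also verified.
            while stream.read(chunk_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        # EOFError is the error reported in this issue for a truncated download.
        return False
    return True
```

A `False` result for the downloaded `checkv-db-v1.5.tar.gz` would point at the download rather than at the extraction step.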

Relevant files

nextflow.log.zip

System information

- Nextflow version 23.10.1.5891
- Desktop workstation (AMD Ryzen 7 5700X, 8 cores / 16 threads; 20 GB DDR4)
- Local executor, Docker
- Ubuntu MATE 22.04.4 LTS x86_64
- ViroProfiler v0.2.4