DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

Error when executing centrifuge "Argument list too long" #258

Open BenjaminJPerry opened 1 year ago

BenjaminJPerry commented 1 year ago

Hi,

I am executing centrifuge in batch mode using snakemake on a slurm cluster, and am getting the following error message:

Can't exec "/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class": Argument list too long at /home/perrybe/.conda/envs/centrifuge/bin/centrifuge line 695.
(ERR): Could not open Centrifuge pipe: '/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class --wrapper basic-0 -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB -t --threads 32 --separator --passthrough -U results/02_kneaddata/FID274912.fastq, [...]

I am attempting to run 9632 samples in a single batch; the input paths have the form results/02_kneaddata/FID274912.fastq.
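
As a rough check of why the batch size might matter, the sketch below estimates the length of the comma-separated -U argument shown in the error message. It assumes that every input path is as long as the example above, that the wrapper passes all inputs as one argument (as the error message suggests), and that Linux caps a single argument string at 32 pages (128 KiB with 4 KiB pages). The constant and the helper name are illustrative and do not come from centrifuge.

```python
# Rough size estimate for the single comma-separated -U argument.
# Assumption: Linux caps one argv string at 32 pages (128 KiB with
# 4 KiB pages); this figure is not taken from the thread.
MAX_ARG_STRLEN = 32 * 4096

example_path = "results/02_kneaddata/FID274912.fastq"
n_samples = 9632


def joined_length(path_len: int, n: int) -> int:
    """Bytes in n paths of path_len bytes each, joined by single commas."""
    return n * path_len + (n - 1)


estimate = joined_length(len(example_path), n_samples)
print(f"estimated -U argument: {estimate} bytes")
print(f"single-argument limit: {MAX_ARG_STRLEN} bytes")
print(f"over the limit: {estimate > MAX_ARG_STRLEN}")
```

Under these assumptions the joined argument comes to roughly 356 KB, well above 128 KiB, which would be consistent with smaller batches working and this one failing.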

I am using a dedicated conda environment to execute in:

(centrifuge) 09:07:33 mahuika01 /nesi/nobackup/agresearch03843/methane/RE-RRS-GTDB $ conda env export
name: centrifuge
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - ca-certificates=2023.5.7=hbcca054_0
  - centrifuge=1.0.4=hd03093a_0
  - gettext=0.21.1=h27087fc_0
  - libgcc-ng=12.2.0=h65d4601_19
  - libgomp=12.2.0=h65d4601_19
  - libiconv=1.17=h166bdaf_0
  - libidn2=2.3.4=h166bdaf_0
  - libnsl=2.0.0=h7f98852_0
  - libstdcxx-ng=12.2.0=h46fd767_19
  - libunistring=0.9.10=h7f98852_0
  - libzlib=1.2.13=h166bdaf_4
  - openssl=3.1.0=hd590300_3
  - perl=5.32.1=2_h7f98852_perl5
  - python=1.6=0
  - tar=1.34=hb2e2bae_1
  - wget=1.20.3=ha35d2d1_1
  - zlib=1.2.13=h166bdaf_4
prefix: /home/perrybe/.conda/envs/centrifuge

The unix system has the following configuration,

(centrifuge) 09:07:42 mahuika01 /nesi/nobackup/agresearch03843/methane/RE-RRS-GTDB $ cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

I've seen xargs mentioned in other issues with an "Argument list too long" error; however, to my reading none of them addressed this issue.

Thank you for your time, Ben

mourisl commented 1 year ago

What was your running command for Centrifuge?

BenjaminJPerry commented 1 year ago

I am using snakemake (7.26) to run the command in the following rule,

rule centrifugeGTDB:
    input:
        sampleSheet = "resources/centrifugeSampleSheet.tsv",
    output:
        out = expand("results/03_centrifuge/{sample}.GTDB.centrifuge", sample = FIDs),
        report = expand("results/03_centrifuge/{sample}.GTDB.centrifuge.report", sample = FIDs),
    log:
        "logs/centrifuge.GTDB.multi.log",
    benchmark:
        "benchmarks/centrifugeGTDB.txt"
    conda:
        "centrifuge"
    threads: 32
    resources:
        mem_gb = lambda wildcards, attempt: 160 + ((attempt - 1) * 20),
        time = lambda wildcards, attempt: 8640 + ((attempt - 1) * 1440),
        partition = "milan"
    shell:
        "centrifuge "
        "-x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB  "
        "--sample-sheet {input.sampleSheet} "
        "-t "
        "--threads {threads} "
        "2>&1 | tee {log}"

This translates into,

centrifuge -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB  --sample-sheet resources/centrifugeSampleSheet.tsv -t --threads 32 2>&1 | tee logs/centrifuge.GTDB.multi.log

My centrifugeSampleSheet.tsv file looks like this (I haven't had any issues with these in prior smaller batches):

$ head resources/centrifugeSampleSheet.tsv 
1       results/02_kneaddata/FID274912.fastq    NA      results/03_centrifuge/FID274912.fastq.GTDB.centrifuge   results/03_centrifuge/FID274912.fastq.GTDB.centrifuge.report
1       results/02_kneaddata/FID274915.fastq    NA      results/03_centrifuge/FID274915.fastq.GTDB.centrifuge   results/03_centrifuge/FID274915.fastq.GTDB.centrifuge.report
1       results/02_kneaddata/FID274917.fastq    NA      results/03_centrifuge/FID274917.fastq.GTDB.centrifuge   results/03_centrifuge/FID274917.fastq.GTDB.centrifuge.report
1       results/02_kneaddata/FID274918.fastq    NA      results/03_centrifuge/FID274918.fastq.GTDB.centrifuge   results/03_centrifuge/FID274918.fastq.GTDB.centrifuge.report
1       results/02_kneaddata/FID274920.fastq    NA      results/03_centrifuge/FID274920.fastq.GTDB.centrifuge   results/03_centrifuge/FID274920.fastq.GTDB.centrifuge.report
1       results/02_kneaddata/FID274921.fastq    NA      results/03_centrifuge/FID274921.fastq.GTDB.centrifuge   results/03_centrifuge/FID274921.fastq.GTDB.centrifuge.report
1       results/02_kneaddata/FID274924.fastq    NA      results/03_centrifuge/FID274924.fastq.GTDB.centrifuge   results/03_centrifuge/FID274924.fastq.GTDB.centrifuge.report
1       results/02_kneaddata/FID274925.fastq    NA      results/03_centrifuge/FID274925.fastq.GTDB.centrifuge   results/03_centrifuge/FID274925.fastq.GTDB.centrifuge.report
1       results/02_kneaddata/FID274927.fastq    NA      results/03_centrifuge/FID274927.fastq.GTDB.centrifuge   results/03_centrifuge/FID274927.fastq.GTDB.centrifuge.report
1       results/02_kneaddata/FID274929.fastq    NA      results/03_centrifuge/FID274929.fastq.GTDB.centrifuge   results/03_centrifuge/FID274929.fastq.GTDB.centrifuge.report
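
For reference, a sheet with this layout could be generated with a short script. This is a minimal sketch: the column meanings (read type, read file, second read file or NA, classification output, report output) are inferred from the rows above, and the function names are hypothetical.

```python
import csv
import io


def sample_sheet_rows(sample_ids):
    """Yield one five-column row per single-end sample, mirroring the rows above."""
    for sid in sample_ids:
        reads = f"results/02_kneaddata/{sid}.fastq"
        out = f"results/03_centrifuge/{sid}.fastq.GTDB.centrifuge"
        # Column order inferred from the head output: type, read1, read2, output, report.
        yield ["1", reads, "NA", out, f"{out}.report"]


def write_sample_sheet(sample_ids, handle):
    """Write the rows to an open text handle as tab-separated lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(sample_sheet_rows(sample_ids))


# Demonstration with two IDs from the sheet; an in-memory buffer stands in for a file.
buffer = io.StringIO()
write_sample_sheet(["FID274912", "FID274915"], buffer)
print(buffer.getvalue(), end="")
```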

Earlier steps in my workflow (seqkit stats) aggregate a larger number of files into a single command-line parameter for input, and those did not produce this error.

Here is the module which loads snakemake on the HPC I am using,

$ module show snakemake
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   /opt/nesi/CS400_centos7_bdw/modules/all/snakemake/7.26.0-gimkl-2022a-Python-3.11.3.lua:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
help([[
Description
===========
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses.

More information
================
 - Homepage: https://snakemake.readthedocs.io

Included extensions
===================
appdirs-1.4.3, ConfigArgParse-0.13.0, datrie-0.8.2, docutils-0.20.1,
dpath-2.1.6, filelock-3.2.0, gitdb2-2.0.4, GitPython-2.1.11,
humanfriendly-10.0, jsonschema-2.6.0, plac-1.3.5, PuLP-2.5.0, reretry-0.11.8,
retry-0.9.2, smart_open-5.2.1, smmap2-2.0.4, snakemake-7.26.0,
throttler-1.2.2, throttler-1.2.2, toposort-1.6, wrapt-1.15.0, yte-1.5.1
]])
whatis("Description: The Snakemake workflow management system is a tool to create reproducible and scalable data analyses.")
whatis("Homepage: https://snakemake.readthedocs.io")
whatis("URL: https://snakemake.readthedocs.io")
whatis("Extensions: appdirs-1.4.3, ConfigArgParse-0.13.0, datrie-0.8.2, docutils-0.20.1, dpath-2.1.6, filelock-3.2.0, gitdb2-2.0.4, GitPython-2.1.11, humanfriendly-10.0, jsonschema-2.6.0, plac-1.3.5, PuLP-2.5.0, reretry-0.11.8, retry-0.9.2, smart_open-5.2.1, smmap2-2.0.4, snakemake-7.26.0, throttler-1.2.2, throttler-1.2.2, toposort-1.6, wrapt-1.15.0, yte-1.5.1")
conflict("snakemake")
prepend_path("CMAKE_PREFIX_PATH","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3")
prepend_path("LIBRARY_PATH","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3/lib")
prepend_path("PATH","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3/bin")
setenv("EBROOTSNAKEMAKE","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3")
setenv("EBVERSIONSNAKEMAKE","7.26.0")
setenv("EBDEVELSNAKEMAKE","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3/easybuild/snakemake-7.26.0-gimkl-2022a-Python-3.11.3-easybuild-devel")
prepend_path("PYTHONPATH","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3/lib/python3.11/site-packages")
setenv("EBEXTSLISTSNAKEMAKE","smart_open-5.2.1,filelock-3.2.0,PuLP-2.5.0,toposort-1.6,smmap2-2.0.4,gitdb2-2.0.4,GitPython-2.1.11,docutils-0.20.1,jsonschema-2.6.0,datrie-0.8.2,appdirs-1.4.3,ConfigArgParse-0.13.0,throttler-1.2.2,wrapt-1.15.0,retry-0.9.2,reretry-0.11.8,throttler-1.2.2,dpath-2.1.6,plac-1.3.5,yte-1.5.1,humanfriendly-10.0,snakemake-7.26.0")

BenjaminJPerry commented 1 year ago

Update.

I have been able to reproduce the error by launching the workflow via a small sbatch script, as before.

#!/bin/bash

#SBATCH --job-name=LaunchSMK
#SBATCH --account=agresearch03843
#SBATCH --time=06-00:00:00
#SBATCH --mem=2G
#SBATCH --partition=milan
#SBATCH --output=%j_output.out
#SBATCH --mail-user=ben.perry@agresearch.co.nz
#SBATCH --mail-type=ALL
#SBATCH --cpus-per-task=2

module load snakemake

cd /nesi/nobackup/agresearch03843/methane/RE-RRS-GTDB

pwd

snakemake --profile config/slurm --snakefile workflow/profile.smk centrifugeGTDB

The error message appears,

Can't exec "/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class": Argument list too long at /home/perrybe/.conda/envs/centrifuge/bin/centrifuge line 695.
(ERR): Could not open Centrifuge pipe: '/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class --wrapper basic-0 -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB -t --threads 32 --separator --passthrough -U results/02_kneaddata/FID274912.fastq,results/02_kneaddata/FID274915.fastq,results/02_kneaddata/FID274917.fastq,results/02_kneaddata/FID274918.fastq,results/02_kneaddata/FID274920.fastq,results/02_kneaddata/FID274921.fastq,results/02_kneaddata/FID274924.fastq,results/02_kneaddata/FID274925.fastq,results/02_kneaddata/FID274927.fastq,results/02_kneaddata/FID274929.fastq,results/02_kneaddata/FID274932.fastq,results/02_kneaddata/FID274934.fastq,results/02_kneaddata/FID274936.fastq,results/02_kneaddata/FID274938.fastq,results/02_kneaddata/FID274939.fastq,[...]

However, when I run the snakemake centrifugeGTDB rule interactively on the CLI (as below) I do not get the error message, and centrifuge appears to be running correctly.

snakemake --profile config/slurm --snakefile workflow/profile.smk centrifugeGTDB

It is unclear where this behavior is arising from, but my suspicion is that -- because centrifuge runs without error when launched interactively via snakemake -- it is not related to centrifuge.
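
One way to compare the batch and interactive environments would be to print the exec-related limits each one sees. The sketch below is only a diagnostic idea, assuming a Linux node with Python available; `exec_limits` is a hypothetical helper name. It uses only the standard-library `os.sysconf` and `resource.getrlimit` calls.

```python
import os
import resource


def exec_limits():
    """Return the exec-related limits visible to the current process."""
    soft_stack, hard_stack = resource.getrlimit(resource.RLIMIT_STACK)
    return {
        # Total bytes allowed for argv plus the environment.
        "arg_max": os.sysconf("SC_ARG_MAX"),
        # Page size; on Linux the per-argument cap is a multiple of this.
        "page_size": os.sysconf("SC_PAGE_SIZE"),
        # Stack limits; a value of -1 means unlimited.
        "stack_soft": soft_stack,
        "stack_hard": hard_stack,
    }


for name, value in exec_limits().items():
    print(f"{name}: {value}")
```

Running this once from a login shell and once inside an sbatch job would show whether the two environments actually differ.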

BenjaminJPerry commented 1 year ago

The error still seems to occur, even when the workflow is launched interactively. It just takes longer to appear; could it be happening after the index loads?

$ cat -n 2023-05-26T175036.845717.snakemake.log | grep -v "GTDB.centrifuge"
     1  Building DAG of jobs...
     2  Using shell: /usr/bin/bash
     3  Provided cluster nodes: 100
     4  Job stats:
     5  job               count    min threads    max threads
     6  --------------  -------  -------------  -------------
     7  centrifugeGTDB        1             32             32
     8  total                 1             32             32
     9
    10  Select jobs to execute...
    11
    12  [Fri May 26 17:51:33 2023]
    13  rule centrifugeGTDB:
    14      input: resources/centrifugeSampleSheet.tsv
    16      log: logs/centrifuge.GTDB.multi.log
    17      jobid: 0
    18      benchmark: benchmarks/centrifugeGTDB.txt
    20      threads: 32
    21      resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, account=agresearch03843, partition=milan, time=05-00:00:00, mem_gb=160
    22
    23  Submitted job 0 with external jobid 'Submitted batch job 36081670'.
    24  [Fri May 26 18:02:02 2023]
    25  Error in rule centrifugeGTDB:
    26      jobid: 0
    27      input: resources/centrifugeSampleSheet.tsv
    29      log: logs/centrifuge.GTDB.multi.log (check log file(s) for error details)
    30      conda-env: centrifuge
    31      shell:
    32          centrifuge -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB --sample-sheet resources/centrifugeSampleSheet.tsv -t --threads 32 2>&1 | tee logs/centrifuge.GTDB.multi.log
    33          (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    34      cluster_jobid: Submitted batch job 36081670
    35
    36  Error executing rule centrifugeGTDB on cluster (jobid: 0, external: Submitted batch job 36081670, jobscript: /scale_wlg_nobackup/filesets/nobackup/agresearch03843/methane/RE-RRS-GTDB/.snakemake/tmp.c4_kxd76/snakejob.centrifugeGTDB.0.sh). For error details see the cluster log and the log files of the involved rule(s).
    37  Trying to restart job 0.
    38  Select jobs to execute...
    39
    40  [Fri May 26 18:02:10 2023]
    41  rule centrifugeGTDB:
    42      input: resources/centrifugeSampleSheet.tsv
    44      log: logs/centrifuge.GTDB.multi.log
    45      jobid: 0
    46      benchmark: benchmarks/centrifugeGTDB.txt
    48      threads: 32
    49      resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, account=agresearch03843, partition=milan, time=05-00:00:00, mem_gb=180
    50
    51  Submitted job 0 with external jobid 'Submitted batch job 36082140'.
    52  [Sat May 27 08:09:25 2023]
    53  Error in rule centrifugeGTDB:
    54      jobid: 0
    55      input: resources/centrifugeSampleSheet.tsv
    57      log: logs/centrifuge.GTDB.multi.log (check log file(s) for error details)
    58      conda-env: centrifuge
    59      shell:
    60          centrifuge -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB --sample-sheet resources/centrifugeSampleSheet.tsv -t --threads 32 2>&1 | tee logs/centrifuge.GTDB.multi.log
    61          (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    62      cluster_jobid: Submitted batch job 36082140
    63
    64  Error executing rule centrifugeGTDB on cluster (jobid: 0, external: Submitted batch job 36082140, jobscript: /scale_wlg_nobackup/filesets/nobackup/agresearch03843/methane/RE-RRS-GTDB/.snakemake/tmp.c4_kxd76/snakejob.centrifugeGTDB.0.sh). For error details see the cluster log and the log files of the involved rule(s).

From the logs, Job ID - 36081670

$ cat -n logs/centrifugeGTDB/centrifugeGTDB--36081670.out | grep -v ".fastq,results/02_kneaddata\|GTDB.centrifuge"
     1  Building DAG of jobs...
     2  Using shell: /usr/bin/bash
     3  Provided cores: 32
     4  Rules claiming more threads will be scaled down.
     5  Provided resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, mem_gb=160
     6  Select jobs to execute...
     7
     8  [Fri May 26 17:53:24 2023]
     9  rule centrifugeGTDB:
    10      input: resources/centrifugeSampleSheet.tsv
    12      log: logs/centrifuge.GTDB.multi.log
    13      jobid: 0
    14      benchmark: benchmarks/centrifugeGTDB.txt
    16      threads: 32
    17      resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/dev/shm/jobs/36081670, account=agresearch03843, partition=milan, time=05-00:00:00, mem_gb=160
    18
    19  Activating conda environment: centrifuge
    20  Can't exec "/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class": Argument list too long at /home/perrybe/.conda/envs/centrifuge/bin/centrifuge line 695.
    22  Exiting now ...
    23  [Fri May 26 18:01:53 2023]
    24  Error in rule centrifugeGTDB:
    25      jobid: 0
    26      input: resources/centrifugeSampleSheet.tsv
    28      log: logs/centrifuge.GTDB.multi.log (check log file(s) for error details)
    29      conda-env: centrifuge
    30      shell:
    31          centrifuge -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB --sample-sheet resources/centrifugeSampleSheet.tsv -t --threads 32 2>&1 | tee logs/centrifuge.GTDB.multi.log
    32          (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    33
    34  Shutting down, this might take some time.
    35  Exiting because a job execution failed. Look above for error message

Job ID - 36082140

$ cat -n logs/centrifugeGTDB/centrifugeGTDB--36082140.out | grep -v ".fastq,results/02_kneaddata\|GTDB.centrifuge"
     1  Building DAG of jobs...
     2  Using shell: /usr/bin/bash
     3  Provided cores: 32
     4  Rules claiming more threads will be scaled down.
     5  Provided resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, mem_gb=180
     6  Select jobs to execute...
     7
     8  [Sat May 27 07:59:00 2023]
     9  rule centrifugeGTDB:
    10      input: resources/centrifugeSampleSheet.tsv
    12      log: logs/centrifuge.GTDB.multi.log
    13      jobid: 0
    14      benchmark: benchmarks/centrifugeGTDB.txt
    16      threads: 32
    17      resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/dev/shm/jobs/36082140, account=agresearch03843, partition=milan, time=05-00:00:00, mem_gb=180
    18
    19  Activating conda environment: centrifuge
    20  Can't exec "/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class": Argument list too long at /home/perrybe/.conda/envs/centrifuge/bin/centrifuge line 695.
    22  Exiting now ...
    23  [Sat May 27 08:07:04 2023]
    24  Error in rule centrifugeGTDB:
    25      jobid: 0
    26      input: resources/centrifugeSampleSheet.tsv
    28      log: logs/centrifuge.GTDB.multi.log (check log file(s) for error details)
    29      conda-env: centrifuge
    30      shell:
    31          centrifuge -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB --sample-sheet resources/centrifugeSampleSheet.tsv -t --threads 32 2>&1 | tee logs/centrifuge.GTDB.multi.log
    32          (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    33
    34  Shutting down, this might take some time.
    35  Exiting because a job execution failed. Look above for error message

I will try to batch the job into smaller groups on Monday to see if the error persists.
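
A minimal sketch of that batching, assuming the limiting factor is the byte length of the comma-joined read paths: rows are grouped so that each group's joined paths stay under a chosen budget. The 100,000-byte default is an arbitrary headroom value below 128 KiB, the function name is hypothetical, and the sample IDs in the demonstration are synthetic.

```python
def split_by_argument_size(rows, max_bytes=100_000):
    """Group sample-sheet rows so each group's comma-joined read paths fit in max_bytes."""
    batches, current, size = [], [], 0
    for row in rows:
        path_bytes = len(row[1].encode())
        # A comma is needed before every path except the first in a batch.
        cost = path_bytes + (1 if current else 0)
        if current and size + cost > max_bytes:
            batches.append(current)
            current, size = [], 0
            cost = path_bytes
        current.append(row)
        size += cost
    if current:
        batches.append(current)
    return batches


# Synthetic rows in the sample-sheet layout; these IDs are made up for the demo.
rows = [
    ["1", f"results/02_kneaddata/FID{i:06d}.fastq", "NA", "out", "report"]
    for i in range(9632)
]
batches = split_by_argument_size(rows)
print(f"{len(batches)} batches, largest has {max(len(b) for b in batches)} rows")
```

Each group could then be written to its own sample sheet and run as a separate centrifuge job.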