Open BenjaminJPerry opened 1 year ago
What was your running command for Centrifuge?
I am using snakemake (7.26) to run the command in the following rule,
rule centrifugeGTDB:
    input:
        sampleSheet = "resources/centrifugeSampleSheet.tsv",
    output:
        out = expand("results/03_centrifuge/{sample}.GTDB.centrifuge", sample = FIDs),
        report = expand("results/03_centrifuge/{sample}.GTDB.centrifuge.report", sample = FIDs),
    log:
        "logs/centrifuge.GTDB.multi.log",
    benchmark:
        "benchmarks/centrifugeGTDB.txt"
    conda:
        "centrifuge"
    threads: 32
    resources:
        mem_gb = lambda wildcards, attempt: 160 + ((attempt - 1) * 20),
        time = lambda wildcards, attempt: 8640 + ((attempt - 1) * 1440),
        partition = "milan"
    shell:
        "centrifuge "
        "-x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB "
        "--sample-sheet {input.sampleSheet} "
        "-t "
        "--threads {threads} "
        "2>&1 | tee {log}"
This translates into,
centrifuge -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB --sample-sheet resources/centrifugeSampleSheet.tsv -t --threads 32 2>&1 | tee logs/centrifuge.GTDB.multi.log
My centrifugeSampleSheet.tsv file looks like this (I haven't had any issues with these in prior smaller batches):
$ head resources/centrifugeSampleSheet.tsv
1 results/02_kneaddata/FID274912.fastq NA results/03_centrifuge/FID274912.fastq.GTDB.centrifuge results/03_centrifuge/FID274912.fastq.GTDB.centrifuge.report
1 results/02_kneaddata/FID274915.fastq NA results/03_centrifuge/FID274915.fastq.GTDB.centrifuge results/03_centrifuge/FID274915.fastq.GTDB.centrifuge.report
1 results/02_kneaddata/FID274917.fastq NA results/03_centrifuge/FID274917.fastq.GTDB.centrifuge results/03_centrifuge/FID274917.fastq.GTDB.centrifuge.report
1 results/02_kneaddata/FID274918.fastq NA results/03_centrifuge/FID274918.fastq.GTDB.centrifuge results/03_centrifuge/FID274918.fastq.GTDB.centrifuge.report
1 results/02_kneaddata/FID274920.fastq NA results/03_centrifuge/FID274920.fastq.GTDB.centrifuge results/03_centrifuge/FID274920.fastq.GTDB.centrifuge.report
1 results/02_kneaddata/FID274921.fastq NA results/03_centrifuge/FID274921.fastq.GTDB.centrifuge results/03_centrifuge/FID274921.fastq.GTDB.centrifuge.report
1 results/02_kneaddata/FID274924.fastq NA results/03_centrifuge/FID274924.fastq.GTDB.centrifuge results/03_centrifuge/FID274924.fastq.GTDB.centrifuge.report
1 results/02_kneaddata/FID274925.fastq NA results/03_centrifuge/FID274925.fastq.GTDB.centrifuge results/03_centrifuge/FID274925.fastq.GTDB.centrifuge.report
1 results/02_kneaddata/FID274927.fastq NA results/03_centrifuge/FID274927.fastq.GTDB.centrifuge results/03_centrifuge/FID274927.fastq.GTDB.centrifuge.report
1 results/02_kneaddata/FID274929.fastq NA results/03_centrifuge/FID274929.fastq.GTDB.centrifuge results/03_centrifuge/FID274929.fastq.GTDB.centrifuge.report
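For reference, rows in this layout could be generated along the following lines. This is a minimal sketch, not the script actually used to build the sheet: it assumes the file is tab-separated and that the five columns are the read type (1 for single-end), read file 1, read file 2 (NA here), classification output, and report output. The function name and its default arguments are hypothetical.

```python
from pathlib import Path


def sample_sheet_rows(fastq_paths, out_dir="results/03_centrifuge", db_tag="GTDB"):
    """Return one tab-separated sample-sheet row per single-end FASTQ file."""
    rows = []
    for fq in fastq_paths:
        name = Path(fq).name  # e.g. FID274912.fastq
        out = f"{out_dir}/{name}.{db_tag}.centrifuge"
        # Assumed columns: type, read 1, read 2 (NA for single-end), output, report
        rows.append("\t".join(["1", str(fq), "NA", out, f"{out}.report"]))
    return rows


rows = sample_sheet_rows(["results/02_kneaddata/FID274912.fastq"])
print(rows[0])
```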
In my workflow I have steps prior to this (seqkit stats) which aggregate a larger number of files into a single command-line parameter for input, and these did not produce this error.
Here is the module which loads snakemake on the HPC I am using,
$ module show snakemake
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/opt/nesi/CS400_centos7_bdw/modules/all/snakemake/7.26.0-gimkl-2022a-Python-3.11.3.lua:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
help([[
Description
===========
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses.
More information
================
- Homepage: https://snakemake.readthedocs.io
Included extensions
===================
appdirs-1.4.3, ConfigArgParse-0.13.0, datrie-0.8.2, docutils-0.20.1,
dpath-2.1.6, filelock-3.2.0, gitdb2-2.0.4, GitPython-2.1.11,
humanfriendly-10.0, jsonschema-2.6.0, plac-1.3.5, PuLP-2.5.0, reretry-0.11.8,
retry-0.9.2, smart_open-5.2.1, smmap2-2.0.4, snakemake-7.26.0,
throttler-1.2.2, throttler-1.2.2, toposort-1.6, wrapt-1.15.0, yte-1.5.1
]])
whatis("Description: The Snakemake workflow management system is a tool to create reproducible and scalable data analyses.")
whatis("Homepage: https://snakemake.readthedocs.io")
whatis("URL: https://snakemake.readthedocs.io")
whatis("Extensions: appdirs-1.4.3, ConfigArgParse-0.13.0, datrie-0.8.2, docutils-0.20.1, dpath-2.1.6, filelock-3.2.0, gitdb2-2.0.4, GitPython-2.1.11, humanfriendly-10.0, jsonschema-2.6.0, plac-1.3.5, PuLP-2.5.0, reretry-0.11.8, retry-0.9.2, smart_open-5.2.1, smmap2-2.0.4, snakemake-7.26.0, throttler-1.2.2, throttler-1.2.2, toposort-1.6, wrapt-1.15.0, yte-1.5.1")
conflict("snakemake")
prepend_path("CMAKE_PREFIX_PATH","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3")
prepend_path("LIBRARY_PATH","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3/lib")
prepend_path("PATH","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3/bin")
setenv("EBROOTSNAKEMAKE","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3")
setenv("EBVERSIONSNAKEMAKE","7.26.0")
setenv("EBDEVELSNAKEMAKE","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3/easybuild/snakemake-7.26.0-gimkl-2022a-Python-3.11.3-easybuild-devel")
prepend_path("PYTHONPATH","/opt/nesi/CS400_centos7_bdw/snakemake/7.26.0-gimkl-2022a-Python-3.11.3/lib/python3.11/site-packages")
setenv("EBEXTSLISTSNAKEMAKE","smart_open-5.2.1,filelock-3.2.0,PuLP-2.5.0,toposort-1.6,smmap2-2.0.4,gitdb2-2.0.4,GitPython-2.1.11,docutils-0.20.1,jsonschema-2.6.0,datrie-0.8.2,appdirs-1.4.3,ConfigArgParse-0.13.0,throttler-1.2.2,wrapt-1.15.0,retry-0.9.2,reretry-0.11.8,throttler-1.2.2,dpath-2.1.6,plac-1.3.5,yte-1.5.1,humanfriendly-10.0,snakemake-7.26.0")
Update.
I have been able to reproduce the error by launching the workflow via a small sbatch script, as before.
#!/bin/bash
#SBATCH --job-name=LaunchSMK
#SBATCH --account=agresearch03843
#SBATCH --time=06-00:00:00
#SBATCH --mem=2G
#SBATCH --partition=milan
#SBATCH --output=%j_output.out
#SBATCH --mail-user=ben.perry@agresearch.co.nz
#SBATCH --mail-type=ALL
#SBATCH --cpus-per-task=2
module load snakemake
cd /nesi/nobackup/agresearch03843/methane/RE-RRS-GTDB
pwd
snakemake --profile config/slurm --snakefile workflow/profile.smk centrifugeGTDB
The error message appears,
Can't exec "/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class": Argument list too long at /home/perrybe/.conda/envs/centrifuge/bin/centrifuge line 695.
(ERR): Could not open Centrifuge pipe: '/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class --wrapper basic-0 -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB -t --threads 32 --separator --passthrough -U results/02_kneaddata/FID274912.fastq,results/02_kneaddata/FID274915.fastq,results/02_kneaddata/FID274917.fastq,results/02_kneaddata/FID274918.fastq,results/02_kneaddata/FID274920.fastq,results/02_kneaddata/FID274921.fastq,results/02_kneaddata/FID274924.fastq,results/02_kneaddata/FID274925.fastq,results/02_kneaddata/FID274927.fastq,results/02_kneaddata/FID274929.fastq,results/02_kneaddata/FID274932.fastq,results/02_kneaddata/FID274934.fastq,results/02_kneaddata/FID274936.fastq,results/02_kneaddata/FID274938.fastq,results/02_kneaddata/FID274939.fastq,[...]
However, when I run the snakemake centrifugeGTDB rule interactively on the CLI (as below) I do not get the error message, and centrifuge appears to be running correctly.
snakemake --profile config/slurm --snakefile workflow/profile.smk centrifugeGTDB
It is unclear where this behavior is arising from, but my suspicion is that -- because centrifuge runs without error when launched interactively via snakemake -- it is not related to centrifuge.
The error still seems to occur, even when launched interactively; it just takes longer to appear. Could it be happening after the index loads?
$ cat -n 2023-05-26T175036.845717.snakemake.log | grep -v "GTDB.centrifuge"
1 Building DAG of jobs...
2 Using shell: /usr/bin/bash
3 Provided cluster nodes: 100
4 Job stats:
5 job count min threads max threads
6 -------------- ------- ------------- -------------
7 centrifugeGTDB 1 32 32
8 total 1 32 32
9
10 Select jobs to execute...
11
12 [Fri May 26 17:51:33 2023]
13 rule centrifugeGTDB:
14 input: resources/centrifugeSampleSheet.tsv
16 log: logs/centrifuge.GTDB.multi.log
17 jobid: 0
18 benchmark: benchmarks/centrifugeGTDB.txt
20 threads: 32
21 resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, account=agresearch03843, partition=milan, time=05-00:00:00, mem_gb=160
22
23 Submitted job 0 with external jobid 'Submitted batch job 36081670'.
24 [Fri May 26 18:02:02 2023]
25 Error in rule centrifugeGTDB:
26 jobid: 0
27 input: resources/centrifugeSampleSheet.tsv
29 log: logs/centrifuge.GTDB.multi.log (check log file(s) for error details)
30 conda-env: centrifuge
31 shell:
32 centrifuge -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB --sample-sheet resources/centrifugeSampleSheet.tsv -t --threads 32 2>&1 | tee logs/centrifuge.GTDB.multi.log
33 (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
34 cluster_jobid: Submitted batch job 36081670
35
36 Error executing rule centrifugeGTDB on cluster (jobid: 0, external: Submitted batch job 36081670, jobscript: /scale_wlg_nobackup/filesets/nobackup/agresearch03843/methane/RE-RRS-GTDB/.snakemake/tmp.c4_kxd76/snakejob.centrifugeGTDB.0.sh). For error details see the cluster log and the log files of the involved rule(s).
37 Trying to restart job 0.
38 Select jobs to execute...
39
40 [Fri May 26 18:02:10 2023]
41 rule centrifugeGTDB:
42 input: resources/centrifugeSampleSheet.tsv
44 log: logs/centrifuge.GTDB.multi.log
45 jobid: 0
46 benchmark: benchmarks/centrifugeGTDB.txt
48 threads: 32
49 resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, account=agresearch03843, partition=milan, time=05-00:00:00, mem_gb=180
50
51 Submitted job 0 with external jobid 'Submitted batch job 36082140'.
52 [Sat May 27 08:09:25 2023]
53 Error in rule centrifugeGTDB:
54 jobid: 0
55 input: resources/centrifugeSampleSheet.tsv
57 log: logs/centrifuge.GTDB.multi.log (check log file(s) for error details)
58 conda-env: centrifuge
59 shell:
60 centrifuge -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB --sample-sheet resources/centrifugeSampleSheet.tsv -t --threads 32 2>&1 | tee logs/centrifuge.GTDB.multi.log
61 (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
62 cluster_jobid: Submitted batch job 36082140
63
64 Error executing rule centrifugeGTDB on cluster (jobid: 0, external: Submitted batch job 36082140, jobscript: /scale_wlg_nobackup/filesets/nobackup/agresearch03843/methane/RE-RRS-GTDB/.snakemake/tmp.c4_kxd76/snakejob.centrifugeGTDB.0.sh). For error details see the cluster log and the log files of the involved rule(s).
From the logs, Job ID - 36081670
$ cat -n logs/centrifugeGTDB/centrifugeGTDB--36081670.out | grep -v ".fastq,results/02_kneaddata\|GTDB.centrifuge"
1 Building DAG of jobs...
2 Using shell: /usr/bin/bash
3 Provided cores: 32
4 Rules claiming more threads will be scaled down.
5 Provided resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, mem_gb=160
6 Select jobs to execute...
7
8 [Fri May 26 17:53:24 2023]
9 rule centrifugeGTDB:
10 input: resources/centrifugeSampleSheet.tsv
12 log: logs/centrifuge.GTDB.multi.log
13 jobid: 0
14 benchmark: benchmarks/centrifugeGTDB.txt
16 threads: 32
17 resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/dev/shm/jobs/36081670, account=agresearch03843, partition=milan, time=05-00:00:00, mem_gb=160
18
19 Activating conda environment: centrifuge
20 Can't exec "/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class": Argument list too long at /home/perrybe/.conda/envs/centrifuge/bin/centrifuge line 695.
22 Exiting now ...
23 [Fri May 26 18:01:53 2023]
24 Error in rule centrifugeGTDB:
25 jobid: 0
26 input: resources/centrifugeSampleSheet.tsv
28 log: logs/centrifuge.GTDB.multi.log (check log file(s) for error details)
29 conda-env: centrifuge
30 shell:
31 centrifuge -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB --sample-sheet resources/centrifugeSampleSheet.tsv -t --threads 32 2>&1 | tee logs/centrifuge.GTDB.multi.log
32 (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
33
34 Shutting down, this might take some time.
35 Exiting because a job execution failed. Look above for error message
Job ID - 36082140
$ cat -n logs/centrifugeGTDB/centrifugeGTDB--36082140.out | grep -v ".fastq,results/02_kneaddata\|GTDB.centrifuge"
1 Building DAG of jobs...
2 Using shell: /usr/bin/bash
3 Provided cores: 32
4 Rules claiming more threads will be scaled down.
5 Provided resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, mem_gb=180
6 Select jobs to execute...
7
8 [Sat May 27 07:59:00 2023]
9 rule centrifugeGTDB:
10 input: resources/centrifugeSampleSheet.tsv
12 log: logs/centrifuge.GTDB.multi.log
13 jobid: 0
14 benchmark: benchmarks/centrifugeGTDB.txt
16 threads: 32
17 resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/dev/shm/jobs/36082140, account=agresearch03843, partition=milan, time=05-00:00:00, mem_gb=180
18
19 Activating conda environment: centrifuge
20 Can't exec "/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class": Argument list too long at /home/perrybe/.conda/envs/centrifuge/bin/centrifuge line 695.
22 Exiting now ...
23 [Sat May 27 08:07:04 2023]
24 Error in rule centrifugeGTDB:
25 jobid: 0
26 input: resources/centrifugeSampleSheet.tsv
28 log: logs/centrifuge.GTDB.multi.log (check log file(s) for error details)
29 conda-env: centrifuge
30 shell:
31 centrifuge -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB --sample-sheet resources/centrifugeSampleSheet.tsv -t --threads 32 2>&1 | tee logs/centrifuge.GTDB.multi.log
32 (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
33
34 Shutting down, this might take some time.
35 Exiting because a job execution failed. Look above for error message
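Both job logs stop at the same exec call, so it may be worth checking how long the comma-joined -U list becomes. The sketch below is a rough check, not a confirmed diagnosis. It assumes the sample sheet is tab-separated with the read path in the second column, and it assumes the usual Linux per-argument cap of 32 pages (131072 bytes with 4 KiB pages), which has not been verified on this cluster. The helper names and the synthetic paths are hypothetical.

```python
# Assumption: Linux limits a single argv string to 32 pages (MAX_ARG_STRLEN),
# i.e. 131072 bytes with 4 KiB pages. Not confirmed for this cluster.
ASSUMED_MAX_ARG_STRLEN = 32 * 4096


def joined_reads_length(sheet_lines):
    """Byte length of the comma-joined read paths (column 2 of each row)."""
    reads = [line.split("\t")[1] for line in sheet_lines if line.strip()]
    return len(",".join(reads).encode())


def exceeds_assumed_limit(sheet_lines):
    """Return (length, True/False) for the joined -U argument."""
    n = joined_reads_length(sheet_lines)
    return n, n > ASSUMED_MAX_ARG_STRLEN


# Synthetic rows shaped like the real sheet: 9632 paths of 36 characters each
example = [
    f"1\tresults/02_kneaddata/FID{274912 + i}.fastq\tNA\tout\tout.report"
    for i in range(9632)
]
length, too_long = exceeds_assumed_limit(example)
print(length, too_long)
```

For 9632 paths of the shape shown in the sample sheet, the joined string comes to roughly 350 kB, which would exceed the assumed cap regardless of whether the index has loaded. If that assumption holds, it might also explain why tools that receive each file as a separate argument did not fail, since those are bounded by the total argument size rather than by the length of a single string.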
I will try to batch the job into smaller groups on Monday to see if the error persists.
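Splitting the sheet could look something like the sketch below. It is only an illustration under stated assumptions: the batch size, the output file naming, and the function names are hypothetical, and each resulting sheet would need its own centrifuge invocation (and so its own index load).

```python
import tempfile
from pathlib import Path


def batch_rows(rows, batch_size):
    """Split a list of rows into consecutive batches of at most batch_size."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def write_batches(rows, batch_size, out_dir, stem="centrifugeSampleSheet"):
    """Write each batch to its own numbered TSV file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for idx, batch in enumerate(batch_rows(rows, batch_size)):
        path = out_dir / f"{stem}.batch{idx:03d}.tsv"
        path.write_text("\n".join(batch) + "\n")
        paths.append(path)
    return paths


# Small demonstration with ten synthetic rows and a batch size of four
demo_rows = [
    f"1\tresults/02_kneaddata/FID{i}.fastq\tNA\tout{i}\tout{i}.report"
    for i in range(10)
]
with tempfile.TemporaryDirectory() as tmp:
    written = write_batches(demo_rows, 4, tmp)
    sizes = [len(p.read_text().splitlines()) for p in written]
print(sizes)
```

In the workflow, the batch index could then become a wildcard so that each batch sheet is passed to --sample-sheet in a separate job.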
Hi,
I am executing centrifuge in batch mode using snakemake on a slurm cluster, and am getting the following error message:
Can't exec "/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class": Argument list too long at /home/perrybe/.conda/envs/centrifuge/bin/centrifuge line 695.
(ERR): Could not open Centrifuge pipe: '/home/perrybe/.conda/envs/centrifuge/bin/centrifuge-class --wrapper basic-0 -x /nesi/nobackup/agresearch03843/centrifuge/centrifuge/GTDB -t --threads 32 --separator --passthrough -U results/02_kneaddata/FID274912.fastq, [...]
I have 9632 samples I am attempting to run in the batch, with paths in the format results/02_kneaddata/FID274912.fastq.
I am using a dedicated conda environment to execute in:
The unix system has the following configuration,
I've seen xargs mentioned in other issues with an "Argument list too long" error; however, to my reading none of them addressed this issue.
Thank you for your time, Ben