Closed ewissel closed 6 months ago
Hi @ewissel, did you mean that you ran the exact same command, but with different samples, and that's what is causing the error?
Can you go into the directory /ceph/.../all_raw_fqs/work/85/3a3d48e4f5f3e45d5fe4deb0ae1aee and check whether the databases are linked correctly? Nextflow soft-links the input files into the work directory, and if you check those soft-links you should be able to see whether their targets exist. Since you're using the conda profile, you should also be able to activate the conda environment (in $HOME/nf_conda; this should be the QIIME 2 environment) and try running the command manually by typing bash .command.run.
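A minimal sketch of that check, using the work directory from your log (adjust the path to your system):
cd /ceph/.../all_raw_fqs/work/85/3a3d48e4f5f3e45d5fe4deb0ae1aee
ls -lL *.fa.gz *.qza *.fasta            # -L follows the soft-links; a broken link reports "No such file or directory"
conda activate $HOME/nf_conda/*qiime*   # the QIIME 2 environment created by the conda profile
bash .command.run                       # rerun the failed task exactly as Nextflow launched it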
Hey @proteinosome, yes I ran the same command but with different samples and additional flags (database paths, CPU usage, etc.). I just reran the test data included in the tutorial with the same commands/flags (command below) and now get the same error that I got with the real data.
Test data command:
nextflow run main.nf --input test_data/test_sample.tsv --metadata test_data/test_metadata.tsv \
    -profile conda --outdir results --resume \
    --dada2_cpu 12 --vsearch_cpu 12 --cutadapt_cpu 12 \
    --vsearch_db /ceph/.../emily/databases/silva-138-99-seqs.qza \
    --vsearch_tax /ceph/.../emily/databases/silva-138-99-tax.qza \
    --gtdb_db /ceph/.../emily/databases/GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz \
    --refseq_db /ceph/.../emily/databases/RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz \
    --silva_db /ceph/.../emily/databases/silva_nr99_v138.1_wSpecies_train_set.fa.gz
Error message (same as the above):
Error executing process > 'pb16S:dada2_assignTax'
Caused by: Process pb16S:dada2_assignTax terminated with an error exit status (1)
Command executed:
Rscript --vanilla dada2_assign_tax.R dada2_ASV.fasta 12 silva_nr99_v138.1_wSpecies_train_set.fa.gz GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz 80
qiime feature-table transpose --i-table dada2-ccs_table_filtered.qza --o-transposed-feature-table transposed-asv.qza
qiime tools import --type "FeatureData[Taxonomy]" --input-format "TSVTaxonomyFormat" --input-path best_taxonomy.tsv --output-path best_tax.qza
qiime metadata tabulate --m-input-file dada2-ccs_rep_filtered.qza --m-input-file best_tax.qza --m-input-file transposed-asv.qza --o-visualization merged_freq_tax.qzv
qiime tools export --input-path merged_freq_tax.qzv --output-path merged_freq_tax_tsv
mv merged_freq_tax_tsv/metadata.tsv best_tax_merged_freq_tax.tsv
Command exit status: 1
Command output: (empty)
Command error:
Loading required package: Rcpp
Warning messages:
1: package ‘dada2’ was built under R version 4.2.3
2: package ‘Rcpp’ was built under R version 4.2.3
Error: Input/Output
no input files found
dirPath: silva_nr99_v138.1_wSpecies_train_set.fa.gz
pattern: character(0)
Execution halted
Work dir: /ceph/.../emily/raw_dat/all_raw_fqs/work/36/9b6651f230ff5cdc99c06a3cb8f04e
So it seems the issue is with the database flags when I point to a database folder in a different directory, one that was downloaded during a previous run of this Nextflow pipeline. The databases downloaded by a fresh Nextflow run are staged properly (as shown below), but I don't want to re-download the databases every time I run the pipeline, since the same databases are used between runs.
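For reference, a quick check I can run (paths elided as above) to confirm the database files are readable from the machine I launch the pipeline on:
for f in /ceph/.../emily/databases/silva_nr99_v138.1_wSpecies_train_set.fa.gz \
         /ceph/.../emily/databases/GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz \
         /ceph/.../emily/databases/RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz; do
    [ -r "$f" ] && echo "OK: $f" || echo "MISSING or unreadable: $f"   # -r tests read permission for the current user
done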
ls /ceph/.../all_raw_fqs/work/85/3a3d48e4f5f3e45d5fe4deb0ae1aee
GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz
RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz
dada2-ccs_rep_filtered.qza
dada2-ccs_table_filtered.qza
dada2_ASV.fasta
dada2_assign_tax.R
silva_nr99_v138.1_wSpecies_train_set.fa.gz
I forgot the last part of your response - when I go to $HOME/nf_conda and run bash .command.run, I get the error message
bash: .command.run: No such file or directory
so I think I'm missing something there.
@ewissel You run the bash .command.run command in the work folder, not the nf_conda folder.
cd /ceph/.../all_raw_fqs/work/85/3a3d48e4f5f3e45d5fe4deb0ae1aee
conda activate $HOME/nf_conda/*qiime*
bash .command.run
To verify if the soft-links exist, ls -l is not sufficient. You can try this command:
for f in $(readlink -f /ceph/.../all_raw_fqs/work/85/3a3d48e4f5f3e45d5fe4deb0ae1aee/*); do ls -l ${f}; done > /dev/null
And see if there's any error output.
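If GNU find is available, an equivalent check that lists only the dangling links would be something like:
find /ceph/.../all_raw_fqs/work/85/3a3d48e4f5f3e45d5fe4deb0ae1aee -maxdepth 1 -xtype l   # prints only broken soft-links; no output means every link resolves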
Ah thank you for the correction.
cd /ceph/.../all_raw_fqs/work/85/3a3d48e4f5f3e45d5fe4deb0ae1aee
conda activate $HOME/nf_conda/*qiime*
bash .command.run
outputs the following:
Loading required package: Rcpp
Warning messages:
1: package ‘dada2’ was built under R version 4.2.3
2: package ‘Rcpp’ was built under R version 4.2.3
Error: Input/Output
no input files found
dirPath: silva_nr99_v138.1_wSpecies_train_set.fa.gz
pattern: character(0)
Execution halted
while
for f in $(readlink -f /ceph/.../all_raw_fqs/work/85/3a3d48e4f5f3e45d5fe4deb0ae1aee/*); do ls -l ${f}; done > /dev/null
outputs nothing.
ls: cannot access /ceph/.../emily/databases/GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz: No such file or directory
ls: cannot access /ceph/.../emily/databases/RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz: No such file or directory
ls: cannot access /ceph/.../emily/databases/silva_nr99_v138.1_wSpecies_train_set.fa.gz: No such file or directory
so the soft link from nextflow does point to the right directory (/ceph/.../emily/databases/)
edit: also, I'm not posting my full paths, but the databases are in /ceph/.../emily/databases/, which I use as a generic folder for databases, while the working directory is /ceph/.../emily/project-name/raw_dat/all_raw_fqs/.
@ewissel, I am confused. You said the command output nothing, but the following messages:
ls: cannot access /ceph/.../emily/databases/GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz: No such file or directory
ls: cannot access /ceph/.../emily/databases/RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz: No such file or directory
ls: cannot access /ceph/.../emily/databases/silva_nr99_v138.1_wSpecies_train_set.fa.gz: No such file or directory
say that those database paths are wrong.
Sorry about that - I caught an issue in the path name that led to the "ls: cannot access" error, but then fixed it, reran the test and all of the above commands, and got all the same error messages from the test run, with no output from
for f in $(readlink -f /ceph/.../all_raw_fqs/work/85/3a3d48e4f5f3e45d5fe4deb0ae1aee/*); do ls -l ${f}; done > /dev/null
This is why I shouldn't try to respond to GitHub messages between meetings.
So
for f in $(readlink -f /ceph/.../all_raw_fqs/work/85/3a3d48e4f5f3e45d5fe4deb0ae1aee/*); do ls -l ${f}; done
(with the > /dev/null removed) outputs:
-rw-rw-r-- 1 ewissel IBMS-PHLab 5693 Apr 18 00:58 /ceph/.../all_raw_fqs/scripts/dada2_assign_tax.R
-rw-r--r-- 1 ewissel IBMS-PHLab 91573 Apr 24 07:50 /ceph/.../all_raw_fqs/work/14/9eac0a850ed0ce8872e91184c260c6/dada2_ASV.fasta
-rw-rw-r-- 1 ewissel IBMS-PHLab 49814 Apr 24 07:50 /ceph/.../all_raw_fqs/work/14/9eac0a850ed0ce8872e91184c260c6/dada2-ccs_rep_filtered.qza
-rw-rw-r-- 1 ewissel IBMS-PHLab 27231 Apr 24 07:50 /ceph/.../all_raw_fqs/work/14/9eac0a850ed0ce8872e91184c260c6/dada2-ccs_table_filtered.qza
-rw-rw-r-- 1 ewissel IBMS-PHLab 12722100 Apr 25 03:04 /ceph/.../emily/databases/GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz
-rw-rw-r-- 1 ewissel IBMS-PHLab 6456123 Apr 25 03:04 /ceph/.../emily/databases/RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz
-rw-rw-r-- 1 ewissel IBMS-PHLab 138314202 Apr 25 03:04 /ceph/sharedfs/.../emily/databases/silva_nr99_v138.1_wSpecies_train_set.fa.gz
Did you modify the path names manually? In the last comment the Silva database has a path of /ceph/sharedfs/.../emily/databases/silva_nr99_v138.1_wSpecies_train_set.fa.gz, whereas all the other database files are in /ceph/.../emily/databases/ (no sharedfs).
I suspect what's happening is some sort of shared-filesystem issue getting messy here. The error message is indicative of a missing path, and as you've shared, it's now also failing with the test dataset.
I would suggest creating a fresh directory, putting all your FASTQs as well as the database files in that directory, and then trying to rerun.
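Something along these lines, where fresh_run is just an example directory name and the source paths are placeholders:
mkdir -p /ceph/.../emily/fresh_run
cp /path/to/your/sample_fastqs/*.fastq.gz /ceph/.../emily/fresh_run/                      # placeholder path to your real FASTQs
cp /ceph/.../emily/databases/*.fa.gz /ceph/.../emily/databases/*.qza /ceph/.../emily/fresh_run/
cd /ceph/.../emily/fresh_run
# then point --input/--metadata and the database flags (--vsearch_db, --silva_db, etc.) at files inside this directory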
OK, I will retry with a fresh directory/install and let you know how that goes.
OK, I went ahead and started with a fresh install on /ceph/ with the following:
conda activate nf ## contains the version of nextflow used in the development of this module
git clone https://github.com/PacificBiosciences/pb-16S-nf.git
cd pb-16S-nf
nextflow run main.nf --download_db
nextflow run main.nf --help ## runs no prob
# Create sample TSV for testing
echo -e "sample-id\tabsolute-filepath\ntest_data\t$(readlink -f test_data/test_1000_reads.fastq.gz)" > test_data/test_sample.tsv
## note: this is outdated in the tutorial, expecting test_sample.tsv
srun --partition=amd_short nextflow run main.nf --input test_data/testing.tsv \
--metadata test_data/test_metadata.tsv -profile conda \
--outdir results
The tutorial test produces this output; it now fails:
(nf) [ewissel@slurm-ui03 pb-16S-nf]$ srun --partition=amd_short nextflow run main.nf --input test_data/testing.tsv --metadata test_data/test_metadata.tsv -profile conda --outdir results --resume
N E X T F L O W ~ version 22.10.6
Launching main.nf [dreamy_kilby] DSL2 - revision: 81aec31ba0
Only 1 sample. min_asv_sample and min_asv_totalfreq set to 0.
Parameters set for pb-16S-nf pipeline for PacBio HiFi 16S
Number of samples in samples TSV: 1
Filter input reads above Q: 20
Trim primers with cutadapt: Yes
Limit to N reads if exceeding N reads (0 = disabled): 0
Forward primer: AGRGTTYGATYMTGGCTCAG
Reverse primer: AAGTCGTAACAAGGTARCY
Minimum amplicon length filtered in DADA2: 1000
Maximum amplicon length filtered in DADA2: 1600
maxEE parameter for DADA2 filterAndTrim: 2
minQ parameter for DADA2 filterAndTrim: 0
Pooling method for DADA2 denoise process: pseudo
Minimum number of samples required to keep any ASV: 0
Minimum number of reads required to keep any ASV: 0
Taxonomy sequence database for VSEARCH: /ceph/sharedfs/work/IBMS-PHLab/emily/pb-16S-nf/databases/GTDB_ssu_all_r207.qza
Taxonomy annotation database for VSEARCH: /ceph/sharedfs/work/IBMS-PHLab/emily/pb-16S-nf/databases/GTDB_ssu_all_r207.taxonomy.qza
Skip Naive Bayes classification: false
SILVA database for Naive Bayes classifier: /ceph/sharedfs/work/IBMS-PHLab/emily/pb-16S-nf/databases/silva_nr99_v138.1_wSpecies_train_set.fa.gz
GTDB database for Naive Bayes classifier: /ceph/sharedfs/work/IBMS-PHLab/emily/pb-16S-nf/databases/GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz
RefSeq + RDP database for Naive Bayes classifier: /ceph/sharedfs/work/IBMS-PHLab/emily/pb-16S-nf/databases/RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz
VSEARCH maxreject: 100
VSEARCH maxaccept: 100
VSEARCH perc-identity: 0.97
QIIME 2 rarefaction curve sampling depth: null
Number of threads specified for cutadapt: 16
Number of threads specified for DADA2: 8
Number of threads specified for VSEARCH: 8
Script location for HTML report generation: /ceph/sharedfs/work/IBMS-PHLab/emily/pb-16S-nf/scripts/visualize_biom.Rmd
Container enabled via docker/singularity: false
Version of Nextflow pipeline: 0.7
executor > Local (2)
[11/143b62] process > pb16S:write_log [ 0%] 0 of 1
[00/36994e] process > pb16S:QC_fastq (1) [ 0%] 0 of 1
[- ] process > pb16S:cutadapt -
[- ] process > pb16S:QC_fastq_post_trim -
[- ] process > pb16S:collect_QC -
[- ] process > pb16S:prepare_qiime2_manifest -
[- ] process > pb16S:merge_sample_manifest -
[- ] process > pb16S:import_qiime2 -
[- ] process > pb16S:demux_summarize -
[- ] process > pb16S:dada2_denoise -
[- ] process > pb16S:mergeASV -
[- ] process > pb16S:filter_dada2 -
[- ] process > pb16S:dada2_qc -
[- ] process > pb16S:qiime2_phylogeny_dive... -
[- ] process > pb16S:dada2_rarefaction -
[- ] process > pb16S:class_tax -
[- ] process > pb16S:dada2_assignTax -
[- ] process > pb16S:export_biom -
[- ] process > pb16S:barplot_nb -
[- ] process > pb16S:barplot -
[- ] process > pb16S:html_rep -
[- ] process > pb16S:krona_plot -
Error executing process > 'pb16S:QC_fastq (1)'
Caused by: Process pb16S:QC_fastq (1) terminated with an error exit status (255)
Command executed:
seqkit fx2tab -j 8 -q --gc -l -H -n -i test_1000_reads.fastq.gz | csvtk mutate2 -C '%' -t -n sample -e '"test_data"' > test_data.seqkit.readstats.tsv
seqkit stats -T -j 8 -a test_1000_reads.fastq.gz | csvtk mutate2 -C '%' -t -n sample -e '"test_data"' > test_data.seqkit.summarystats.tsv
seqkit seq -j 8 --min-qual 20 test_1000_reads.fastq.gz --out-file test_data.filterQ20.fastq.gz
Command exit status: 255
Command output: (empty)
Command error:
[ERRO] stat test_1000_reads.fastq.gz: no such file or directory
[ERRO] xopen: no content
Work dir: /ceph/sharedfs/work/IBMS-PHLab/emily/pb-16S-nf/work/00/36994e1494b06ce886ae9ad5907959
Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
WARN: Killing running tasks (1)
executor > Local (2)
[- ] process > pb16S:write_log -
[00/36994e] process > pb16S:QC_fastq (1) [100%] 1 of 1, failed: 1 ✘
WARN: Failed to render execution report -- see the log file for details
WARN: Failed to render execution timeline -- see the log file for details
srun: error: hpa-wn03: task 0: Exited with exit code 1
(nf) [ewissel@slurm-ui03 pb-16S-nf]$ ls test_data/
test_1000_reads.fastq.gz  testing.tsv
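As a sanity check (assuming testing.tsv has the same sample-id / absolute-filepath layout as the TSV created above), I can verify whether the recorded FASTQ path resolves:
cut -f2 test_data/testing.tsv | tail -n +2 | xargs ls -l   # list each absolute-filepath entry; an error here means the path is not visible from this node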
Previously I git cloned to my user home directory, which is on a different node than /ceph/ (but still ran/submitted jobs from /ceph/). Maybe I need to talk to the server admins and see if this is actually a server permission issue? I'm not sure why else the new test would fail with the same type of issue.
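One way I can test that, using the same partition as my srun command (paths elided as above), is to check what a compute node can actually see:
srun --partition=amd_short bash -c 'hostname; ls -ld $HOME/nf_conda /ceph/.../emily/databases'
# if either location is not mounted on the compute node, its ls -ld entry will fail there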
I'm getting a similar issue with an nf-core pipeline, so I suspect it may be a server issue on my end. I'll contact our server admin team and update later.
So this was a weird issue with file permissions across nodes on my HPC.
The solution was to reinstall Nextflow and this repo on a different node (/ceph/) than /home/$USER/, and then running with the -profile conda flag worked. Thanks for the help!
Hey HiFi team,
I am trying to run this Nextflow pipeline on real data on an HPC with Slurm. I am able to run the test data successfully, but I run into the following error when I try to execute the pipeline with a few real samples.
Here is my nextflow command for tutorial data:
nextflow run main.nf --input test_data/test_sample.tsv --metadata test_data/test_metadata.tsv -profile conda --outdir results
Here is my nextflow command for real data:
I get the following error:
I interpret this error to be about
Rscript --vanilla dada2_assign_tax.R dada2_ASV.fasta 12 silva_nr99_v138.1_wSpecies_train_set.fa.gz GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz 80
since there is no path provided to the database files, unlike the paths I provided with the relevant --db flags. I confirmed that the test run was executed on the same HPC node, so it shouldn't be an issue there. The tutorial database is located in the working directory, while the full database I want to use for real data is located in a different directory, under a parent directory with the same user permissions. Thanks in advance for the help troubleshooting!
version info: