Hi @AGI-chandler, thanks for sharing your observations. It looks like Conda is messing up some of those packages. I did not change the environment YAML file in the new version, so I'm not sure what happened there. I typically much prefer docker/singularity to avoid issues like this, but I'm glad to know that you got it working, and thanks for sharing those steps.
For the denoise issue, can you check if there's anything inside ".command.err" and ".command.out"? The "/tmp/qiime2-q2cli-err-bp4mht2v.log" file might be on the compute node that ran the job if you're using cluster mode; otherwise, I've seen in the past that QIIME 2 has this weird habit of deleting the tmp file after it fails while still asking users to check it...
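For reference, a minimal way to pull those two files up (a sketch; the hash-prefixed work directory is the one Nextflow prints in the error message):

```
# Sketch: inspect the captured stdout/stderr of the failed task.
# The work directory below is the one reported in the dada2_denoise error.
cd ${name}_working/work/d1/3ac7f378a25410b1e8b34a1bf06dab
cat .command.out
cat .command.err
```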
Can you also try running on the test samples just to rule out any weird issue?
Thanks.
The second issue is a bug on my side: I missed telling those processes to use the correct environment. I will submit a pull request in a bit.
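In the meantime, a possible stopgap for that "csvtk: command not found" error (an assumption on my part, not the official fix) is to make csvtk visible to the process by installing it from bioconda into whatever environment is on PATH when the pipeline launches:

```
# Hypothetical stopgap until the patch lands: install csvtk from bioconda
# into the environment that is active when Nextflow runs the pipeline.
conda install -c bioconda csvtk
```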
Hi @proteinosome, thanks for your quick suggestions tonight! That's good to know regarding the second issue. I am used to the old way of installing apps system-wide for multi-user environments, so unfortunately with some of these newer applications like Nextflow, things sometimes end up accessible only from the root account and cannot be run from a regular user account. I also found lots of duplicate data in /root/.nextflow, /root/.conda, /root/nf_conda, ~/.nextflow, ~/nf_conda, and ~/.conda/envs, so I uninstalled and cleared those out in case they were causing any conflicts or confusion. Now the environment feels much cleaner to me, and I configured pb-16S-nf to use ~/.conda/envs.
We did try docker originally, but it was not compatible with our cluster environment in rootless mode. I believe we need rootless mode so that other users can use docker as well. I haven't used singularity before, but if it is something that can run from a user account with Slurm then I can look into it.
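For what it's worth, Singularity is designed for exactly that unprivileged, multi-user case, and the config below already defines a profile for it; a sketch of switching to it (same alias and paths used elsewhere in this thread; Singularity/Apptainer is assumed to be installed on the compute nodes):

```
# Sketch: run the pipeline with its singularity profile instead of conda.
nextflow run ${pb_16S_nf_base}/main.nf -profile singularity \
    --input ../${name}_inputs/${name}_samples.tsv \
    --metadata ../${name}_inputs/${name}_metadata.tsv \
    --outdir ../${name}_outputs
```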
Regarding the issue, there was nothing extra in .command.out or .command.err, as those were already fully printed under "Command output:" and "Command error:":
```
$ cat .command.out
${name}_working/work/d1/3ac7f378a25410b1e8b34a1bf06dab/dada2_custom_script/run_dada_ccs.R

$ cat .command.err
Plugin error from dada2:
An error was encountered while running DADA2 in R (return code -6), please inspect stdout and stderr to learn more.
Debug info has been saved to /tmp/qiime2-q2cli-err-bp4mht2v.log
```
And you were right, the temp file was hiding on one of the nodes! But the contents are not anything promising:
```
double free or corruption (out)
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_ccs.R /tmp/qiime2-archive-rujplarq/0ac7d821-4908-477f-9a64-f5b412d9cb07/data /tmp/tmprq5m3s7r/output.tsv.biom /tmp/tmprq5m3s7r/track.tsv /tmp/tmprq5m3s7r/nop /tmp/tmprq5m3s7r/filt none none 2 False 0 0 2 2 1000 1600 pseudo consensus 3.5 256 1000000

Traceback (most recent call last):
  File "~/.conda/envs/qiime2-2022.2-py38-linux-conda-0f9d5d3ab8a678e45d02e511ce6c0160/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 361, in denoise_ccs
    run_commands([cmd])
  File "~/.conda/envs/qiime2-2022.2-py38-linux-conda-0f9d5d3ab8a678e45d02e511ce6c0160/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
    subprocess.run(cmd, check=True)
  File "~/.conda/envs/qiime2-2022.2-py38-linux-conda-0f9d5d3ab8a678e45d02e511ce6c0160/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada_ccs.R', '/tmp/qiime2-archive-rujplarq/0ac7d821-4908-477f-9a64-f5b412d9cb07/data', '/tmp/tmprq5m3s7r/output.tsv.biom', '/tmp/tmprq5m3s7r/track.tsv', '/tmp/tmprq5m3s7r/nop', '/tmp/tmprq5m3s7r/filt', 'none', 'none', '2', 'False', '0', '0', '2', '2', '1000', '1600', 'pseudo', 'consensus', '3.5', '256', '1000000']' died with <Signals.SIGABRT: 6>.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/.conda/envs/qiime2-2022.2-py38-linux-conda-0f9d5d3ab8a678e45d02e511ce6c0160/lib/python3.8/site-packages/q2cli/commands.py", line 339, in __call__
    results = action(**arguments)
  File "<decorator-gen-191>", line 2, in denoise_ccs
  File "~/.conda/envs/qiime2-2022.2-py38-linux-conda-0f9d5d3ab8a678e45d02e511ce6c0160/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "~/.conda/envs/qiime2-2022.2-py38-linux-conda-0f9d5d3ab8a678e45d02e511ce6c0160/lib/python3.8/site-packages/qiime2/sdk/action.py", line 391, in _callable_executor_
    output_views = self._callable(**view_args)
  File "~/.conda/envs/qiime2-2022.2-py38-linux-conda-0f9d5d3ab8a678e45d02e511ce6c0160/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 370, in denoise_ccs
    raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code -6), please inspect stdout and stderr to learn more.
```
SIGABRT? That's not good. Yes, the system log has:
```
Feb 13 18:54:58 n012 systemd[1]: Started Process Core Dump (PID 4138769/UID 0).
Feb 13 18:55:00 n012 systemd-coredump[4138770]: Core file was truncated to 2147483648 bytes.
Feb 13 18:55:04 n012 systemd-coredump[4138770]: Process 4121883 (R) of user 10063 dumped core.

Stack trace of thread 4121883:
#0 0x00007ffff756237f n/a (n/a)

Feb 13 18:55:04 n012 systemd[1]: systemd-coredump@0-4138769-0.service: Succeeded.
```
And I have this core file, but I'm not sure what to do with it yet:
323M Feb 13 18:55 core.R.10063.cd89d332292b4d7796dc20df80e22b7a.4121883.1676339698000000.lz4
I'll be looking into that next...
I looked back at the log you shared above, and it seems like this is a run of one sample with 3 million reads? That might be far too many reads for a single sample and could be what crashes it. Can you try downsampling the FASTQ and see if that gets it through?
I could, for testing purposes, but the 3 million reads came from your machine, and all of the data needs to be processed somehow, I'm sure. How should I downsample it, and/or what extra options should I add to the nextflow command?
I figured out how to analyze the crash dump from the previous iteration, but couldn't because it was truncated at 2 GiB: `expected core file size >= 113433202688, found: 2147483648`. So I set the max size to 150 GiB and restarted the pipeline to see if the new dump can at least be analyzed.
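For the record, raising the limit can be done roughly like this (a sketch, assuming systemd-coredump is handling the dumps as the journal entries above indicate; the drop-in file name and the 150 GiB value are just what I used):

```
# Sketch: raise systemd-coredump's size caps so large R cores are not truncated.
sudo mkdir -p /etc/systemd/coredump.conf.d
sudo tee /etc/systemd/coredump.conf.d/size.conf <<'EOF'
[Coredump]
ProcessSizeMax=150G
ExternalSizeMax=150G
EOF
```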
And we're just in time for the coredump; my modifications helped... here is the stack trace, if it helps:
```
#0 0x00007ffff756237f raise (libc.so.6)
#1 0x00007ffff754cdb5 abort (libc.so.6)
#2 0x00007ffff75a54e7 __libc_message (libc.so.6)
#3 0x00007ffff75ac5ec malloc_printerr (libc.so.6)
#4 0x00007ffff75ae390 _int_free (libc.so.6)
#5 0x00007fffe03d7335 _Z12dada_uniquesSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EES_IiSaIiEES_IbSaIbEEN4Rcpp6MatrixILi14ENSC_15PreserveStorageEEESF_iiibdidddbidiibbbibbibb (dada2.so)
#6 0x00007fffe03d237f _dada2_dada_uniques (dada2.so)
#7 0x00007ffff79ef1ea R_doDotCall (libR.so)
#8 0x00007ffff79efc36 do_dotcall (libR.so)
#9 0x00007ffff7a2dda1 bcEval (libR.so)
#10 0x00007ffff7a48560 Rf_eval (libR.so)
#11 0x00007ffff7a4a2bf R_execClosure (libR.so)
#12 0x00007ffff7a4b08e Rf_applyClosure (libR.so)
#13 0x00007ffff7a37d41 bcEval (libR.so)
#14 0x00007ffff7a48560 Rf_eval (libR.so)
#15 0x00007ffff7a4a2bf R_execClosure (libR.so)
#16 0x00007ffff7a4b08e Rf_applyClosure (libR.so)
#17 0x00007ffff7a37d41 bcEval (libR.so)
#18 0x00007ffff7a48560 Rf_eval (libR.so)
#19 0x00007ffff7a4a2bf R_execClosure (libR.so)
#20 0x00007ffff7a4b08e Rf_applyClosure (libR.so)
#21 0x00007ffff7a48791 Rf_eval (libR.so)
#22 0x00007ffff7a49087 forcePromise (libR.so)
#23 0x00007ffff7a49368 getvar (libR.so)
#24 0x00007ffff7a31a02 bcEval (libR.so)
#25 0x00007ffff7a48560 Rf_eval (libR.so)
#26 0x00007ffff7a49087 forcePromise (libR.so)
#27 0x00007ffff7a49368 getvar (libR.so)
#28 0x00007ffff7a31a02 bcEval (libR.so)
#29 0x00007ffff7a48560 Rf_eval (libR.so)
#30 0x00007ffff7a4a2bf R_execClosure (libR.so)
#31 0x00007ffff7a4b08e Rf_applyClosure (libR.so)
#32 0x00007ffff7a37d41 bcEval (libR.so)
#33 0x00007ffff7a48560 Rf_eval (libR.so)
#34 0x00007ffff7a4a2bf R_execClosure (libR.so)
#35 0x00007ffff7a4b08e Rf_applyClosure (libR.so)
#36 0x00007ffff7a48791 Rf_eval (libR.so)
#37 0x00007ffff7a4cd94 do_set (libR.so)
#38 0x00007ffff7a48a99 Rf_eval (libR.so)
#39 0x00007ffff7a7d336 Rf_ReplIteration (libR.so)
#40 0x00007ffff7a7d6f9 R_ReplConsole (libR.so)
#41 0x00007ffff7a7d7a9 run_Rmainloop (libR.so)
#42 0x000055555555505d main (R)
#43 0x00007ffff754e493 __libc_start_main (libc.so.6)
#44 0x000055555555508d _start (R)
```
It says I need to install debuginfos if that isn't enough info...
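In case it helps anyone following along, a hedged sketch of digging further on an EL-style node (the PID comes from the journal entries above; note the conda-built R and dada2.so likely have no matching debuginfo packages, so only the system libraries would gain symbols):

```
# Sketch: open the dump in gdb via coredumpctl and pull debug symbols
# for the system libraries that appear in the trace.
coredumpctl gdb 4121883
sudo dnf debuginfo-install glibc
```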
Hi @AGI-chandler, I hear you. Unfortunately, it's not usual for a 16S sample to have that high a depth. Even for a species at 0.01% abundance, sequencing to a depth of 100k would already give you 10 reads on average. 3 million reads is overkill in most scenarios unless there's a very, very specific need. I think the software is simply not designed to handle that many reads per sample.
I would first maybe downsample to 100k just to see if the sample can finish the analysis. The other approach is to split the 3 million reads into "chunks", so maybe 100 sets of 30k reads each, and run them through the pipeline (set "condition" to be identical for all 100 samples). The pipeline will be able to make use of information from these "chunks" for discovery purposes, so you won't miss much by doing that. The researcher will just have to merge the results in the end. Does that make sense?
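For the downsampling option, a sketch with seqkit (assuming seqkit is available; the file names and seed are placeholders):

```
# Sketch: randomly draw ~100k reads from the large FASTQ
# (fixed seed for reproducibility; adjust names/paths as needed).
seqkit sample -n 100000 -s 100 SAMPLE.fastq.gz -o SAMPLE.100k.fastq.gz
```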
Just let me know how to do it and I'll pass the info along to the team
You can use `seqkit split2 -s 50000 -O OUTPUTDIR SAMPLE.fastq.gz`, which will split `SAMPLE.fastq.gz` into multiple parts of 50,000 reads each in the folder `OUTPUTDIR`. Then create one sample manifest with each sample pointing to a different part output by `seqkit`, e.g.:

```
sample-id    absolute-filepath
sample1      /path/to/part1.fastq.gz
sample2      /path/to/part2.fastq.gz
```

And your metadata file:

```
sample_name    condition
sample1        replicate
sample2        replicate
```

Then run the pipeline on the new sample manifest and metadata file.
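If it helps, the two files can be generated from the seqkit output with a small loop along these lines (a sketch; file names and the output directory are placeholders following the examples above):

```
# Sketch: build the sample manifest and metadata TSVs from the split parts.
printf 'sample-id\tabsolute-filepath\n' > split_samples.tsv
printf 'sample_name\tcondition\n' > split_metadata.tsv
i=1
for f in OUTPUTDIR/*.fastq.gz; do
    printf 'sample%d\t%s\n' "$i" "$(readlink -f "$f")" >> split_samples.tsv
    printf 'sample%d\treplicate\n' "$i" >> split_metadata.tsv
    i=$((i+1))
done
```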
Thanks.
Hi @proteinosome, thank you for helping us troubleshoot the issues. I work with @AGI-chandler.
The issue with the number of reads per sample may be easy to solve, once we figure out how to set up the metadata. We are analyzing data from a single Sequel II SMRT cell, with about 4.6 million HiFi reads (input, from the sequencer). There are 96 samples, all different from each other, and we don't know anything about them. This is why we treated them as a single "condition" per your documentation.
Besides downsampling (which I think is not the real answer in our case), how else can we process data from a cell that has 96 different samples, all related to the same condition/experiment, without splitting the input? Soon we will sequence another set of 192 samples, obtained with PacBio's 24 x 8 primer set. Looking forward to your opinion, Dario
Hi @dcopetti, did you barcode the samples? You need to demultiplex them first so that you have 96 FASTQs.
Hi Chua,
Yes, the amplicons were generated with asymmetric barcodes as documented here: https://www.pacb.com/wp-content/uploads/Procedure-checklist-Amplification-of-bacterial-full-length-16S-rRNA-gene-with-barcoded-primers.pdf We used set 13-24 as the reverse primers.
No, we did not demultiplex the samples before running the pipeline. For the test samples we ran (16 samples, reverse primers of columns 1 and 2), the demultiplexing seemed to happen within the pipeline; we used the whole HiFi_reads file as the input. Are we missing something?
Hi @dcopetti, thank you for the clarification (sorry for my weird response timing, I'm based in Singapore). The pipeline expects demultiplexed input: the sample sheet should contain one line per sample, each pointing to that sample's FASTQ. You will likely need to rerun your test samples, too, if you did not demultiplex them. The "demultiplexing" step in this pipeline is used to trim the forward and reverse primers, but it does not look for sample barcodes. Do you know how to demultiplex PacBio's HiFi reads?
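If not, a hedged sketch with PacBio's lima (the barcoded-primer FASTA and file names are placeholders, and flag names vary between lima versions, so please check `lima --help` for your install):

```
# Sketch: demultiplex asymmetric barcoded 16S HiFi reads with lima,
# writing one BAM per barcode pair.
lima --hifi-preset ASYMMETRIC --split-named \
    hifi_reads.bam barcoded_primers.fasta demux.bam
# Then convert each per-sample BAM to FASTQ (e.g. with bam2fastq or
# samtools fastq) before building the sample manifest.
```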
Hi Chua, we found the issue: it was between the chair and the keyboard :-) We will run the pipeline soon. Thanks for your support!
Glad to hear that! I will close this issue for now, then. Feel free to open another issue if you run into one (hopefully you don't!).
Hi there, I have had a few new issues with the latest version and am now stuck on the final one.
nextflow.config:
```
process {
    executor = 'slurm'
    queue = 'defq'
}

params.dada2_cpu = "256"
params.vsearch_cpu = "256"
params.cutadapt_cpu = "256"

// CPU limit if using local executor
process {
    name = "Local"
    cpus = 32
}

// Set cpus and memory. Process in "main.nf" will
// contain label corresponding to one of these unless
// the CPUs can be specified via specific parameters
// (e.g. --vsearch_cpu)
process {
    withLabel: cpu_def {
        cpus = 32
        memory = 128.GB
    }
    withLabel: cpu8 {
        cpus = 64
        memory = 256.GB
    }
    withLabel: cpu32 {
        cpus = 256
        memory = 1024.GB
    }
}

profiles {
    standard {
        params.enable_conda = false
        params.enable_container = false
        singularity.enabled = false
        singularity.automounts = false
        docker.enabled = false
        podman.enabled = false
        shifter.enabled = false
        charliecloud.enabled = false
    }
    conda {
        conda {
            useMamba = false
            conda.enabled = true
            // Allow longer conda creation timeout
            createTimeout = '24 h'
            cacheDir = "$HOME/.conda/envs"
        }
        params.enable_conda = true
        params.enable_container = false
        singularity.enabled = false
        singularity.automounts = false
        docker.enabled = false
        podman.enabled = false
        shifter.enabled = false
        charliecloud.enabled = false
    }
    singularity {
        singularity.enabled = true
        singularity.autoMounts = true
        singularity.cacheDir = "$HOME/.conda/envs/singularity"
        params.enable_container = true
        docker.enabled = false
        podman.enabled = false
        shifter.enabled = false
        charliecloud.enabled = false
    }
    docker {
        singularity.enabled = false
        singularity.autoMounts = false
        docker.enabled = true
        params.enable_container = true
        podman.enabled = false
        shifter.enabled = false
        charliecloud.enabled = false
    }
}

// Generate report
report {
    enabled = true
    file = "$params.outdir/report.html"
    overwrite = true
}

// Timeline
timeline {
    enabled = true
    file = "$params.outdir/timeline.html"
    overwrite = true
}

// DAG
dag {
    enabled = true
    file = "$params.outdir/dag.html"
    overwrite = true
}
```

First Issue: Error executing process > 'pb16S:cutadapt (1)'
Error output:
```
Caused by:
  Failed to create Conda environment
    command: conda env create --prefix ~/.conda/envs/qiime2-2022.2-py38-linux-conda-0f9d5d3ab8a678e45d02e511ce6c0160 --file ${pb_16S_nf_base}/env/qiime2-2022.2-py38-linux-conda.yml
    status : 120
    message:
```

Second Issue: Error executing process > 'pb16S:merge_sample_manifest' [csvtk: command not found]
Error output:
```
Caused by:
  Process `pb16S:merge_sample_manifest` terminated with an error exit status (127)

Command executed:

  csvtk concat -t samplefile.txt > merged_sample_file.txt

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: csvtk: command not found

Work dir:
  ${name}_working/work/73/981c2869a4629afcd01ea635a6ff2b

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
```

Third Issue: Error executing process > 'pb16S:dada2_denoise (1)'
Full command and parameters:
```
$ alias pb_16S_nf="nextflow run ${pb_16S_nf_base}/main.nf -profile conda"
$ mkdir ${name}_working && cd ${name}_working
$ pb_16S_nf --input ../${name}_inputs/${name}_samples.tsv --metadata ../${name}_inputs/${name}_metadata.tsv --outdir ../${name}_outputs

N E X T F L O W  ~  version 23.01.0-edge
Launching `${pb_16S_nf_base}/main.nf` [berserk_linnaeus] DSL2 - revision: 4047538644
Only 1 sample. min_asv_sample and min_asv_totalfreq set to 0.

Parameters set for pb-16S-nf pipeline for PacBio HiFi 16S
=========================================================
Number of samples in samples TSV: 1
Filter input reads above Q: 20
Trim primers with cutadapt: Yes
Forward primer: AGRGTTYGATYMTGGCTCAG
Reverse primer: AAGTCGTAACAAGGTARCY
Minimum amplicon length filtered in DADA2: 1000
Maximum amplicon length filtered in DADA2: 1600
maxEE parameter for DADA2 filterAndTrim: 2
minQ parameter for DADA2 filterAndTrim: 0
Pooling method for DADA2 denoise process: pseudo
Minimum number of samples required to keep any ASV: 0
Minimum number of reads required to keep any ASV: 0
Taxonomy sequence database for VSEARCH: ${pb_16S_nf_base}/databases/GTDB_ssu_all_r207.qza
Taxonomy annotation database for VSEARCH: ${pb_16S_nf_base}/databases/GTDB_ssu_all_r207.taxonomy.qza
Skip Naive Bayes classification: false
SILVA database for Naive Bayes classifier: ${pb_16S_nf_base}/databases/silva_nr99_v138.1_wSpecies_train_set.fa.gz
GTDB database for Naive Bayes classifier: ${pb_16S_nf_base}/databases/GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz
RefSeq + RDP database for Naive Bayes classifier: ${pb_16S_nf_base}/databases/RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz
VSEARCH maxreject: 100
VSEARCH maxaccept: 100
VSEARCH perc-identity: 0.97
QIIME 2 rarefaction curve sampling depth: null
Number of threads specified for cutadapt: 256
Number of threads specified for DADA2: 256
Number of threads specified for VSEARCH: 256
Script location for HTML report generation: ${pb_16S_nf_base}/scripts/visualize_biom.Rmd
Container enabled via docker/singularity: false
Version of Nextflow pipeline: 0.5

[ab/b27986] process > pb16S:write_log                   [100%] 1 of 1 ✔
[90/07120c] process > pb16S:QC_fastq (1)                [100%] 1 of 1 ✔
[e9/f2bd61] process > pb16S:cutadapt (1)                [100%] 1 of 1 ✔
[20/cd7cfa] process > pb16S:QC_fastq_post_trim (1)      [100%] 1 of 1 ✔
[0b/6dd519] process > pb16S:collect_QC                  [100%] 1 of 1 ✔
[ee/440c69] process > pb16S:prepare_qiime2_manifest (1) [100%] 1 of 1 ✔
[0d/e4a74a] process > pb16S:merge_sample_manifest       [100%] 1 of 1 ✔
[f7/776596] process > pb16S:import_qiime2 (1)           [100%] 1 of 1 ✔
[d9/c83ad6] process > pb16S:demux_summarize (1)         [100%] 1 of 1 ✔
[d1/3ac7f3] process > pb16S:dada2_denoise (1)           [100%] 1 of 1, failed: 1 ✘
```

Error output:
```
Error executing process > 'pb16S:dada2_denoise (1)'

Caused by:
  Process `pb16S:dada2_denoise (1)` terminated with an error exit status (1)

Command executed:

  # Use custom script that can skip primer trimming
  mkdir -p dada2_custom_script
  cp run_dada_ccs.R dada2_custom_script/run_dada_ccs_original.R
  sed 's/minQ\ =\ 0/minQ=0/g' dada2_custom_script/run_dada_ccs_original.R > dada2_custom_script/run_dada_ccs.R
  chmod +x dada2_custom_script/run_dada_ccs.R
  export PATH="./dada2_custom_script:$PATH"
  which run_dada_ccs.R
  qiime dada2 denoise-ccs --i-demultiplexed-seqs samples.qza --o-table dada2-ccs_table.qza --o-representative-sequences dada2-ccs_rep.qza --o-denoising-stats dada2-ccs_stats.qza --p-min-len 1000 --p-max-len 1600 --p-max-ee 2 --p-front 'none' --p-adapter 'none' --p-n-threads 256 --p-pooling-method 'pseudo'

Command exit status:
  1

Command output:
  dada2_custom_script/run_dada_ccs.R

Command error:
  Plugin error from dada2:
  An error was encountered while running DADA2 in R (return code -6), please inspect stdout and stderr to learn more.
  Debug info has been saved to /tmp/qiime2-q2cli-err-bp4mht2v.log

Work dir:
  ${name}_working/work/d1/3ac7f378a25410b1e8b34a1bf06dab

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
```