MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

AssignTaxonomy error exit status 137 #83

Closed tjlim10 closed 6 months ago

tjlim10 commented 6 months ago

Hello,

Happy new year and I hope you are well!

I was trying to run MetONTIIME on my university's HPC with Singularity, on 44 samples (using the SILVA 138 database for 16S rRNA). Everything ran well until the assignTaxonomy step, where I encountered error exit status 137, which I assume is memory-related (137 corresponds to a SIGKILL, typically issued by the out-of-memory killer). I used the Vsearch classifier. Please see the following error:

Error executing process > 'assignTaxonomy (1)'

Caused by: Process assignTaxonomy (1) terminated with an error exit status (137)

Command executed:

mkdir -p /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/assignTaxonomy

classifier_uc=$(awk '{print toupper($0)}' <<< Vsearch)

if [ "$classifier_uc" == "BLAST" ]; then
    qiime feature-classifier makeblastdb \
        --i-sequences /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/importDb/db_sequences.qza \
        --o-database /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/importDb/blastIndexedDb.qza

    qiime feature-classifier classify-consensus-blast \
        --i-query /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/derepSeq/rep-seqs.qza \
        --i-blastdb /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/importDb/blastIndexedDb.qza \
        --i-reference-taxonomy /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/importDb/db_taxonomy.qza \
        --p-num-threads 36 \
        --p-perc-identity 0.9 \
        --p-query-cov 0.8 \
        --p-maxaccepts 3 \
        --p-min-consensus 0.7 \
        --o-classification /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/assignTaxonomy/taxonomy.qza \
        --o-search-results /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/assignTaxonomy/search_results.qza
elif [ "$classifier_uc" == "VSEARCH" ]; then
    qiime feature-classifier classify-consensus-vsearch \
        --i-query /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/derepSeq/rep-seqs.qza \
        --i-reference-reads /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/importDb/db_sequences.qza \
        --i-reference-taxonomy /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/importDb/db_taxonomy.qza \
        --p-perc-identity 0.9 \
        --p-query-cov 0.8 \
        --p-maxaccepts 100 \
        --p-maxrejects 100 \
        --p-maxhits 3 \
        --p-min-consensus 0.7 \
        --p-strand 'both' \
        --p-unassignable-label 'Unassigned' \
        --p-threads 36 \
        --o-classification /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/assignTaxonomy/taxonomy.qza \
        --o-search-results /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/assignTaxonomy/search_results.qza \
        --verbose
else
    echo "Classifier Vsearch is not supported (choose between Blast and Vsearch)"
fi

qiime metadata tabulate \
    --m-input-file /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/assignTaxonomy/taxonomy.qza \
    --o-visualization /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/assignTaxonomy/taxonomy.qzv

qiime taxa filter-table \
    --i-table /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/derepSeq/table.qza \
    --i-taxonomy /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/assignTaxonomy/taxonomy.qza \
    --p-exclude Unassigned \
    --o-filtered-table /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/derepSeq/table-no-Unassigned.qza

Command exit status: 137

Command output: (empty)

Command error: .command.sh: line 15: 18912 Killed qiime feature-classifier classify-consensus-vsearch --i-query /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/derepSeq/rep-seqs.qza --i-reference-reads /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/importDb/db_sequences.qza --i-reference-taxonomy /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/importDb/db_taxonomy.qza --p-perc-identity 0.9 --p-query-cov 0.8 --p-maxaccepts 100 --p-maxrejects 100 --p-maxhits 3 --p-min-consensus 0.7 --p-strand 'both' --p-unassignable-label 'Unassigned' --p-threads 36 --o-classification /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/assignTaxonomy/taxonomy.qza --o-search-results /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/Results/run1_test1/assignTaxonomy/search_results.qza

Work dir: /fs03/hj18/Tim_duplicate/Silvan/MetONTIIME_runs/SILVA_138/work/36/8580cb50e2c2c2f097ff666640befb

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

I have tried the following options:

  1. Increasing the memory allocation for the assignTaxonomy process, up to 416 GB (see the config sketch below)
  2. Decreasing the number of CPU threads to 6 (initially I was using 36 cores)
  3. Splitting the 44 samples into two separate runs (i.e. 22 samples at a time)
  4. Using the SILVA 138 database pre-clustered at 99% identity
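
For reference, a minimal sketch of how the first two settings are typically expressed in a Nextflow configuration (a hypothetical snippet; the exact scope and process selector used in metontiime2.conf may differ):

process {
    withName: assignTaxonomy {
        memory = '416 GB'   // raised incrementally up to this value
        cpus = 6            // reduced from the initial 36
    }
}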

However, none of these attempts solved the issue.

May I know if you are able to help with this? Please let me know if you need the Nextflow configuration file or any other information from me.

Thanks in advance!

Kind regards, Timothy

MaestSi commented 6 months ago

Dear Timothy, your analysis and the attempts you made to solve the issue seem right to me. You may try splitting the 44 samples further, for example into 4 groups of 11 samples. As an alternative, you could try either reducing the number of reads (setting downsampleFastq = true and maxNumReads = 10000, for example) or clustering at a lower identity (e.g. clusteringIdentity = 0.9). I also suggest setting up Nextflow Tower: you just need to log in to the website with your GitHub credentials and create a token. Next, edit lines 64-68 of the metontiime2.conf script to set enabled = true and add your access token.

tower {
    enabled = true
    endpoint = '-'
    accessToken = 'insert your token here'
}
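
The downsampling and clustering options mentioned above are plain parameters in the same metontiime2.conf file; a minimal sketch, assuming they sit in a standard Nextflow params scope (parameter names as quoted above; the exact lines may differ):

params {
    downsampleFastq = true     // subsample each fastq before analysis
    maxNumReads = 10000        // maximum number of reads retained per sample
    clusteringIdentity = 0.9   // identity threshold for de novo read clustering
}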

Once Tower is enabled, you will just need to log in to the Nextflow Tower website, click on "Runs" and select your username. By running the whole pipeline on a subset of the reads/samples, you will be able to track the amount of RAM used by the assignTaxonomy process and extrapolate the required RAM as a function of the number of reads. Best, SM

tjlim10 commented 6 months ago

Hi SM,

Thanks so much for your reply. I tried re-running with 11 samples; unfortunately, it still ran out of memory and I received the same exit status 137.

At this stage, I am trying two parallel runs: 1) one with just 2 samples; 2) one with clusteringIdentity = 0.9. I will keep you updated.

Thanks!

Kind regards, Timothy

tjlim10 commented 6 months ago

Hi SM,

Just an update on my "running with just 2 samples" attempt: I have managed to complete the run successfully (for the first time, yay! And it proves that there is nothing wrong with the configuration). However, the memory usage for 2 samples is far too demanding (these two samples contain 50,000 reads each).

[screenshot: Nextflow Tower memory usage for the 2-sample run]

Since I have roughly 44 samples from Nanopore sequencing, may I know whether it is more sustainable to reduce the number of reads (if so, is 10,000 the optimal trade-off between retained data and required resources?) or to cluster at a lower identity, rather than running in smaller batches and combining them later on? Running in smaller batches sounds super time-consuming to me. Happy to hear your thoughts. Thanks!

Update: the 44-sample run with clusteringIdentity = 0.9 completed successfully with much lower resource usage, and the assignTaxonomy step consumed only 157 GB (much less than expected)! See the screenshot below:

[screenshot: Nextflow Tower memory usage for the clusteringIdentity = 0.9 run]

So, just wondering, from your point of view: is it better to reduce the number of reads or to cluster at a lower identity, if I am aiming to retain as much of the data as possible? Thanks!

Kind regards, Timothy

MaestSi commented 6 months ago

Dear Timothy, I would just try to analyse a couple of samples (say 2) with:

  1. the full set of reads and the default clustering identity;
  2. downsampling enabled (downsampleFastq = true, maxNumReads = 10000);
  3. a lower clustering identity (clusteringIdentity = 0.9).

I would then take a decision based on which of the two latter options looks more similar to the full-dataset analysis. For this purpose, you may use the genus- or species-level counts and evaluate their pairwise correlation (see the sketch below). If you do not have time for this additional analysis, I would personally go for the downsampling strategy. Best, SM
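
As a rough sketch of the comparison suggested above (hypothetical paths; level 6 corresponds to genus in the 7-rank SILVA taxonomy), each run's feature table could be collapsed to genus level and exported to TSV for a pairwise correlation test (e.g. Spearman):

# repeat for the full, downsampled and lower-identity runs, adjusting the paths
qiime taxa collapse \
    --i-table full_run/derepSeq/table.qza \
    --i-taxonomy full_run/assignTaxonomy/taxonomy.qza \
    --p-level 6 \
    --o-collapsed-table full_run_genus_table.qza

# export the collapsed counts to a plain TSV
qiime tools export --input-path full_run_genus_table.qza --output-path full_run_genus
biom convert -i full_run_genus/feature-table.biom -o full_run_genus.tsv --to-tsv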

tjlim10 commented 6 months ago

Hi SM,

Thanks so much for the suggestion. I will have a look and get back to you with my findings ;)

Kind regards, Timothy

tjlim10 commented 6 months ago

Hi SM,

I had a look at the comparison between downSampling and clusteringIdentity. Just looking at the taxonomy bar plots, the % assigned with downSampling is close to the original dataset; with clusteringIdentity, the feature table looked very different, with the unassigned proportion higher than the assigned one (the opposite of the original dataset). Hence, I agree with you that the downSampling strategy is better than clusteringIdentity!
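
For a quick numeric check of the assigned/unassigned split, beyond eyeballing the bar plot, the taxonomy artifact can be exported and its labels counted; a minimal sketch, assuming the default export layout (note that this counts features, not reads):

qiime tools export --input-path assignTaxonomy/taxonomy.qza --output-path taxonomy_export
# taxonomy.tsv columns: Feature ID <tab> Taxon <tab> Consensus
awk -F'\t' 'NR > 1 { if ($2 == "Unassigned") u++; else a++ }
    END { print "assigned:", a, "unassigned:", u }' taxonomy_export/taxonomy.tsv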

Thanks a lot for the help!

Kind regards, Timothy