DeniRibicic / q2ONT

Bash pipeline for analysis of ONT full-length 16S sequences in QIIME2

Memory issues and data volume #2

Closed nbargues closed 4 years ago

nbargues commented 4 years ago

Hi, I ran your pipeline on a small subset of my ONT data and it seems to work well. I then tried to run it on the data from my full run, and when it gets to the taxonomy assignment stage, at this command:

qiime vsearch cluster-features-open-reference \
  --i-table 6.1_uchime-ref-out/table-nonchimeric-wo-borderline.qza \
  --i-sequences 6.1_uchime-ref-out/rep-seqs-nonchimeric-wo-borderline.qza \
  --i-reference-sequences $reference_seqs \
  --p-perc-identity 0.85 \
  --o-clustered-table 6.2_table-op_ref-85.qza \
  --o-clustered-sequences 6.2_rep-seqs-op_ref-85.qza \
  --o-new-reference-sequences 6.2_new-ref-seqs-op_ref-85.qza \
  --p-threads $threads

I get a memory error (32 cores and 126 GB of RAM).

I then subsampled, keeping only 40% of the reads in each sample, and re-ran the pipeline. I still hit the same issue.

Do you have any idea how to optimize this step? What is the typical volume of your data?

PS: For reference, after subsampling and trimming, the 12 samples from my run come to 12 x 300 MB = 3.6 GB.

PS2: Full traceback from the error log:

Running external command line application. This may print messages to stdout and/or stderr.

The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /tmp/tmp9bmzjk7j --id 0.85 --db /tmp/qiime2-archive-ew75tn5v/9f4b9a95-e8c4-45be-a980-6faaa9b857c7/data/dna-sequences.fasta --uc /tmp/tmpqsl93h8o --strand plus --qmask none --notmatched /tmp/tmp72o2x5lw --threads 24

vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 32 cores https://github.com/torognes/vsearch

Reading file /tmp/qiime2-archive-ew75tn5v/9f4b9a95-e8c4-45be-a980-6faaa9b857c7/data/dna-sequences.fasta 100%
521145303 nt in 369953 seqs, min 900, max 2961, avg 1409
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching query sequences: 844922 of 2092876 (40.37%)

Running external command line application. This may print messages to stdout and/or stderr.

The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --sortbysize /tmp/tmp72o2x5lw --xsize --output /tmp/q2-DNAFASTAFormat-anta19n4

vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 32 cores https://github.com/torognes/vsearch

Reading file /tmp/tmp72o2x5lw 100%
1747135600 nt in 1247954 seqs, min 1400, max 1400, avg 1400
Getting sizes 100%
Sorting 100%
Median abundance: 1
Writing output 100%

Running external command line application. This may print messages to stdout and/or stderr.

The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --cluster_size /tmp/tmpa7j30vmp --id 0.85 --centroids /tmp/q2-DNAFASTAFormat-xiom9wn4 --uc /tmp/tmpwz5to5f2 --qmask none --xsize --threads 24

vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 32 cores https://github.com/torognes/vsearch

Reading file /tmp/tmpa7j30vmp 100%
1747135600 nt in 1247954 seqs, min 1400, max 1400, avg 1400
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 512147 Size min 1, max 55355, avg 2.4
Singletons: 482081, 38.6% of seqs, 94.1% of clusters
Traceback (most recent call last):
  File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/commands.py", line 327, in __call__
    results = action(**arguments)
  File "</home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/decorator.py:decorator-gen-126>", line 2, in cluster_features_open_reference
  File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
    output_types, provenance)
  File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 502, in _callable_executor_
    prov = provenance.fork(name, output)
  File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/archive/provenance.py", line 438, in fork
    forked.add_ancestor(alias)
  File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/archive/provenance.py", line 167, in add_ancestor
    shutil.copytree(str(grandcestor), str(destination))
  File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/shutil.py", line 359, in copytree
    raise Error(errors)
shutil.Error: [('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/metadata.tsv', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/metadata.tsv', '[Errno 28] No space left on device'), ('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/action.yaml', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/action.yaml', '[Errno 28] No space left on device'), ('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/metadata.yaml', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/metadata.yaml', '[Errno 28] No space left on device'), ('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/citations.bib', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/citations.bib', '[Errno 28] No space left on device')]

DeniRibicic commented 4 years ago

@nbargues I have only just seen this. I guess you are past the memory issue, since you didn't mention it when we chatted by PM today? Feel free to describe what you did to solve it.

nbargues commented 4 years ago

Clustering OTUs or assigning taxonomy with QIIME 2 needs a lot of CPU and memory. If this issue occurs, you also need to point your temporary directory at a location with more space (>100 GB):

export TMPDIR=/your/new/tmp/dir
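
A minimal sketch of how that redirect could be checked before launching the pipeline. The scratch path and the `SCRATCH_DIR` and `MIN_FREE_GB` variable names are hypothetical placeholders, not part of q2ONT; the 100 GB threshold simply mirrors the figure above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch location and threshold; adjust both for your system.
SCRATCH_DIR="${SCRATCH_DIR:-${HOME:-/tmp}/qiime2_tmp}"
MIN_FREE_GB="${MIN_FREE_GB:-100}"

mkdir -p "$SCRATCH_DIR"

# Free space in GB on the filesystem that holds SCRATCH_DIR.
# df -P prints one POSIX-format line per filesystem; column 4 is available 1K blocks.
free_kb=$(df -Pk "$SCRATCH_DIR" | awk 'NR==2 {print $4}')
free_gb=$(( free_kb / 1024 / 1024 ))

if [ "$free_gb" -lt "$MIN_FREE_GB" ]; then
    echo "Warning: only ${free_gb} GB free in ${SCRATCH_DIR}, wanted ${MIN_FREE_GB} GB" >&2
fi

export TMPDIR="$SCRATCH_DIR"

# Optional check: ask Python which temp directory it will actually use.
if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import tempfile; print("Python temp dir:", tempfile.gettempdir())'
fi
```

The export has to happen in the same shell session or job script that runs the `qiime` command, otherwise the variable does not reach the process. As far as I know, Python's `tempfile` module falls back to another location such as `/tmp` when the directory named in `TMPDIR` does not exist or is not writable, so a traceback that still shows `/tmp/qiime2-...` paths suggests the redirect did not take effect.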

Cheers

ahfitzpa commented 2 years ago

Hi, I am getting very similar error messages to those above. I have redirected outputs to a temp location that the HPC admin is certain contains sufficient space for the output files, but this has not changed the error messages I receive. I assigned 50 CPUs to the vsearch section of the script alone. My uchime rep-seqs and table inputs are 2 GB and 1.2 GB respectively.

Please type the following command to load the qiime2:  "source activate qiime2-2021.2"
 When finished please type the following command to unload the qiime2 environment:  "conda deactivate"
vsearch v2.7.0_linux_x86_64, 376.3GB RAM, 88 cores
https://github.com/torognes/vsearch

Reading file /tmp/qiime2-archive-5su6rx31/3f043775-44c2-4702-9df5-ec5bdc60860f/data/dna-sequences.fasta 100%
34084 nt in 66 seqs, min 193, max 560, avg 516
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching query sequences: 1667873 of 8803583 (18.95%)
vsearch v2.7.0_linux_x86_64, 376.3GB RAM, 88 cores
https://github.com/torognes/vsearch

Reading file /tmp/tmp9q7ha2jn 100%
2712046253 nt in 7135710 seqs, min 100, max 1000, avg 380
Getting sizes 100%
Sorting 100%
Median abundance: 1
Writing output 100%
vsearch v2.7.0_linux_x86_64, 376.3GB RAM, 88 cores
https://github.com/torognes/vsearch

Reading file /tmp/tmpetup09yi 100%
2712046253 nt in 7135710 seqs, min 100, max 1000, avg 380
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 1786545 Size min 1, max 78892, avg 4.0
Singletons: 1624375, 22.8% of seqs, 90.9% of clusters
Traceback (most recent call last):
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/distutils/file_util.py", line 57, in _copy_file_contents
    fdst.write(buf)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-194>", line 2, in cluster_features_open_reference
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 484, in _callable_executor_
    outputs = self._callable(scope.ctx, **view_args)
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 358, in cluster_features_open_reference
    merged_rep_seqs, = merge_seqs(data=[rep_seqs, de_novo_seqs])
  File "<decorator-gen-602>", line 2, in merge_seqs
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 414, in _callable_executor_
    prov = provenance.fork(name)
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/core/archive/provenance.py", line 430, in fork
    forked = super().fork()
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/core/archive/provenance.py", line 331, in fork
    distutils.dir_util.copy_tree(str(self.path), str(forked.path))
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/distutils/dir_util.py", line 159, in copy_tree
    verbose=verbose, dry_run=dry_run))
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/distutils/dir_util.py", line 159, in copy_tree
    verbose=verbose, dry_run=dry_run))
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/distutils/dir_util.py", line 159, in copy_tree
    verbose=verbose, dry_run=dry_run))
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/distutils/dir_util.py", line 163, in copy_tree
    dry_run=dry_run)
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/distutils/file_util.py", line 151, in copy_file
    _copy_file_contents(src, dst)
  File "/install/software/restart/py3/qiime2/envs/qiime2-2021.2/lib/python3.6/distutils/file_util.py", line 60, in _copy_file_contents
    "could not write to '%s': %s" % (dst, e.strerror))
distutils.errors.DistutilsFileError: could not write to '/tmp/qiime2-provenance-doq9s_o4/artifacts/d7d61ddd-a713-457e-9dd6-d0df74977f69/action/metadata.tsv': No space left on device

Plugin error from vsearch:

  could not write to '/tmp/qiime2-provenance-doq9s_o4/artifacts/d7d61ddd-a713-457e-9dd6-d0df74977f69/action/metadata.tsv': No space left on device