bokulich-lab / q2-assembly

QIIME 2 plugin for (meta)genome assembly.
BSD 3-Clause "New" or "Revised" License

BUG: `index-contigs` fails if any input files are empty #49

Closed gregcaporaso closed 6 months ago

gregcaporaso commented 1 year ago

If assembly produces no contigs for one or more samples, the resulting SampleData[Contigs] contains .fa files of size zero. Passing that artifact to index-contigs fails with a fairly uninformative error message:

  An error was encountered while running Bowtie2, (return code 1), please inspect stdout and stderr to learn more.

The --verbose output was more useful, but still only "warned" about an empty fasta file:

Input files DNA, FASTA:
  /scratch/gcaporaso/temp/qiime2/gcaporaso/data/b1c35261-68ee-4d73-864e-80ca50a04069/data/NEC-EF_contigs.fa
Warning: Empty fasta file: '/scratch/gcaporaso/temp/qiime2/gcaporaso/data/b1c35261-68ee-4d73-864e-80ca50a04069/data/NEC-EF_contigs.fa'
Warning: All fasta inputs were empty
Total time for call to driver() for forward index: 00:00:00
Error: Encountered internal Bowtie 2 exception (#1)
Command: /home/gcaporaso/mambaforge/envs/q2dev-20235-shotgun/bin/bowtie2-build-s --wrapper basic-0 --bmaxdivn 4 --dcv 1024 --offrate 5 --ftabchars 10 --threads 40 /scratch/gcaporaso/temp/qiime2/gcaporaso/data/b1c35261-68ee-4d73-864e-80ca50a04069/data/NEC-EF_contigs.fa /scratch/gcaporaso/temp/q2-Bowtie2IndexDirFmt-35dkmvkk/NEC-EF/index
Traceback (most recent call last):
  File "/home/gcaporaso/4-git-repos/qiime2/q2-assembly/q2_assembly/bowtie2/indexing.py", line 50, in _index_seqs
    run_command(cmd, verbose=True)
  File "/home/gcaporaso/4-git-repos/qiime2/q2-assembly/q2_assembly/_utils.py", line 28, in run_command
    subprocess.run(cmd, check=True)
  File "/home/gcaporaso/mambaforge/envs/q2dev-20235-shotgun/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bowtie2-build', '--bmaxdivn', '4', '--dcv', '1024', '--offrate', '5', '--ftabchars', '10', '--threads', '40', '/scratch/gcaporaso/temp/qiime2/gcaporaso/data/b1c35261-68ee-4d73-864e-80ca50a04069/data/NEC-EF_contigs.fa', '/scratch/gcaporaso/temp/q2-Bowtie2IndexDirFmt-35dkmvkk/NEC-EF/index']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gcaporaso/mambaforge/envs/q2dev-20235-shotgun/lib/python3.8/site-packages/q2cli/commands.py", line 468, in __call__
    results = action(**arguments)
  File "<decorator-gen-736>", line 2, in index_contigs
  File "/home/gcaporaso/mambaforge/envs/q2dev-20235-shotgun/lib/python3.8/site-packages/qiime2/sdk/action.py", line 274, in bound_callable
    outputs = self._callable_executor_(
  File "/home/gcaporaso/mambaforge/envs/q2dev-20235-shotgun/lib/python3.8/site-packages/qiime2/sdk/action.py", line 509, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/gcaporaso/4-git-repos/qiime2/q2-assembly/q2_assembly/bowtie2/indexing.py", line 85, in index_contigs
    _index_seqs(contig_fps, str(result), common_args, "contigs")
  File "/home/gcaporaso/4-git-repos/qiime2/q2-assembly/q2_assembly/bowtie2/indexing.py", line 52, in _index_seqs
    raise Exception(
Exception: An error was encountered while running Bowtie2, (return code 1), please inspect stdout and stderr to learn more.

Plugin error from assembly:

  An error was encountered while running Bowtie2, (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.
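
The traceback above shows the underlying warning ("All fasta inputs were empty") only reaching the terminal, while the raised exception carries just a return code. As a rough sketch of one way a wrapper could fold the tool's stderr into the exception message: `run_and_report` is a hypothetical helper written for this illustration, not the plugin's `run_command`, and the failing command is a stand-in Python one-liner rather than a real bowtie2-build call.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd; on failure, raise an error that includes the tool's stderr."""
    # capture_output=True keeps stderr available to us instead of letting it
    # scroll past on the terminal.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Keep only the last few stderr lines so the message stays readable.
        tail = "\n".join(proc.stderr.strip().splitlines()[-5:])
        raise RuntimeError(
            f"Command {cmd[0]!r} failed with return code {proc.returncode}. "
            f"Last lines of stderr:\n{tail}"
        )
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for a failing external tool (hypothetical; not bowtie2-build).
    child_code = (
        "import sys; "
        "sys.stderr.write('Warning: All fasta inputs were empty\\n'); "
        "sys.exit(1)"
    )
    try:
        run_and_report([sys.executable, "-c", child_code])
    except RuntimeError as err:
        print(err)
```

The trade-off is that captured output is no longer streamed live, so a real wrapper might only capture when not running in verbose mode.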

I came across this because I had a couple of control samples which had very few (<10) demultiplexed sequences in my input to assemble-megahit, and these unsurprisingly didn't form any contigs. When I ran index-contigs I got the error.

I'm not sure what the best path forward is here. At the very least we probably want a more informative error message, but we might also want a way to filter the SampleData[Contigs] so the user doesn't have to generate contigs again (which can take a while). I got around it this time by using qiime demux filter-samples to drop the two problem samples from my input to assemble-megahit.
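
For the "more informative error message" part, a minimal sketch of a pre-flight check that could run before indexing and name the offending samples. Everything here is an assumption for illustration: `has_fasta_records`, `check_contigs_not_empty`, and the sample-ID-to-path dict are hypothetical and not part of q2-assembly.

```python
import os
import tempfile


def has_fasta_records(path):
    """Return True if the file contains at least one FASTA header line."""
    with open(path) as handle:
        return any(line.startswith(">") for line in handle)


def check_contigs_not_empty(contig_fps):
    """Raise ValueError naming every sample whose contig file has no records.

    contig_fps maps sample ID -> path to that sample's FASTA file
    (a hypothetical structure chosen for this sketch).
    """
    empty = sorted(
        sample for sample, fp in contig_fps.items()
        if not has_fasta_records(fp)
    )
    if empty:
        raise ValueError(
            "No contigs found for sample(s): " + ", ".join(empty) + ". "
            "Remove these samples before indexing."
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "sampleA_contigs.fa")
        bad = os.path.join(tmp, "sampleB_contigs.fa")
        with open(good, "w") as fh:
            fh.write(">contig1\nACGT\n")
        open(bad, "w").close()  # zero-byte file, as in the report above
        try:
            check_contigs_not_empty({"sampleA": good, "sampleB": bad})
        except ValueError as err:
            print(err)
```

Checking for a header line rather than file size also catches files that contain only whitespace.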

EDIT: I just hit this again, on a different data set. (Aug 21 2023)

gregcaporaso commented 9 months ago

@colinvwood, @misialq, looks like this was addressed with a more informative error message in #63. What's the work-around if a user gets that error message? I think we should have a work-around that doesn't require re-running assembly, and once we have that this can be closed out.

misialq commented 8 months ago

Hey @gregcaporaso, should we add a filter-contigs action which would allow us to do filtering similar to the filter-samples action in q2-demux? We could also then add an option to filter out all empty contig files (rather than provide a list of filtering criteria). What do you think?
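
To make the "filter out all empty contig files" option concrete, here is a plain-Python sketch under some assumptions. It works on a directory of `<sample>_contigs.fa` files (the naming seen in the log above), and `filter_empty_contigs` is a hypothetical helper, not an existing or planned QIIME 2 action.

```python
import shutil
from pathlib import Path


def filter_empty_contigs(src_dir, dst_dir, suffix="_contigs.fa"):
    """Copy non-empty contig files from src_dir to dst_dir.

    Returns the sorted sample IDs that were dropped because their file
    had size zero. The "<sample>_contigs.fa" naming is an assumption
    based on the file names in the log above.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    dropped = []
    for fp in sorted(src.glob(f"*{suffix}")):
        sample_id = fp.name[: -len(suffix)]
        if fp.stat().st_size == 0:
            dropped.append(sample_id)
        else:
            # copy2 preserves file metadata; the source stays untouched.
            shutil.copy2(fp, dst / fp.name)
    return dropped
```

Returning the dropped sample IDs lets the caller report what was removed, so samples do not disappear silently.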

gregcaporaso commented 8 months ago

I like that idea @misialq.

misialq commented 6 months ago

Closing this - there is a new issue to track the development of contig filtering (#84).