gbouras13 / plassembler

Program to quickly and accurately assemble plasmids in hybrid and long-only sequenced bacterial isolates
MIT License

Plassembler stops at Processing Sam/Bam Files and extracting Fastqs #53

Open · ayoraind opened this issue 4 months ago

ayoraind commented 4 months ago

Hi @gbouras13,

Many thanks for your excellent tool. I am trying to run Plassembler (v1.6.2) within one of my in-house Nextflow pipelines on a Linux machine. Plassembler ran successfully for one of the two genomes of interest, but for the second genome it stopped at the "Processing Sam/Bam Files and extracting Fastqs" step. Please find the error message below.

2024-05-10 17:24:08.037 | INFO     | plassembler:run:793 - Mapping short reads.
2024-05-10 17:24:08.037 | INFO     | plassembler.utils.external_tools:run_to_stdout:67 - Started running minimap2 -ax sr -t 1 15059/flye_renamed.fasta 15059/trimmed_R1.fastq 15059/trimmed_R2.fastq ...
2024-05-10 17:25:00.741 | INFO     | plassembler.utils.external_tools:run_to_stdout:69 - Done running minimap2 -ax sr -t 1 15059/flye_renamed.fasta 15059/trimmed_R1.fastq 15059/trimmed_R2.fastq
2024-05-10 17:25:00.742 | INFO     | plassembler:run:800 - Processing Sam/Bam Files and extracting Fastqs.
Traceback (most recent call last):
    File "/MIGE/04_PROJECTS/DAIKOS/longread_analysis/work/conda/plassembler-2a63b87dbbb2cd95edb41776d561fd4c/bin/plassembler", line 10, in <module>
      sys.exit(main())
    File "/MIGE/04_PROJECTS/DAIKOS/longread_analysis/work/conda/plassembler-2a63b87dbbb2cd95edb41776d561fd4c/lib/python3.9/site-packages/plassembler/__init__.py", line 1666, in main
      main_cli()
    File "/MIGE/04_PROJECTS/DAIKOS/longread_analysis/work/conda/plassembler-2a63b87dbbb2cd95edb41776d561fd4c/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
    File "/MIGE/04_PROJECTS/DAIKOS/longread_analysis/work/conda/plassembler-2a63b87dbbb2cd95edb41776d561fd4c/lib/python3.9/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
    File "/MIGE/04_PROJECTS/DAIKOS/longread_analysis/work/conda/plassembler-2a63b87dbbb2cd95edb41776d561fd4c/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/MIGE/04_PROJECTS/DAIKOS/longread_analysis/work/conda/plassembler-2a63b87dbbb2cd95edb41776d561fd4c/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/MIGE/04_PROJECTS/DAIKOS/longread_analysis/work/conda/plassembler-2a63b87dbbb2cd95edb41776d561fd4c/lib/python3.9/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
    File "/MIGE/04_PROJECTS/DAIKOS/longread_analysis/work/conda/plassembler-2a63b87dbbb2cd95edb41776d561fd4c/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
    File "/MIGE/04_PROJECTS/DAIKOS/longread_analysis/work/conda/plassembler-2a63b87dbbb2cd95edb41776d561fd4c/lib/python3.9/site-packages/plassembler/__init__.py", line 804, in run
      extract_long_fastqs_slow_keep_fastqs(outdir, samfile, plasmidfastqs)
    File "/MIGE/04_PROJECTS/DAIKOS/longread_analysis/work/conda/plassembler-2a63b87dbbb2cd95edb41776d561fd4c/lib/python3.9/site-packages/plassembler/utils/sam_to_fastq.py", line 92, in extract_long_fastqs_slow_keep_fastqs
      "".join(chr(q + 33) for q in quality) + "\n"
TypeError: 'NoneType' object is not iterable
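The failing line encodes integer base qualities as a Phred+33 FASTQ string, so it breaks whenever `quality` is `None`. Assuming `quality` comes from pysam's `AlignedSegment.query_qualities` (which is `None` when a SAM record has `*` in its QUAL field), a guarded version of the same conversion might look like the sketch below. `phred33_string` is a hypothetical helper for illustration, not Plassembler's actual code.

```python
from typing import Optional, Sequence


def phred33_string(quality: Optional[Sequence[int]]) -> Optional[str]:
    """Encode integer Phred scores as a FASTQ quality string (offset 33).

    Returns None instead of raising TypeError when no qualities are
    available, so the caller can decide to skip that read.
    """
    if quality is None:
        return None
    return "".join(chr(q + 33) for q in quality)


# Phred 0 -> '!', Phred 20 -> '5', Phred 40 -> 'I'
print(phred33_string([0, 20, 40]))  # !5I
print(phred33_string(None))  # None
```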

The bash command:

#!/bin/bash -ue
plassembler run \
    -d plassembler_db \
    -l 15059.fastq.gz \
    -1 15059_1.fastq.gz \
    -2 15059_2.fastq.gz \
    -m 1 \
    -p 15059 \
    -o 15059 \
    --keep_fastqs \
    --keep_chromosome \
    -r

My guess is that Plassembler found no reads to extract into FASTQ files (I used the --keep_fastqs argument). Is this correct? If so, does it mean that this genome contains no plasmids? And if there truly are no plasmids, would it be possible for Plassembler to write an empty FASTQ file (e.g., in the plasmid_fastqs directory) and continue with a warning in the log, rather than stopping altogether? That would make it much easier to use from Nextflow or Snakemake.
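The behaviour requested here amounts to always creating the output file and downgrading the empty case to a warning. Below is a minimal standard-library sketch, assuming reads are already available as (name, sequence, quality string) tuples. `write_fastq_or_empty` and the `plasmid_long.fastq` file name are hypothetical and not part of Plassembler's interface.

```python
import logging
import tempfile
from pathlib import Path
from typing import Iterable, Tuple

logger = logging.getLogger("plasmid_fastq_sketch")

# Hypothetical record layout: (read name, sequence, Phred+33 quality string)
Record = Tuple[str, str, str]


def write_fastq_or_empty(records: Iterable[Record], out_path: Path) -> int:
    """Write records to out_path as FASTQ and return how many were written.

    The file is created even when there are no records, so a workflow
    step that declares it as an output still finds it; the empty case
    is logged as a warning rather than raised as an error.
    """
    n_written = 0
    with open(out_path, "w") as fh:
        for name, seq, qual in records:
            fh.write(f"@{name}\n{seq}\n+\n{qual}\n")
            n_written += 1
    if n_written == 0:
        logger.warning("No reads extracted; wrote empty FASTQ %s", out_path)
    return n_written


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "plasmid_long.fastq"
    print(write_fastq_or_empty([], out), out.exists())  # 0 True
```

A workflow manager such as Nextflow fails a task when a declared output file is missing, which is why the sketch creates the file unconditionally instead of skipping it.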

ayoraind commented 4 months ago

I just did a quick check by running the bash script outside the Nextflow pipeline, both with and without the --keep_fastqs argument. Plassembler ran successfully and showed that there are at least 2 circularized plasmids:

>1 length=263626 plasmid_copy_number_short=1.04x plasmid_copy_number_long=1.14x
>2 length=43380 plasmid_copy_number_short=0.75x plasmid_copy_number_long=0.47x circular=true
>3 length=42020 plasmid_copy_number_short=1.4x plasmid_copy_number_long=1.36x
>4 length=13841 plasmid_copy_number_short=4.31x plasmid_copy_number_long=1.79x circular=true
>5 length=11136 plasmid_copy_number_short=1.03x plasmid_copy_number_long=0.62x
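Because each header is an ID followed by space-separated `key=value` fields, the circularised contigs can be picked out programmatically. The sketch below assumes that layout holds for every header, with `circular=true` present only on circular contigs; `parse_header` is a hypothetical helper.

```python
from typing import Dict, List

HEADERS = [
    ">1 length=263626 plasmid_copy_number_short=1.04x plasmid_copy_number_long=1.14x",
    ">2 length=43380 plasmid_copy_number_short=0.75x plasmid_copy_number_long=0.47x circular=true",
    ">3 length=42020 plasmid_copy_number_short=1.4x plasmid_copy_number_long=1.36x",
    ">4 length=13841 plasmid_copy_number_short=4.31x plasmid_copy_number_long=1.79x circular=true",
    ">5 length=11136 plasmid_copy_number_short=1.03x plasmid_copy_number_long=0.62x",
]


def parse_header(line: str) -> Dict[str, str]:
    """Split a FASTA header into its contig ID and key=value fields."""
    contig_id, *fields = line.lstrip(">").split()
    record = {"id": contig_id}
    for field in fields:
        key, _, value = field.partition("=")
        record[key] = value
    return record


records: List[Dict[str, str]] = [parse_header(h) for h in HEADERS]
circular = [r["id"] for r in records if r.get("circular") == "true"]
print(circular)  # ['2', '4']
```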

So, I think it has something to do with my Nextflow pipeline.

ayoraind commented 4 months ago

I found the cause of the error. It occurred only when I set the -m argument to 1. If -m is not specified (the default is 500), the pipeline runs successfully. Would it be possible for the run to continue with a warning in the log rather than stopping altogether (for the sake of Nextflow/Snakemake)?

gbouras13 commented 4 months ago

Hi @ayoraind ,

Thanks so much for this detailed bug report. I'll definitely put in a fix soon (probably by just skipping the really short reads that seem to cause this issue).
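The proposed fix, skipping the really short reads, could be sketched as a filter in front of the FASTQ conversion that counts what it drops and logs one warning instead of crashing. This is illustrative only: `filter_reads`, `min_len` and the (name, sequence, qualities) tuple layout are hypothetical, and also skipping reads whose qualities are `None` is an added assumption based on the traceback above.

```python
import logging
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

logger = logging.getLogger("short_read_filter_sketch")

# Hypothetical input layout: (name, sequence, integer qualities or None)
RawRead = Tuple[str, str, Optional[Sequence[int]]]


def filter_reads(
    reads: Iterable[RawRead], min_len: int = 1
) -> Iterator[Tuple[str, str, str]]:
    """Yield FASTQ-ready (name, seq, Phred+33 quality string) tuples.

    Reads shorter than min_len, or with no base qualities, are skipped
    and counted; once the input is exhausted, a single warning reports
    how many were dropped.
    """
    skipped = 0
    for name, seq, quals in reads:
        if quals is None or len(seq) < min_len:
            skipped += 1
            continue
        yield name, seq, "".join(chr(q + 33) for q in quals)
    if skipped:
        logger.warning("Skipped %d read(s) that were too short or lacked qualities", skipped)


reads: List[RawRead] = [
    ("r1", "ACGT", [30, 30, 30, 30]),
    ("r2", "", None),  # e.g. a record with no stored sequence or qualities
    ("r3", "A", [10]),
]
print(list(filter_reads(reads, min_len=2)))  # [('r1', 'ACGT', '????')]
```

Because the warning is emitted only after the generator is exhausted, the caller has to consume all reads (here via `list`) before the skip count is logged.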

George

ayoraind commented 4 months ago

Hi @gbouras13,

Thank you very much.