Input requirement for hybrid assembly

marinachen commented 2 months ago

Description of bug

Hi, I was running a hybrid assembly with Illumina short reads + PacBio CCS reads. My short reads are merged into a single fastq after QC pipeline, consisting of both paired and orphan reads, so I specified -s because that was the only way it could run. It ran fine until SPAdes started the first assembly stage (k21), where it failed with the error below.

The command I was running was: /n/home13/marinachen/.conda/envs/spades/bin/spades.py --meta --pacbio /n/holystore01/LABS/huttenhower_lab/Users/mchen/data/PB_MGX/use/Soil_pool.hifi_reads.fastq.gz -s /n/holystore01/LABS/huttenhower_lab/Users/mchen/data/Illumina_MGX_forPB/Clean_data/Soil_Pool_S52_L001.fastq -o /n/holystore01/LABS/huttenhower_lab/Users/mchen/outputs/PB_MGX/hybrid_assembly/Soil_hybrid

And the error message was:

0:00:00.007 1M / 32M ERROR General (pipeline.cpp : 216) Sorry, current version of metaSPAdes can work either with single library (paired-end only) or in hybrid paired-end + (TSLR or PacBio or Nanopore) mode.

It looked like there was an issue with recognizing the input files for hybrid paired-end + PacBio mode? Thank you for any help!

spades.log

params.txt

SPAdes version

4.0.0

Operating System

Cannon HPC (linux)

Python Version

3.10.9

Method of SPAdes installation

conda

No errors reported in spades.log

[ ] Yes

asl commented 2 months ago

> My short reads are merged into a single fastq after QC pipeline, consisting of both paired and orphan reads.

Don't do that. You need to provide a proper paired-end dataset: either interleaved or in two separate files.

> Sorry, current version of metaSPAdes can work either with single library (paired-end only) or in hybrid paired-end + (TSLR or PacBio or Nanopore) mode.

This is expected. You need to have a paired-end library, not a single-end one (-s).

See https://ablab.github.io/spades/input.html#paired-read-libraries for more information.

marinachen commented 2 months ago

Hi, thank you for the quick reply! Do you mean I have to use either --pe1-1 R1.fastq --pe1-2 R2.fastq or --interleaved? With regard to the former, would both files have to contain exactly the same number of reads, exactly paired? I ask because some of my reads lost their mates to quality or contamination filtering during QC. Thank you again!

asl commented 2 months ago

Yes, you need to have a proper paired-end dataset (left reads correspond to right ones). SPAdes has no idea how to figure out which reads lost their mates. In general, you need to use a paired-end-aware QC procedure.
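
As an illustration of what "left reads correspond to right ones" means in practice, the following Python sketch checks that two FASTQ files list the same read names in the same order. It is not part of SPAdes. The function names `read_names` and `first_mismatch` are hypothetical, and the sketch assumes well-formed four-line FASTQ records whose names either end in `/1` and `/2` or carry the mate number in a separate whitespace-delimited field.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of every record in a 4-line-per-record FASTQ file.

    Assumption: the name is the first whitespace-delimited token of the
    header, with the leading '@' and any trailing '/1' or '/2' removed.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_index, line in enumerate(handle):
            if line_index % 4 == 0 and line.strip():  # header line
                name = line.split()[0][1:]
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def first_mismatch(r1_path, r2_path):
    """Return None if the two files are properly paired.

    Otherwise return (record_index, name_in_r1, name_in_r2) for the first
    record where the names differ; a missing record is reported as None.
    """
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for record_index, (left, right) in enumerate(pairs):
        if left != right:
            return record_index, left, right
    return None
```

A result of `None` means both files have the same number of reads and every left read lines up with its right mate. Any other result points at the first record where the pairing breaks.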

marinachen commented 2 months ago

Thank you very much! Would you recommend just filtering out reads to retain only paired ones in this case?

asl commented 2 months ago

> Thank you very much! Would you recommend just filtering out reads to retain only paired ones in this case?

Up to you, if you know how to do this reliably.
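
One way to approach such filtering is sketched below in Python: it walks through a merged FASTQ file once, writes each read to R1/R2 as soon as its mate shows up (so both outputs stay in matching order), and sends everything else to an orphan file. This is a sketch under stated assumptions, not a SPAdes feature. The function names are hypothetical, and it assumes four-line records whose headers identify the mate either with a `/1` or `/2` suffix or with a second field starting with `1:` or `2:`; reads with any other header are treated as orphans. Reads still waiting for a mate are held in memory.

```python
import gzip


def parse_fastq(path):
    """Yield (header, sequence, separator, quality) from a 4-line FASTQ."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:  # end of file
                return
            yield tuple(record)


def name_and_mate(header):
    """Return (read name, mate) where mate is '1', '2', or None if unknown."""
    fields = header[1:].split()
    name = fields[0]
    if name.endswith(("/1", "/2")):
        return name[:-2], name[-1]
    if len(fields) > 1 and fields[1][:2] in ("1:", "2:"):
        return name, fields[1][0]
    return name, None


def split_pairs(merged_path, r1_path, r2_path, orphan_path):
    """Split a merged FASTQ into paired R1/R2 files plus an orphan file.

    Returns (number of pairs written, number of orphan reads written).
    """
    pending = {}  # read name -> (mate, record) still waiting for a partner
    n_pairs = n_orphans = 0

    def write(handle, record):
        handle.write("\n".join(record) + "\n")

    with open(r1_path, "w") as r1, open(r2_path, "w") as r2, \
            open(orphan_path, "w") as orphans:
        for record in parse_fastq(merged_path):
            name, mate = name_and_mate(record[0])
            if mate is None:
                write(orphans, record)
                n_orphans += 1
                continue
            waiting = pending.pop(name, None)
            if waiting is None:
                pending[name] = (mate, record)
            elif waiting[0] == mate:
                # Same mate seen twice: treat the earlier copy as an orphan.
                write(orphans, waiting[1])
                n_orphans += 1
                pending[name] = (mate, record)
            else:
                left, right = ((waiting[1], record) if waiting[0] == "1"
                               else (record, waiting[1]))
                write(r1, left)
                write(r2, right)
                n_pairs += 1
        # Whatever never found a partner is an orphan.
        for _, record in pending.values():
            write(orphans, record)
            n_orphans += 1
    return n_pairs, n_orphans
```

The two paired outputs could then be supplied as the --pe1-1 and --pe1-2 files mentioned above. Whether the header convention assumed here matches the actual data needs to be checked against a few records first.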

marinachen commented 2 months ago

Okay thank you so much for your help!
