no difference between hybrid mode and short read only

gianfilippo commented 2 years ago

Description of bug

Hi,

I am trying to run coronaSPAdes on a sample we are interested in.

Initially I had only Illumina single-end reads available. The original FASTQ was first mapped to the SARS-CoV-2 reference, the primer sequences were soft-clipped, and the alignments were sorted. The resulting BAM was then converted back to FASTQ. I used this file as input to coronaSPAdes, first with default parameters and then with a k-mer range of k=37,49,61,73,85,97.

I am not sure how to set the k parameter, yet. Suggestions are very welcome !

Anyway, I then received Nanopore sequencing data (2 technical replicates) for the same sample, so I tried to run coronaSPAdes in hybrid mode by adding "--nanopore $nanoFASTQ1 -nanopore $nanoFASTQ2" to the command line. Here I kept k=37,49,61,73,85,97.
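
For reference, a hybrid invocation like the one described above could be assembled as in the sketch below. This is an illustration under stated assumptions, not the command that was actually run: the file names and the helper `build_corona_cmd` are hypothetical, and the flags (`--corona`, `-s`, `--nanopore`, `-k`, `-o`) as well as the rule that k values must be odd and below 128 are taken from my reading of the SPAdes manual, so check them against `spades.py --help` for the installed version.

```python
import shlex


def build_corona_cmd(illumina_fastq, nanopore_fastqs, out_dir, kmers=None):
    """Build an argument list for a hybrid coronaSPAdes run (hypothetical helper)."""
    cmd = ["spades.py", "--corona", "-s", illumina_fastq]
    for path in nanopore_fastqs:
        # Assumption: every long-read file gets its own double-dash --nanopore flag.
        cmd += ["--nanopore", path]
    if kmers:
        # Assumption from the SPAdes manual: k values are odd and below 128.
        if any(k % 2 == 0 or k >= 128 for k in kmers):
            raise ValueError("k values must be odd and below 128")
        cmd += ["-k", ",".join(str(k) for k in sorted(kmers))]
    cmd += ["-o", out_dir]
    return cmd


# Placeholder file names, not the ones used in this issue.
cmd = build_corona_cmd(
    "illumina.fastq",
    ["nano_rep1.fastq", "nano_rep2.fastq"],
    "out_hybrid",
    kmers=[37, 49, 61, 73, 85, 97],
)
print(shlex.join(cmd))
```

Building the list programmatically makes it easy to see that each replicate is preceded by the same flag, and leaving `kmers` unset falls back to the tool's defaults.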

The resulting contigs.fasta and scaffolds.fasta are the same as in the run without the Nanopore reads. I was expecting some difference.
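
One way to confirm that two runs really produced identical assemblies is to compare the sequences themselves rather than the files byte by byte. The sketch below is a minimal example of that idea; the helper names are hypothetical, it ignores headers and record order, and it does not treat reverse complements as equal.

```python
def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into a list of (header, sequence)."""
    records, name, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def same_sequences(lines_a, lines_b):
    """True if both inputs contain the same sequences, ignoring headers and order."""
    seqs_a = sorted(seq.upper() for _, seq in read_fasta(lines_a))
    seqs_b = sorted(seq.upper() for _, seq in read_fasta(lines_b))
    return seqs_a == seqs_b


# Toy data standing in for the contigs.fasta of two runs.
run_a = [">NODE_1", "ACGT", "ACGT", ">NODE_2", "GGCC"]
run_b = [">contig_b", "ggcc", ">contig_a", "ACGTACGT"]
identical = same_sequences(run_a, run_b)
print(identical)
```

With real files the same functions can be called as `same_sequences(open(path_a), open(path_b))`, where both paths are placeholders for the two output directories.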

I should add that I used the original Nanopore FASTQ files as they were, without the kind of processing (align_trim, etc.) that I applied to the Illumina reads.

Is this expected or am I doing something wrong?

Needless to say, I am new to this kind of analysis.

Thanks

spades.log

params.txt

SPAdes version

3.15.1

Operating System

Linux

Python Version

3.8.6

Method of SPAdes installation

pre-installed on cluster, so I do not know

No errors reported in spades.log

[X] Yes

asl commented 2 years ago

Hello

> I am not sure how to set the k parameter, yet. Suggestions are very welcome !

If you're unsure what to do – simply stick to the default. Otherwise the results might be surprising.

> The resulting contigs.fasta and scaffolds.fasta are the same to the run without the Nanopore reads. I was expecting some difference.

First of all, it seems that only 1 (one) nanopore read aligned to the assembly graph. Are you sure that the reads are from this genome?

Next, for this particular version of SPAdes / coronaSPAdes (not the latest!), the output is in gene_clusters.fasta. Have you assembled the complete viral genome?

gianfilippo commented 2 years ago

Hi,

thanks for the prompt reply.

I found the Nanopore mapping results in the log file. The reads are definitely from this genome, as I used them with a reference-based pipeline (the ARTIC pipeline) to get a consensus sequence. From the log file it looks like the main issue is that the reads are shorter than 500 bp. In the sample I was given, read length is in the 164-524 range. Is there a way to lower the 500 threshold? Or should I rather use the Nanopore data without the --nanopore flag? Or should I simply use only the single-end Illumina reads I have?
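
To see how many reads would survive a length cutoff like this, the read lengths can be tallied directly from the FASTQ. The sketch below assumes plain four-line FASTQ records and assumes the cutoff is inclusive (length >= 500); both are assumptions made for illustration, not details confirmed from the SPAdes log, and the toy reads only mimic the 164-524 range mentioned above.

```python
def fastq_lengths(lines):
    """Yield the sequence length of each record in four-line FASTQ input."""
    for index, line in enumerate(lines):
        if index % 4 == 1:  # the second line of every record is the sequence
            yield len(line.strip())


def summarize(lengths, min_len=500):
    """Count reads and how many are at least min_len long (cutoff assumed inclusive)."""
    lengths = list(lengths)
    if not lengths:
        return {"reads": 0, "min": None, "max": None, "at_or_above": 0}
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "at_or_above": sum(1 for n in lengths if n >= min_len),
    }


# Three made-up reads with lengths 164, 300 and 524.
toy_fastq = []
for name, size in [("r1", 164), ("r2", 300), ("r3", 524)]:
    toy_fastq += ["@" + name, "A" * size, "+", "I" * size]

stats = summarize(fastq_lengths(toy_fastq))
print(stats)
```

For a real file, `summarize(fastq_lengths(open(path)))` gives the same summary, with `path` standing in for the Nanopore FASTQ; gzip-compressed input would need `gzip.open(path, "rt")` instead.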

Thanks

asl commented 2 years ago

Still, have you assembled the complete genome (in gene_clusters.fasta file)?

gianfilippo commented 2 years ago

With default parameters, I get two complementary contigs (19371 and 10331) that combined span 29702 bp.
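
As a rough sanity check on those numbers, the combined contig length can be compared with the length of the SARS-CoV-2 reference. The sketch assumes the reference is NC_045512.2 (29,903 bp), which may not be the one used here, and the sum only bounds the covered length: it says nothing about overlaps between the contigs or about where any missing bases lie.

```python
# Assumption: comparison against NC_045512.2 (Wuhan-Hu-1), 29,903 bp.
REFERENCE_LENGTH = 29903

# Contig lengths reported in this comment.
contig_lengths = [19371, 10331]

total = sum(contig_lengths)
shortfall = REFERENCE_LENGTH - total
fraction = round(100 * total / REFERENCE_LENGTH, 1)

print(f"combined length: {total} bp")
print(f"shorter than the reference by: {shortfall} bp")
print(f"share of reference length: {fraction}%")
```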

On Mon, Jan 24, 2022 at 3:26 PM Anton Korobeynikov @.***> wrote:

Still, have you assembled the complete genome (in gene_clusters.fasta file)?

gianfilippo commented 2 years ago

Is there a way to bypass the 500 threshold?

ablab / spades