0x002A / MuCHSALSA

Multi-Core Hybrid Short- And Long-read Sequence Assembler
GNU General Public License v3.0

Run time for metagenome #11

Open microEcology opened 1 year ago

microEcology commented 1 year ago

We're testing MuCHSALSA, this implementation of LazyB, on a metagenome with both Illumina and PacBio reads. We're evaluating these hybrid assemblies for soil metagenomes. The first sample, with 3.9 Gb of short reads and 5.3 Gb of long reads, is still assembling and has been running for more than a month. Is this expected? The same sample was successfully assembled with Unicycler.

This is the command called:

sh pipeline.sh 50 90 Sample7 ../reads_/Sample7R1.fq ../reads/Sample7R2.fq ../reads/Sample7_99.fasta Sample7_muchsalsa

The last ABySS log is this (the file was last updated on 22-08):

Loaded 1853341012 k-mer. At least 104 GB of RAM is required.
Minimum k-mer coverage is 56
Using a coverage threshold of 1...
The median k-mer coverage is 1
The reconstruction is 1853341012
The k-mer coverage threshold is 1
Setting parameter e (erode) to 2
Setting parameter E (erodeStrand) to 0
Setting parameter c (coverage) to 2
Finding adjacent k-mer...

Is this extremely long running time expected? Would this assembler not work for metagenomes?

TGatter commented 1 year ago

Thank you for your interest in our software!

Our pipeline was not developed with metagenomes in mind. The initial k-mer trimming steps (based on Jellyfish) are designed to trim the k-mer peaks of single genomes. I would therefore presume that such trimming can lead to undesired results in metagenomic data.
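To illustrate why peak-based trimming assumes a single genome, here is a minimal sketch in Python. It is not the trimming logic used in the pipeline; the function names `parse_histo` and `find_valley_and_peak` are hypothetical. It assumes a two-column k-mer histogram (multiplicity, number of distinct k-mers), which is the layout `jellyfish histo` prints, and looks for the valley after the low-coverage error tail and the main peak behind it. A single genome usually shows such a peak, while a histogram that only decreases (as can happen with low-coverage metagenomes) gives no peak to trim around.

```python
def parse_histo(text):
    """Parse two-column 'multiplicity count' lines into a sorted list of pairs."""
    pairs = []
    for line in text.strip().splitlines():
        mult, count = line.split()
        pairs.append((int(mult), int(count)))
    return sorted(pairs)


def find_valley_and_peak(histo):
    """Return (valley, peak) multiplicities, or None if no peak follows the error tail.

    Illustrative heuristic only: walk down the initial tail until the counts
    stop decreasing, then take the highest bin from that point on as the peak.
    """
    i = 0
    while i + 1 < len(histo) and histo[i + 1][1] < histo[i][1]:
        i += 1
    if i + 1 >= len(histo):
        return None  # counts only ever decrease: no coverage peak to trim around
    valley = histo[i][0]
    peak = max(histo[i:], key=lambda pair: pair[1])[0]
    return valley, peak


# Made-up histograms for illustration (not data from this issue).
single_genome = parse_histo("1 1000\n2 300\n3 50\n4 20\n5 40\n6 90\n7 150\n8 90\n9 30")
no_peak = parse_histo("1 1000\n2 400\n3 200\n4 100\n5 50")

print(find_valley_and_peak(single_genome))
print(find_valley_and_peak(no_peak))
```

With the made-up inputs above, the first histogram has a valley and a peak, while the second one only decreases, so the sketch reports no peak for it.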

This behaviour is nevertheless not expected. The problem you encountered appears to be within ABySS, if I understood your question correctly. Have you attempted ABySS assemblies without the pre-filter? Did you find any other errors? Is ABySS still running or on standby?

microEcology commented 1 year ago

Thank you for replying!

There isn't any error message, and it is still running; it's just taking too long...

We haven't tried ABySS alone yet. Do you think it is still worth waiting, or would it be better to move on to ABySS without the pre-filter? Or would using other values for the k-mer trimming solve this?

0x002A commented 1 year ago

Any news on this issue? @TGatter, @microEcology?

microEcology commented 1 year ago

I tested ABySS standalone, with no Jellyfish or k-mer filtering, and ran into the same problem. The assembly ran for a couple of weeks with no progress and no errors either, stuck on the same step, so I gave up using it. I'm still not sure what the issue was, though.

TGatter commented 1 year ago

There are several components in the tool that might cause problems if there are large, near-complete subgraphs in the assembly graphs, which might be a complication for metagenomes. As it stands, it is designed for sparse assembly data. We plan to start a project on an artificial metagenome later this year. I will report here if we find something that might be of help to you.
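As a rough illustration of what "near-complete" versus "sparse" means here, the following Python sketch computes the edge density of a subgraph: the fraction of all possible node pairs that are actually connected. This is not code from the tool; the adjacency-dict representation and the name `subgraph_density` are assumptions made for the example. A density close to 1.0 corresponds to a near-complete subgraph, while sparse assembly data gives values far below that.

```python
def subgraph_density(adj, nodes):
    """Edge density of the undirected subgraph induced by `nodes`.

    `adj` maps each node to the set of its neighbours (hypothetical layout).
    Returns a value between 0.0 (no edges) and 1.0 (complete subgraph).
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Every undirected edge is seen once from each endpoint, so halve the sum.
    # Self-loops are ignored.
    endpoint_count = sum(len((adj.get(u, set()) & nodes) - {u}) for u in nodes)
    edges = endpoint_count // 2
    return edges / (n * (n - 1) / 2)


# Toy graphs for illustration (not taken from any real assembly).
clique = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

print(subgraph_density(clique, clique.keys()))
print(subgraph_density(path, path.keys()))
```

The four-node clique has all 6 possible edges, while the four-node path has only 3 of them, so the second graph is the sparser case the tool is designed for.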