GaetanBenoitDev / metaMDBG

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.
MIT License

Disabling or changing the correction step (and other performance observations) #25

Open GabeAl opened 9 hours ago

GabeAl commented 9 hours ago

Hi,

I was wondering whether it is possible to disable read correction, e.g. when using HERRO or another correction tool, or when pre-filtering to very high-quality reads from v5 SUP basecalling.

If not, how can it be controlled? I see some options on the command line that seem related to the correction step (e.g. --density-correction), but no way to turn it off: setting --density-correction to 0 produces empty assemblies.

I have also noticed that raising --density-assembly to a high value (tested with the latest HEAD on GitHub) can lead to some hilarious results, including reporting a 6 Mb genome as a closed 2.3 Mb genome (I've never seen this kind of thing happen before), and other wonky behavior in the range between 0.02 and 0.05 (anything higher tends to fail outright).

Any guidance for using these parameters and what your expectations are for tweaking them would be helpful.

For example, I get much better assemblies from low-depth, high-accuracy ONT runs when I set --density-assembly higher (in the 0.01 to 0.02 range) and reduce the minimum overlap (e.g. to 500 bp). The resulting genomes are consistent and seemingly accurate, whereas the defaults did not work well in these cases.
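To make the kind of tuning described above reproducible, here is a minimal sketch of a helper that builds one command line per parameter combination. Only --density-assembly is taken from this issue; the `asm` subcommand and the flags `--out-dir`, `--in-ont`, `--threads`, and `--min-read-overlap` are my assumptions about the CLI and should be checked against the tool's help output. The file names are hypothetical.

```python
from itertools import product


def build_sweep(reads, out_root, densities, overlaps, threads=8):
    """Return one argument list per (density, overlap) combination.

    Nothing is executed here; the lists can be passed to subprocess.run
    once the flag names have been verified against the installed version.
    """
    commands = []
    for density, overlap in product(densities, overlaps):
        # One output directory per combination so results do not collide.
        out_dir = f"{out_root}/d{density}_ov{overlap}"
        commands.append([
            "metaMDBG", "asm",                       # ASSUMED subcommand
            "--out-dir", out_dir,                    # ASSUMED flag name
            "--in-ont", reads,                       # ASSUMED flag name
            "--density-assembly", str(density),      # flag named in this issue
            "--min-read-overlap", str(overlap),      # ASSUMED flag name
            "--threads", str(threads),               # ASSUMED flag name
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical input file and the ranges mentioned in this post.
    for cmd in build_sweep("reads.fastq.gz", "sweep", [0.01, 0.02], [500, 1000]):
        print(" ".join(cmd))
```

Printing the commands first, rather than running them directly, makes it easy to confirm the flags before committing compute time to each assembly.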

I also saw a question about strain-level assemblies that seems quite relevant here: would raising the minimum read identity help? What does that parameter do? Does it govern read realignment, initial DBG construction, the error-correction step, or some combination of these?

I wonder whether these observations suggest room for improvement, or for automatic selection of parameters.