genomic-medicine-sweden / jasen

Bacterial typing pipeline for clinical NGS data. Written in NextFlow, Python & Bash.
GNU General Public License v3.0

add subsampling of reads before de novo assembly #160

Open LordRust opened 1 year ago

LordRust commented 1 year ago

Since too many reads just introduce more error edges in the assembly graph, we should add a step for subsampling before assembly. seqtk would be the obvious, speedy candidate for this, but there are other tools as well. Aside from producing better assemblies, subsampling would of course also shorten the running time.

For regular genomic data, I think 200x coverage would be a good starting point.
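
To make the 200x idea concrete, here is a rough sketch of how the sampling fraction could be derived before calling seqtk. It is only an illustration, not code from the pipeline: the function names, the default seed, and the example genome size are assumptions, and the total base count would have to come from an earlier QC step.

```python
def subsample_fraction(total_bases: int, genome_size: int,
                       target_depth: float = 200.0) -> float:
    """Fraction of reads to keep so the estimated depth is about target_depth.

    total_bases: sum of read lengths over all input reads (assumed known from QC).
    genome_size: expected genome size in bp (assumed known per species).
    """
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("total_bases and genome_size must be positive")
    current_depth = total_bases / genome_size
    # Never return more than 1.0: shallow samples are passed through unchanged.
    return min(1.0, target_depth / current_depth)


def seqtk_command(fastq: str, fraction: float, seed: int = 100) -> list[str]:
    """Build a `seqtk sample` argument list (hypothetical helper).

    seqtk writes the sampled reads to stdout, so the caller redirects the output.
    """
    return ["seqtk", "sample", f"-s{seed}", fastq, f"{fraction:.4f}"]


if __name__ == "__main__":
    # Example: 5 Gbp of reads for a 5 Mbp genome is ~1000x, so keep ~20%.
    frac = subsample_fraction(5_000_000_000, 5_000_000)
    for fq in ("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
        print(" ".join(seqtk_command(fq, frac)))
```

For paired-end data, the same seed should be used for both files so that the read pairs stay in sync.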

ryanjameskennedy commented 10 months ago

Just out of interest, I ran into this problem recently and SKESA gave an error saying:

Invalid file <expected_read_filename>