Since too many reads just introduce more error edges into the assembly graph, we should add a subsampling step before assembly. seqtk would be the obvious fast candidate for this, but there are other tools as well. Besides producing better assemblies, subsampling would also reduce the running time.
For regular genomic data, I think 200x coverage would be a good starting point.
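To make the proposal concrete, here is a rough sketch of how the sampling fraction could be derived from a target coverage. The function names, the default of 200x, and the file names in the comments are only illustrative assumptions, and the estimated genome size would have to come from the user or a prior estimate. The resulting fraction could then be handed to something like `seqtk sample`, using the same seed for both files of a read pair so that mates stay in sync.

```python
import random


def subsample_fraction(total_bases, genome_size, target_coverage=200.0):
    """Return the fraction of reads to keep to reach roughly target_coverage.

    total_bases: sum of all read lengths in the input.
    genome_size: estimated genome size in bases (assumed to be known).
    """
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("total_bases and genome_size must be positive")
    current_coverage = total_bases / genome_size
    # Never "upsample": if coverage is already below the target, keep everything.
    return min(1.0, target_coverage / current_coverage)


def subsample_indices(n_reads, fraction, seed=100):
    """Pick read indices to keep; reusing the seed for R1 and R2 keeps pairs matched."""
    rng = random.Random(seed)
    return [i for i in range(n_reads) if rng.random() < fraction]


# Example: 5 Gbp of reads for a 5 Mbp genome is 1000x, so keep about 20%.
fraction = subsample_fraction(5_000_000_000, 5_000_000)

# The fraction could then be passed to seqtk (hypothetical file names):
#   seqtk sample -s100 reads_1.fq 0.2 > sub_1.fq
#   seqtk sample -s100 reads_2.fq 0.2 > sub_2.fq
```

Sampling by fraction gives only approximately the requested coverage, which should be fine for this purpose.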