Open LordRust opened 1 year ago
Just out of interest, I ran into this problem recently and SKESA gave an error saying:
Invalid file <expected_read_filename>
Example of how to subsample with seqtk (use the same seed for both files so the read pairs stay in sync):
seqtk sample -s100 read1.fq 10000 > sub1.fq
seqtk sample -s100 read2.fq 10000 > sub2.fq
Too many reads just introduce more erroneous edges in the assembly graph, so we should add a subsampling step before assembly. seqtk is the obvious fast candidate for this, but there are other tools as well. Besides producing better assemblies, subsampling would of course also reduce the running time.
For regular genomic data, I think 200x coverage would be a good starting point.
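To turn a target depth like 200x into the read count that seqtk expects, something along these lines might work. It is only a rough sketch: the function name `pairs_for_depth` is made up for illustration, and the genome size (5 Mb) and read length (150 bp) are assumed example values, not anything measured from real data. It uses the usual estimate pairs = depth * genome_size / (2 * read_length) for paired-end reads.

```shell
# Hypothetical helper: number of read pairs to keep for a target depth.
# pairs = depth * genome_size / (2 * read_length), using integer division.
pairs_for_depth() {
    depth=$1
    genome_size=$2
    read_len=$3
    echo $(( depth * genome_size / (2 * read_len) ))
}

# Assumed example values: 200x depth, 5 Mb genome, 150 bp reads.
n=$(pairs_for_depth 200 5000000 150)
echo "$n"

# The result could then be passed to seqtk, same seed for both files:
# seqtk sample -s100 read1.fq "$n" > sub1.fq
# seqtk sample -s100 read2.fq "$n" > sub2.fq
```

If the input already has fewer pairs than the computed number, subsampling should simply be skipped, so a real implementation would need to count the reads first.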