bioinfologics / w2rap-contigger

An Illumina PE genome contig assembler that can handle large (17 Gbp), complex (hexaploid) genomes.
http://bioinfologics.github.io/the-w2rap-contigger/
MIT License

Read lengths and large K #44

Closed andreaswallberg closed 3 years ago

andreaswallberg commented 4 years ago

Dear developers,

I have ~40x coverage of 2x150 Illumina reads produced using 10x Chromium libraries for an organism with a complex genome (we also have long reads), and I would like to try w2rap-contigger. However, I don't really understand how to select a value for the parameter large K=n.

Is this value bounded by the read-length, i.e. should it be below 150 (or 300)? In other words, what are the important factors that govern the value specified for K?

bjclavijo commented 3 years ago

Hi Andreas, sorry about the delay on this; we've obviously not been keeping a close eye on issues.

Yes, you want large_K to be smaller than your read size. The important factors are: it has to be smaller than the read size, you want smaller values if the graph gets disconnected, and you want larger values if it is too tightly connected. These are functions of genome composition, but it is a bit complicated to get into the details here.
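
To make the read-size bound concrete, here is a minimal Python sketch of the standard k-mer counting arithmetic. It is not taken from w2rap-contigger: the function names are hypothetical, the K values in the loop are only illustrative (check the tool's documentation for the values it actually accepts), and it assumes error-free reads of uniform length. A read of length L contains L - K + 1 k-mers, so the expected k-mer coverage is roughly the base coverage times (L - K + 1) / L.

```python
def kmers_per_read(read_len: int, k: int) -> int:
    """Number of k-mers contained in a single read (0 if k exceeds the read length)."""
    return max(read_len - k + 1, 0)


def kmer_coverage(base_coverage: float, read_len: int, k: int) -> float:
    """Approximate k-mer coverage, assuming error-free reads of uniform length."""
    return base_coverage * kmers_per_read(read_len, k) / read_len


if __name__ == "__main__":
    # Values from the question: ~40x base coverage, 150 bp reads.
    # The K values below are illustrative only, not a list of supported settings.
    for k in (60, 100, 140, 150, 200):
        n = kmers_per_read(150, k)
        cov = kmer_coverage(40, 150, k)
        print(f"K={k}: {n} k-mers per read, ~{cov:.1f}x k-mer coverage")
```

Under these assumptions, k-mer coverage drops quickly as K approaches the read length and reaches zero once K exceeds it. That is one way to see why large_K needs to stay below the read size, and why pushing K too high can leave the graph disconnected.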