broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

[Question] Impact of kmer size (--K) in correction #59

Closed ghost closed 5 years ago

ghost commented 6 years ago

I'm assembling 6 Mb bacterial genomes from Illumina short-read data (2x150 bp). I've tried running Pilon with different kmer values (21, 25, 27, 41, 47, 49, 81, 85, 87). So far, smaller K values introduce more changes, in the form of sequence deleted from the assembly.

Is exploring kmer values other than the default worth it?
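
One way to put numbers on "more changes" when comparing runs is to tally the records Pilon writes when run with `--changes`. The sketch below is only an illustration: it assumes each record has four whitespace-separated fields (original coordinate, new coordinate, original sequence, new sequence) with `.` standing in for an empty sequence, and the function name `summarize_changes` is hypothetical. Check the layout against your own `.changes` files before relying on it.

```python
from collections import Counter


def summarize_changes(lines):
    """Tally bases removed and added across Pilon-style .changes records.

    Assumed layout per record: old_coord new_coord old_seq new_seq,
    with "." meaning an empty sequence. This layout is an assumption.
    """
    totals = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        old_seq, new_seq = fields[2], fields[3]
        old_len = 0 if old_seq == "." else len(old_seq)
        new_len = 0 if new_seq == "." else len(new_seq)
        totals["records"] += 1
        # Net bases lost or gained by this single record
        totals["bases_removed"] += max(old_len - new_len, 0)
        totals["bases_added"] += max(new_len - old_len, 0)
    return totals
```

Calling it on the lines of each run's changes file, for example `summarize_changes(open("pilon_K21.changes"))` with a file name of your own, gives one row per K value to compare.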

w1bw commented 5 years ago

I'm cleaning out old tickets, and I see this never got a response.

I chose the default kmer size for reassembly based on a number of experiments using Illumina paired-end and mate-pair data, so the default should be close to optimal. If there are tandem repeats slightly larger than the default K of 47, you might want to try raising it a bit, but not much experimentation has been done with that.
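
For anyone who wants to try raising K a bit as suggested above, here is a minimal sketch that builds one Pilon command line per K value. The file names (`assembly.fasta`, `reads.bam`, `pilon.jar`), the K values 51 and 55, and the helper name `pilon_commands` are placeholders for illustration, not recommendations. Only the `--genome`, `--frags`, `--K`, `--output` and `--changes` options are used.

```python
def pilon_commands(genome, bam, k_values, jar="pilon.jar"):
    """Build one Pilon command (as an argument list) per K value.

    All paths are placeholders; nothing is executed here.
    """
    commands = []
    for k in k_values:
        commands.append([
            "java", "-jar", jar,
            "--genome", genome,           # assembly to polish
            "--frags", bam,               # aligned paired-end reads
            "--K", str(k),                # kmer size for local reassembly
            "--output", f"pilon_K{k}",    # separate output prefix per run
            "--changes",                  # also write a .changes file
        ])
    return commands


# Print the commands for the default K and two slightly larger values
for cmd in pilon_commands("assembly.fasta", "reads.bam", [47, 51, 55]):
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run` as is. Giving every run its own output prefix keeps the results from overwriting each other, so the runs can be compared afterwards.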