Nextomics / NextPolish

Fast and accurately polish the genome generated by long reads.
GNU General Public License v3.0

Polish only INDELs or only SNPs, and polishing thresholds #95

Open mmontonerin opened 2 years ago

mmontonerin commented 2 years ago

Hi, I have tried NextPolish, and overall I am happy with it, but I miss a few more options for selecting what to polish, so that I can trust what it is doing to the de novo genome assemblies I am working with.

One feature I miss in NextPolish is the ability to fix either only INDELs or only SNPs, depending on the type of data being used. For example, I have a set of short reads that I would like to use to correct only INDELs, since many SNPs could simply be normal heterozygous sites, present in different proportions in different datasets.

I would also like to be able to polish more conservatively, by setting a depth or quality threshold that a position must meet before it is polished.

Do you plan to implement any of these functionalities in the future?

moold commented 2 years ago

Hi, first, thank you for your good suggestions. However, NextPolish corrects erroneous bases using k-mers, so it does not distinguish between SNPs and INDELs. At a heterozygous site, NextPolish selects the k-mer with the highest count as the corrected k-mer.
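
To make the "highest count wins" idea concrete, here is a toy sketch in Python. It is not NextPolish's actual code: the function names `kmers` and `pick_majority_kmer`, the reads, and the choice of k = 4 are all made up for illustration. It only shows why a count-based choice between candidate k-mers never needs to know whether the difference came from a SNP or an INDEL.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq (hypothetical helper)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pick_majority_kmer(candidates, read_kmer_counts):
    """Pick the candidate k-mer seen most often in the reads.

    Illustrative assumption: candidates are competing k-mers covering the
    same locus. Ties go to the first candidate listed. The variant type
    (SNP or INDEL) is never inspected, only the counts.
    """
    return max(candidates, key=lambda km: read_kmer_counts[km])


# Toy data: three reads support "T" at the variable position, one supports "A".
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
k = 4
counts = Counter(km for read in reads for km in kmers(read, k))

# Two competing k-mers at a heterozygous site.
candidates = ["CGTA", "CGAA"]
best = pick_majority_kmer(candidates, counts)
print(best, counts[best])
```

With this toy data, the minority k-mer is discarded purely because it has fewer supporting reads, which is also why true heterozygous sites end up resolved to the more frequent allele.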

BTW, I will consider your suggestions and may add some extra functions/parameters in the future.