Open kevfengler227 opened 2 years ago
Also, what is the size threshold or other distinguisher between a small error and a structural error? I am seeing relatively small INDELs that are fully contained within HiFi reads and could be better handled as small errors rather than triggering a local re-assembly.
For example, this 72 bp INDEL is really just a "small error" given 20 kb HiFi reads, but it is classified as a structural error.
Here is an example of a 1,165 bp INDEL that is classified as a structural error, but fails local re-assembly because of the nearby heterozygous SNP. This could easily be handled as a small-scale error. Would it be possible to add a parameter that sets the maximum size to be considered a small-scale error? For 20 kb PacBio HiFi reads, coupled with minimap2 alignment, INDELs of up to 3 kb could easily be handled as small-scale errors.
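To illustrate the kind of parameter I mean, here is a minimal sketch in Python. It is not how the tool works internally; the function names and the `max_small_error` parameter are hypothetical, and the 3000 bp default is just the value suggested above. It pulls insertion and deletion lengths out of a CIGAR string and labels each one by size.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def indels_from_cigar(cigar):
    """Return (op, length) for every insertion (I) or deletion (D) in a CIGAR string."""
    return [(op, int(n)) for n, op in CIGAR_OP.findall(cigar) if op in "ID"]

def classify_indel(length, max_small_error=3000):
    """Hypothetical rule: INDELs up to max_small_error bp count as small-scale errors."""
    return "small" if length <= max_small_error else "structural"

# A read spanning a 72 bp deletion and a 1,165 bp insertion.
for op, length in indels_from_cigar("8000M72D6000M1165I5000M"):
    print(op, length, classify_indel(length))
```

With a threshold like this, both the 72 bp and the 1,165 bp INDELs from the examples above would be labeled small-scale, while anything larger than 3 kb would still go to local re-assembly.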
Also, it may be a good idea to add a minimum alignment score filter to the minimap2 alignment. Often, when a region is missing from the assembly, reads will align to the most similar region with a low alignment score and can cause small-scale errors. For example, a minimum alignment score of 10000 is a reasonable value for >15 kb HiFi reads.
Below are some spurious alignments with AS < 15000 causing small scale errors. These errors also have a low p-value, so it could be addressed that way too.
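A minimal sketch of the filter I have in mind, in Python, assuming plain-text SAM output from minimap2 (which reports the alignment score in the `AS:i:` tag). The function names and the `min_as` parameter are hypothetical, not existing options of the tool, and the 10000 default is just the value suggested above.

```python
def alignment_score(sam_line):
    """Return the integer AS:i tag value of a SAM record, or None if it has no AS tag."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM record.
    for tag in fields[11:]:
        if tag.startswith("AS:i:"):
            return int(tag[5:])
    return None

def filter_by_min_as(sam_lines, min_as=10000):
    """Yield header lines unchanged and only those records with AS >= min_as."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        score = alignment_score(line)
        if score is not None and score >= min_as:
            yield line
```

For example, `filter_by_min_as(open("reads.sam"))` would drop the low-scoring records before the error-calling step (`reads.sam` is a placeholder file name). Records without an AS tag are dropped as well in this sketch; whether that is the right choice depends on the aligner settings.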
Hi Maggie, This is a very interesting tool. I am thoroughly testing it in comparison to other polishing approaches I have been using, so I should have some good feedback soon. I am particularly interested to see how it handles small N-gaps introduced by BioNano hybrid scaffolding that can easily be spanned by HiFi reads.
What happened to the p-value parameter? I see it in the documentation, but not in v1.0.2. It could be very helpful for increasing the quality of polishing.
Also, v1.0.2 still reports v1.0.1 as its version.
Thanks, Kevin