broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0
340 stars 60 forks

fix breaks should introduce new gaps only on majority #28

Closed tolot27 closed 7 years ago

tolot27 commented 7 years ago

In my assembly there is a genomic region with a coverage of around 48X, and fifteen chimeric reads do not map completely to that region. Unfortunately, Pilon breaks this region and introduces a gap there. As a result, the remaining two-thirds of the reads are no longer correctly aligned.

I suggest that breaks only be fixed if the majority of reads support the suggested fix.
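
To make the proposal concrete, here is a minimal sketch of the majority rule applied to the numbers above. The function name `majority_supports_break` and the 50% cutoff are assumptions for illustration only; nothing like this exists in Pilon as far as this thread shows.

```python
def majority_supports_break(supporting_reads, coverage):
    """Return True only if more than half of the reads at the locus support the break.

    Hypothetical helper: both the name and the 0.5 cutoff are illustrative assumptions.
    """
    return supporting_reads / coverage > 0.5


# 15 chimeric reads at roughly 48X coverage is about 31% of the reads.
print(majority_supports_break(15, 48))  # False
```

Under such a rule the fifteen chimeric reads would not be enough to open a gap.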

w1bw commented 7 years ago

It's hard to diagnose without digging into the data, but let me at least explain how this works. When Pilon detects something that may be a local misassembly, it attempts to find reads from the region (along with their mates, if paired) and does a mini-reassembly. So it's not particularly counting read evidence; it's building a k-mer graph from the reads, and pruning small-minority branches which are likely caused by sequencing errors. If it can't re-assemble the suspicious region and get continuity across it, it will either leave it alone or open a gap (if "--fix breaks" is specified).
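
As a rough illustration of the k-mer graph idea described above, here is a toy sketch. It is not Pilon's actual code: the function names, the `min_fraction` cutoff of 0.1, and the simple linear walk are all assumptions chosen to keep the example small.

```python
from collections import Counter, defaultdict


def build_kmer_graph(reads, k):
    """Count how often each k-mer is followed by each next k-mer across all reads."""
    edges = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k):
            edges[read[i:i + k]][read[i + 1:i + k + 1]] += 1
    return edges


def prune_minority_branches(edges, min_fraction=0.1):
    """Drop outgoing edges that carry less than min_fraction of a node's support.

    The 0.1 cutoff is an illustrative assumption, not a Pilon parameter.
    """
    pruned = {}
    for node, successors in edges.items():
        total = sum(successors.values())
        pruned[node] = {nxt: n for nxt, n in successors.items()
                        if n / total >= min_fraction}
    return pruned


def walk(pruned, start, max_steps=1000):
    """Extend a sequence from start while each node has exactly one successor."""
    seq, node, seen = start, start, {start}
    for _ in range(max_steps):
        successors = pruned.get(node, {})
        if len(successors) != 1:
            break  # branch or dead end: no unambiguous continuation
        node = next(iter(successors))
        if node in seen:
            break  # avoid looping forever on a cycle
        seen.add(node)
        seq += node[-1]
    return seq


# A single erroneous read among 20 is pruned, so the walk spans the region.
rare_error = ["ACGTTGCA"] * 19 + ["ACGTAGCA"]
graph = prune_minority_branches(build_kmer_graph(rare_error, k=4))
print(walk(graph, "ACGT"))  # ACGTTGCA
```

In this toy model, a branch carried by a substantial minority of reads (for example 15 of 48) survives pruning, so the walk stops at the fork and continuity across the region is lost, which is the kind of situation where a gap could be opened.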

I have "breaks" turned off by default for assembly improvement because it can be fooled into falsely opening gaps, though it is on by default for variant calling because it's an important way of detecting large insertions or deletions.