broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

Question: Range of indel sizes that are corrected by Pilon #40

Closed chad388 closed 7 years ago

chad388 commented 7 years ago

Is there an upper limit on the size of indels that Pilon corrects? I am using a 40x alignment of 2 x 125 bp paired-end reads with an insert size of 350 bp as input to Pilon, to correct residual indels after running Quiver on a human PacBio assembly. I am using Pilon version 1.21 with --fix all.

w1bw commented 7 years ago

There's no inherent size limit; it depends on a bunch of things. There are two fundamental ways pilon corrects indels:

1) Via alignment pileups. This is generally limited by how large an indel short-read aligners can tolerate while still producing an alignment, often tens of bases.

2) Via local reassembly. The limitation here is more a matter of coverage and insert size, i.e., how far the reads reach into a large insertion. It's not at all unusual for Pilon to capture a gap or large insertion of > 1 kb with mate-pair libraries, but it's going to be hard to get much further than your insert sizes.
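To see what the pileup route (1) has to work with in a given data set, one option is to check the largest insertion and deletion the aligner actually reported. The sketch below is not part of Pilon; it is a minimal illustration that reads the CIGAR column of SAM-format text lines, and the function names (indel_lengths, largest_indels) are made up for this example.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_lengths(cigar):
    """Return (insertion_lengths, deletion_lengths) for one CIGAR string."""
    insertions, deletions = [], []
    for length, op in _CIGAR_OP.findall(cigar):
        if op == "I":
            insertions.append(int(length))
        elif op == "D":
            deletions.append(int(length))
    return insertions, deletions


def largest_indels(sam_lines):
    """Return (max_insertion, max_deletion) over an iterable of SAM text lines.

    Header lines (starting with '@') and records without a CIGAR ('*')
    are skipped. Returns 0 for a category with no events.
    """
    max_ins = max_del = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":
            continue
        insertions, deletions = indel_lengths(fields[5])
        max_ins = max([max_ins] + insertions)
        max_del = max([max_del] + deletions)
    return max_ins, max_del
```

The lines can come from any SAM text source, for example the output of samtools view on the BAM file given to Pilon. If the largest values are only a few tens of bases, anything bigger would have to come from the local reassembly route (2).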

Large deletions are easier if they are clean; sometimes it can be confusing if there are repeats on the flanks of large deletions.

Finally, Pilon is less adept at doing large indels on diploid genomes, and you mentioned you were looking at human data. More work could be done to improve Pilon for this case, but I don't have any plans to tackle that soon, and it's probably not going to handle large heterozygous indels well.

In any case, good luck!

chad388 commented 7 years ago

Thank you for the explanation. This is very helpful!