broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

[Question] PacBio reads and Pilon #34

Closed by RxLoutre 7 years ago

RxLoutre commented 7 years ago

Hello everyone,

I am currently working on a de novo assembly project for the Drosophila suzukii genome, which is known to be highly polymorphic. We have long reads (Pacific Biosciences technology). I would like to polish my assemblies (I have used both the FALCON and Canu assemblers) in order to annotate them.

We have the PacBio reads used for the assembly, as well as Illumina reads previously produced by another lab. The Illumina reads were produced from the same Drosophila strain as the PacBio reads, BUT the strain was less isogenic at the time.

I would like to know which polishing method you recommend. More precisely: to obtain the most accurate consensus sequence, is it better to run Pilon with the long PacBio reads we used for the assembly, despite their high error rate? Or is it better to use the Illumina short reads from the previous assembly, even though they were not used for this assembly? Our assembly seems to contain a fair amount of repetitive sequence, and I am concerned that the short Illumina reads may perform poorly in those regions.

What do you think?

Cheers,

Roxane