broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

Using pilon with RNAseq data #50

Closed: suryasaha closed this issue 7 years ago

suryasaha commented 7 years ago

Is it advisable to use Pilon with RNAseq data and unspliced alignments?

I notice that the assembly size drops by 15-20% when I run Pilon (v1.22) with the parameters --fix all,breaks --diploid --mingap 1. This is a eukaryotic genome, and the DNA sample contains multiple individuals. Thanks

w1bw commented 7 years ago

Hi Suryasaha,

I have never tried this myself. What is your intended output? Is the assembly size dropping because Pilon is deleting the introns? That's probably to be expected.

I know one person at the Broad Institute who successfully used Pilon with Illumina RNAseq data to correct base accuracy of the exome of an assembly originally created with 454; she was able to fix a lot of spurious indels which had caused frameshifts. To do that, you would want to limit yourself to --fix bases.
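For illustration, here is a minimal sketch of what a run restricted to base-level fixes might look like, written as a small Python helper that only assembles the command line and does not execute Pilon. The file names, the `pilon.jar` path, and the helper name `pilon_bases_only_cmd` are hypothetical. Passing the BAM with `--frags` assumes paired-end reads (`--unpaired` would be the option for single-end reads), and `--changes` is added only so that the applied edits are written out for review.

```python
import shlex


def pilon_bases_only_cmd(genome, bam, output_prefix, jar="pilon.jar"):
    """Build (but do not run) a Pilon command limited to base-level fixes.

    All paths are placeholders; adjust them to your own files.
    """
    return [
        "java", "-jar", jar,
        "--genome", genome,        # assembly to be corrected
        "--frags", bam,            # assumption: paired-end RNAseq alignments
        "--fix", "bases",          # SNPs and small indels only; no gap or break fixes
        "--changes",               # write a list of the edits Pilon applied
        "--output", output_prefix,
    ]


cmd = pilon_bases_only_cmd("assembly.fasta", "rnaseq.sorted.bam", "assembly.pilon")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell or handed to `subprocess.run(cmd, check=True)` once the paths point at real files.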

suryasaha commented 7 years ago

Hello!

The goal is to fix indel/SNP errors in the exonic regions. I have tried both spliced and unspliced alignments with similar results, so the reduction in assembly size may not be caused by spliced reads leading Pilon to delete introns.

Great suggestion to use only the --fix bases option. I was using --fix all by default. I will try it out and report back. Thanks!

suryasaha commented 7 years ago

Just wanted to confirm that using --fix bases as suggested by @w1bw (instead of --fix all,breaks) resolved the issue.
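For anyone who wants to check for the same symptom, a rough way to quantify a change in assembly size is to sum the sequence lengths in the FASTA files before and after polishing. The sketch below is only an illustration, not part of Pilon; the function names are made up, and the tiny in-memory records stand in for real files.

```python
def fasta_total_length(lines):
    """Sum the lengths of all sequence lines in FASTA-formatted text.

    `lines` is any iterable of strings, e.g. an open file handle.
    Header lines (starting with '>') and blank lines are skipped.
    """
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def percent_change(before, after):
    """Relative change in size, in percent (negative means the assembly shrank)."""
    return 100.0 * (after - before) / before


# Toy records standing in for the original and polished assemblies.
original = [">contig1", "ACGTACGTAC", ">contig2", "ACGTA"]
polished = [">contig1", "ACGTACGTAC", ">contig2", "AC"]

size_before = fasta_total_length(original)
size_after = fasta_total_length(polished)
print(size_before, size_after, percent_change(size_before, size_after))
```

With real data, the lists would be replaced by file handles, for example `with open("assembly.fasta") as fh: size_before = fasta_total_length(fh)`. After a run limited to --fix bases, the change should be close to zero, since only SNPs and small indels are edited.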