broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

Multimapped read behaviour #87

Open francicco opened 5 years ago

francicco commented 5 years ago

Hi,

I'm trying to correct a PacBio assembly produced with Canu. The idea is to run a first correction round with the PacBio reads using Arrow (3 iterations) to correct SNPs, followed by Pilon with Illumina reads to fix indels (5 iterations; my coverage is around 25-fold). I also have RNA-seq data that I may use. To monitor the behaviour of the whole process, I map the genomic Illumina reads and the RNA-seq reads onto the assembly from each iteration.
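To make the monitoring step concrete, here is a minimal sketch of how mapped records could be split into unmapped, uniquely mapped, and multi-mapped for one iteration. It assumes the aligner writes SAM text with the optional `NH:i:` tag (number of reported alignments for the read); aligners that do not emit `NH` would need a different criterion, and a record without the tag is treated here as uniquely mapped. The function name `classify_sam_reads` and the example records are hypothetical, not taken from my actual pipeline.

```python
from collections import Counter

def classify_sam_reads(sam_lines):
    """Count primary SAM records as 'unmapped', 'unique', or 'multi'.

    Assumption: multi-mapping is detected via the optional NH:i tag.
    Records lacking NH are counted as 'unique'.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip secondary (0x100) and supplementary (0x800) records so
        # each read (or mate) is counted once.
        if flag & 0x100 or flag & 0x800:
            continue
        if flag & 0x4:  # 0x4 = segment unmapped
            counts["unmapped"] += 1
            continue
        nh = 1  # assumed default when the NH tag is absent
        for tag in fields[11:]:  # optional fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        counts["multi" if nh > 1 else "unique"] += 1
    return counts

# Hypothetical records: one unique read, one read with three reported
# alignments (primary + one secondary shown), and one unmapped read.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tctg1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tctg1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3",
    "r2\t256\tctg2\t50\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(dict(classify_sam_reads(example)))
```

For paired-end data each mate is a separate record, so the counts are per record rather than per pair. Running the same function on the SAM output of every Arrow and Pilon iteration gives the per-iteration fractions that the plots summarise.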

This is what happens:

[image: dna mapping]

This is the mapping for the genomic reads.

[image: rna mapping]

And this is for the RNA.

Of course, the effect is stronger for the DNA reads because they cover more of the assembly.

What I find weird, and don't know how to explain, is the effect of Arrow first and Pilon afterwards. There is a substantial increase in multi-mapped reads with Arrow, followed by a strong decrease when Pilon is used.

Is that normal?

The other question I have is: am I right to correct only indels with Pilon? Should I also correct SNPs, and maybe use the RNA data only to correct indels?
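For context, the two alternatives differ only in the value passed to Pilon's `--fix` option, which takes a comma-separated list of categories such as `snps` and `indels`. Below is a minimal sketch that builds the command line for either choice. The helper name `pilon_command`, the jar location, and all file names are hypothetical placeholders, and this is only an illustration of the option, not a recommendation of one setting over the other.

```python
def pilon_command(genome, bam, out_prefix, fix=("indels",), jar="pilon.jar"):
    """Build a Pilon argument list (all paths here are placeholders).

    genome:     FASTA of the assembly to polish
    bam:        sorted, indexed BAM of paired-end Illumina reads
    out_prefix: prefix for Pilon's output files
    fix:        categories for --fix, e.g. ("indels",) or ("snps", "indels")
    """
    return [
        "java", "-jar", jar,
        "--genome", genome,
        "--frags", bam,
        "--output", out_prefix,
        "--fix", ",".join(fix),
        "--changes",  # also write a file listing the changes made
    ]

# Indels only, as in the current setup.
indels_only = pilon_command("round0.fasta", "round0.illumina.bam", "round1")

# SNPs and indels together, the alternative being asked about.
snps_and_indels = pilon_command(
    "round0.fasta", "round0.illumina.bam", "round1", fix=("snps", "indels")
)

print(" ".join(indels_only))
print(" ".join(snps_and_indels))
```

Each list could be handed to `subprocess.run` once the placeholder paths are replaced with real files; for several iterations, the output FASTA of one round becomes the `genome` of the next after remapping the reads.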

Thanks for your help,
F