Handling of single "N" characters in reference genome

broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool

GNU General Public License v2.0

Hi folks,

Thanks for this great tool! I'm polishing a genome that contains a number of single N characters as ambiguous bases, and I'm confused about how Pilon handles them. Many of these positions should have good read support for correcting the N to an A, C, G, or T, but my current attempts leave them untouched. Pilon is correcting other ambiguous bases (e.g. R, Y, K) to the correct base, but is ignoring Ns. Any ideas?

The command I'm running is:

    java -Xmx120g -jar ~/software/anaconda2/pkgs/pilon-1.22-1/share/pilon-1.22-1/pilon-1.22.jar \
        --genome ref.fasta \
        --frags aln.sorted.bam \
        --unpaired u.sorted.bam \
        --changes --vcf --tracks \
        --threads 16 \
        --fix bases,amb \
        --outdir pilon_02
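To make the question concrete, here is a minimal sketch for listing the isolated N positions in the reference so they can be checked against Pilon's output. It is not part of Pilon; the function names (`read_fasta`, `isolated_n_positions`, `isolated_n_report`) are hypothetical, and it assumes "single N" means an N with no N on either side within the same contig.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# An N (either case) with no N immediately before or after it.
ISOLATED_N = re.compile(r"(?<![Nn])[Nn](?![Nn])")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def isolated_n_positions(seq: str) -> List[int]:
    """Return 1-based positions of Ns that are not part of a longer run of Ns."""
    return [m.start() + 1 for m in ISOLATED_N.finditer(seq)]


def isolated_n_report(path: str) -> Dict[str, List[int]]:
    """Map each contig in the FASTA file at `path` to its isolated N positions."""
    with open(path) as handle:
        return {name: isolated_n_positions(seq) for name, seq in read_fasta(handle)}
```

Calling `isolated_n_report("ref.fasta")` returns a dictionary of contig names to 1-based positions. Positions are 1-based so they can be compared directly with the POS column of the VCF written by `--vcf`.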

Handling of single "N" characters in reference genome #76