broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

Advice for finishing a PacBio Sequel yeast genome assembly #67

Closed: splaisan closed this issue 5 years ago

splaisan commented 6 years ago

Dear,

I recently assembled very-high-coverage Sequel data with Canu. I then polished the raw Canu assembly with Arrow, using the same PacBio Sequel reads (aligned with BLASR to produce the BAM needed for the correction).
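A minimal sketch of the BLASR + Arrow round described above, written as a bash function. The function name, file names and thread count are hypothetical. It assumes BLASR 5.x option spelling (`--bam --out --nproc`; older builds used single dashes), plus samtools, `pbindex` and the GenomicConsensus `arrow` wrapper on `PATH`. Arrow expects a coordinate-sorted BAM with a `.pbi` index and a `.fai`-indexed reference, which is why the extra indexing steps are there.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one Arrow polishing round of a draft (e.g. Canu) assembly.
# Assumes blasr (5.x option style), samtools, pbindex and arrow are on PATH.
set -euo pipefail

arrow_polish() {
  local asm="$1" subreads="$2" out="$3" threads="${4:-16}"
  # Align the Sequel subreads back to the draft; --bam writes the PacBio BAM Arrow reads.
  blasr "$subreads" "$asm" --bam --out aligned.bam --nproc "$threads"
  # Arrow expects a coordinate-sorted BAM plus a PacBio .pbi index.
  samtools sort -@ "$threads" -o aligned.sorted.bam aligned.bam
  samtools index aligned.sorted.bam
  pbindex aligned.sorted.bam
  # The reference must be indexed (.fai) as well.
  samtools faidx "$asm"
  # Consensus format follows the output extension (.fasta, .fastq, .gff, .vcf).
  arrow -j "$threads" -r "$asm" -o "$out" aligned.sorted.bam
}
```

Usage, with placeholder names: `arrow_polish canu.contigs.fasta movie.subreads.bam arrow-polished-canu.contigs.fasta 32`. Running it again on its own output gives a second polishing round.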

My first question is which assembly to feed to Pilon for the 'final' Illumina correction (I mapped the Illumina reads against each assembly using BWA-MEM with default settings).
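The mapping step mentioned above (default BWA-MEM, one BAM per candidate assembly) could be sketched like this. The function and file names are hypothetical, and it assumes bwa and samtools on `PATH`. Pilon needs the BAM to be coordinate-sorted and indexed, so the alignments are piped straight into `samtools sort`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map paired-end Illumina reads to one assembly for Pilon.
# Assumes bwa and samtools are on PATH; all names are placeholders.
set -euo pipefail

map_illumina() {
  local asm="$1" r1="$2" r2="$3" out="$4" threads="${5:-16}"
  bwa index "$asm"
  # Default BWA-MEM settings, piped straight into a coordinate sort.
  bwa mem -t "$threads" "$asm" "$r1" "$r2" \
    | samtools sort -@ "$threads" -o "$out" -
  # Pilon requires the BAM to be indexed (.bai).
  samtools index "$out"
}
```

To compare both candidates, run it once per assembly, e.g. `for asm in canu.contigs.fasta arrow-polished-canu.contigs.fasta; do map_illumina "$asm" R1.fq.gz R2.fq.gz "${asm%.fasta}.illumina.bam"; done`, then give each BAM to Pilon together with its matching `--genome`.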

Should I further correct the Canu+Arrow output with my fresh Illumina paired-end reads, or will this just add bias on top of whatever bias Arrow has introduced?
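One way to see how much the Illumina reads still change the Canu+Arrow contigs is to repeat mapping and Pilon until a round makes no edits. The sketch below is hypothetical (function and file names are placeholders) and assumes bwa, samtools, java and `pilon.jar` in the working directory, reusing the memory setting from the command in this post. It relies on `--changes` writing `<output>.changes` and treats an empty file as "nothing left to fix".

```bash
#!/usr/bin/env bash
# Hypothetical helper: alternate Illumina mapping and Pilon until a round
# makes no edits or max rounds are reached. Prints the final FASTA path.
# Assumes bwa, samtools, java and ./pilon.jar; all names are placeholders.
set -euo pipefail

pilon_rounds() {
  local asm="$1" r1="$2" r2="$3" max="${4:-3}" threads="${5:-16}" i
  for ((i = 1; i <= max; i++)); do
    # Re-map to the current draft; Pilon needs a sorted, indexed BAM.
    bwa index "$asm"
    bwa mem -t "$threads" "$asm" "$r1" "$r2" \
      | samtools sort -@ "$threads" -o "round$i.bam" -
    samtools index "round$i.bam"
    # --changes makes Pilon write pilon_round$i.changes beside the FASTA.
    java -Xmx64G -jar pilon.jar --genome "$asm" --frags "round$i.bam" \
      --output "pilon_round$i" --changes > "pilon_round$i.log"
    asm="pilon_round$i.fasta"
    # An empty .changes file means this round fixed nothing: stop here.
    if [ ! -s "pilon_round$i.changes" ]; then
      break
    fi
  done
  echo "$asm"
}
```

For example, `final=$(pilon_rounds arrow-polished-canu.contigs.fasta R1.fq.gz R2.fq.gz 4)`. The round cap guards against runs where the number of edits never reaches zero; Pilon's own log goes to `pilon_round<N>.log` so that only the final path is printed.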

Along the same lines, should I mask repeats in my input assemblies first to keep Pilon from messing with them, or does it handle repeats fine?
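If the aim is to keep Pilon away from repeats, one alternative to masking the FASTA is Pilon's `--targets` option, which restricts processing to listed contigs or base ranges (`contig:start-end`). The sketch below turns a BED of regions to keep (for example, the complement of a repeat annotation, made with something like `bedtools complement`) into one target per line. The function name is hypothetical, and it assumes that `--targets` accepts a file with one target per line and that Pilon ranges are 1-based and inclusive; check both against the Pilon usage text, and test on one contig first to see how the output FASTA treats unprocessed sequence.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a BED of regions to keep into Pilon --targets
# lines ("contig:start-end"), assuming Pilon ranges are 1-based and inclusive.
set -euo pipefail

bed_to_pilon_targets() {
  # BED is 0-based, half-open: shift the start by one, keep the end as-is.
  # Lines with fewer than 3 tab-separated fields or track/browser/# headers are skipped.
  awk -F'\t' 'NF >= 3 && $1 !~ /^(#|track|browser)/ { print $1 ":" ($2 + 1) "-" $3 }' "$1"
}
```

Usage: `bed_to_pilon_targets keep.bed > targets.txt`, then add `--targets targets.txt` to the Pilon command.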

Finally, which Pilon options should I add to get rich output that I can use to improve the assembly further by hand? I picked the following extra arguments from the man page; are they a good choice?

java -Xmx64G -jar pilon.jar --genome arrow-polished-canu.contigs.fasta --frags illumina_mappings.bam --output pilon-canu_Sequel-arrow.contigs --changes --vcf --tracks
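With `--changes`, the command above should also write `pilon-canu_Sequel-arrow.contigs.changes`, listing every edit Pilon made. A quick tally by edit type shows whether the Illumina pass is mostly fixing single bases or small indels. This hypothetical helper assumes each line has four whitespace-separated fields (original coordinates, new coordinates, original bases, new bases) with `.` for an empty side; look at a few lines of the real file before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally the edits in a Pilon .changes file by type.
# Assumes 4 whitespace-separated fields per line: old coords, new coords,
# old bases, new bases, with "." for an empty side.
set -euo pipefail

summarize_pilon_changes() {
  awk '
    NF < 4 { next }                                   # skip blank/short lines
    { o = ($3 == "." ? 0 : length($3)); n = ($4 == "." ? 0 : length($4)) }
    o == 1 && n == 1 { snp++; next }                  # single-base substitution
    o == 0           { ins++; ins_bp += n; next }     # pure insertion
    n == 0           { del++; del_bp += o; next }     # pure deletion
                     { other++ }                      # multi-base replacement
    END { printf "snp=%d ins=%d (+%d bp) del=%d (-%d bp) other=%d\n",
                 snp, ins, ins_bp, del, del_bp, other }
  ' "$1"
}
```

For example, `summarize_pilon_changes pilon-canu_Sequel-arrow.contigs.changes`. Running the same tally on the Pilon output for the raw Canu input and for the Arrow-polished input gives a rough side-by-side of how much each still needed.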

Any advice and comments are very welcome.

Stephane