broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

chunk the genome in Pilon #47

Closed: bostanict closed this issue 7 years ago

bostanict commented 7 years ago

Hi, I have a quick question, and thanks for the useful pipeline.

Is it advisable to split the genome into separate files and run Pilon on each one independently, to get around the memory limit for large genomes? The mapping can be done against the whole genome, to avoid false mappings caused by alignment scoring and multi-mapping reads. But would it affect the base corrections if we split the chromosomes and run Pilon on one chromosome or contig at a time, each time using the whole-genome mapping?

Thanks a lot

w1bw commented 7 years ago

Yes, doing the alignments against the whole genome is best. You can then run Pilon on subsets of the input FASTA elements, either by splitting the input FASTA or by using the --targets option to specify a list of elements to process. If you are using "--fix bases" just to polish SNPs and indels, that should be equivalent to a whole-genome run.

If you are actually trying to fill gaps and fix local misassemblies, the memory requirements may skyrocket, because Pilon wants to use all the non-properly-paired reads as potential help in the reassembly. One option is "--nostrays", which decreases memory usage considerably but is also somewhat less effective at reassembly.
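As a rough illustration of the "align to the whole genome, then run Pilon on subsets" approach, the Python sketch below reads FASTA element names and lengths, greedily groups elements into batches of at most a chosen number of bases, writes one targets file per batch (one element name per line), and builds one Pilon command line per batch. This is a hypothetical helper, not part of Pilon: the function names, the pilon.jar path, the -Xmx16G heap, the batchN naming and the batch size are placeholders. It also assumes the BAM holds a paired-end fragment library (passed with --frags) and that --targets accepts a file listing one target per line. The defaults correspond to the "--fix bases" case, where batching should match a whole-genome run.

```python
from pathlib import Path


def fasta_lengths(lines):
    """Return [(name, length)] for each FASTA element, in file order.
    `lines` is any iterable of text lines, e.g. an open file handle."""
    lengths, name, length = [], None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                lengths.append((name, length))
            # The element name is the first whitespace-delimited header token.
            name, length = (line[1:].split() or [""])[0], 0
        elif line:
            length += len(line)
    if name is not None:
        lengths.append((name, length))
    return lengths


def batch_targets(lengths, max_bases):
    """Greedily group element names so each batch holds at most max_bases.
    An element longer than max_bases gets a batch of its own."""
    batches, current, current_bases = [], [], 0
    for name, length in lengths:
        if current and current_bases + length > max_bases:
            batches.append(current)
            current, current_bases = [], 0
        current.append(name)
        current_bases += length
    if current:
        batches.append(current)
    return batches


def pilon_command(index, targets_file, genome, bam, outdir, fix="bases", nostrays=False):
    """Build one Pilon command line for a batch.
    Hypothetical placeholders: pilon.jar location and the -Xmx16G heap size."""
    cmd = (
        f"java -Xmx16G -jar pilon.jar --genome {genome} --frags {bam} "
        f"--targets {targets_file} --fix {fix} --outdir {outdir} --output batch{index}"
    )
    # --nostrays lowers memory use for gap filling / local fixes,
    # at the cost of somewhat less effective reassembly.
    if nostrays:
        cmd += " --nostrays"
    return cmd


def write_batches(fasta, bam, outdir, max_bases, fix="bases", nostrays=False):
    """Write batchN.targets files into outdir and return the matching commands.
    The BAM should contain reads aligned to the WHOLE genome, not per batch."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    with open(fasta) as fh:
        batches = batch_targets(fasta_lengths(fh), max_bases)
    commands = []
    for i, names in enumerate(batches):
        targets_file = outdir / f"batch{i}.targets"
        targets_file.write_text("\n".join(names) + "\n")
        commands.append(pilon_command(i, targets_file, fasta, bam, outdir, fix, nostrays))
    return commands
```

For example, write_batches("genome.fasta", "aligned.bam", "pilon_batches", 50_000_000) would return one command per batch, and each could be submitted as a separate job with a smaller heap. The per-batch outputs still have to be merged afterwards, so check what each output FASTA contains before concatenating them. Passing fix="gaps,local" with nostrays=True corresponds to the lower-memory reassembly trade-off described in the reply.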