broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

run pilon on subset of whole fasta #3

Closed (MichelMoser closed 8 years ago)

MichelMoser commented 8 years ago

Hello,

Is it possible to run Pilon on only a subset of the sequences in a FASTA file by passing that subset with the --genome option, or would I have to split the BAM files as well to match the subset of sequences?

Would something like this work?

pilon.jar --verbose --genome subset.fasta --frags paired.whole_genome.bam --tracks --vcf --output test.pilonpolished

Thank you, Michel

w1bw commented 8 years ago

Yes, that will work. There is also a --targets option, which takes a list of FASTA elements to process (given either on the command line or in a file); it should be equivalent to supplying a subset FASTA as input.
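As a rough sketch of how such a target list could be prepared, the Python below collects the element names from a subset FASTA. It assumes that a target name is the first whitespace-delimited token of each FASTA header line, and that a list given on the command line is comma-separated; the function names and the example records are hypothetical, not part of Pilon.

```python
from typing import Iterable, List


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Return the element name of each FASTA record (first token after '>')."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare ">" header that carries no name
                names.append(fields[0])
    return names


def targets_argument(names: Iterable[str]) -> str:
    """Join names into a single comma-separated string (assumed list format)."""
    return ",".join(names)


# Hypothetical in-memory records; in practice, iterate over open("subset.fasta").
example = [">scaffold00001 len=1200\n", "ACGT\n", ">scaffold00002\n", "GGCC\n"]
names = fasta_names(example)
targets = targets_argument(names)
```

The resulting string could be passed to --targets directly, or the names could be written one per line to a file whose path is passed instead, while --genome keeps pointing at the whole FASTA.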

Either way, Pilon will still scan the entire BAMs to gather stats before processing; to make that faster you would have to subset the BAMs. If you do that, be sure to include every read for which either member of the pair maps to something in your input subset.
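The "either member of the pair" rule can be sketched as a filter over SAM text. This is only an illustration, not something Pilon provides: it assumes the standard SAM column layout (RNAME in column 3, RNEXT in column 7, with "=" meaning the mate is on the same reference), and the function name is hypothetical.

```python
from typing import Iterable, Iterator, Set


def keep_pairs_touching(sam_lines: Iterable[str], targets: Set[str]) -> Iterator[str]:
    """Yield header lines plus every record whose read or mate maps to a target."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # pass header lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record (11 mandatory columns)
        rname, rnext = fields[2], fields[6]
        if rnext == "=":
            rnext = rname  # "=" means the mate is on the same reference
        if rname in targets or rnext in targets:
            yield line
```

One way to use it would be to stream `samtools view -h` output through this function (reading sys.stdin, writing sys.stdout) and convert the result back to BAM with `samtools view -b`. Because each mate's record names the other's reference, both records of a pair that straddles the subset are kept.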

Good luck!

--bruce