broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

Polish a large genome with Pilon #170

Open enriquepola1996 opened 4 months ago

enriquepola1996 commented 4 months ago

Hello everyone,

I'm trying to polish a large genome (3 Gb) with Pilon, but I'm running out of RAM. I read that some people split the genome to reduce memory use, so I would like to try that approach. However, I'm not sure how to split the genome and then join the outputs of each independent polishing run. Does anyone have experience with this? My genome is somewhat fragmented (10,000 scaffolds).

I would appreciate any comments.

SergeWielhouwer commented 3 months ago

You would probably want to try the --targets argument and run Pilon multiple times (e.g. 500 Pilon jobs in parallel), providing each scaffold name to --targets. Afterwards you can concatenate all the polished scaffolds into one file. You probably don't need to split the input BAM files first.
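As a rough sketch of what that could look like (untested against a real genome): the Python below reads scaffold names and lengths from the assembly FASTA, groups them into batches, writes one targets file per batch, and builds one Pilon command per batch. It assumes --targets accepts a file with one target per line, as the Pilon usage text describes, and that the BAM is coordinate-sorted and indexed. The function names, the pilon.jar path, the 16G heap size, and the batch size are all placeholders, not anything from Pilon itself.

```python
from pathlib import Path


def read_scaffold_lengths(fasta_path):
    """Return [(name, length), ...] for each FASTA record, in file order."""
    lengths = []
    name, size = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    lengths.append((name, size))
                # Scaffold name = header text up to the first whitespace
                name, size = line[1:].split()[0], 0
            elif name is not None:
                size += len(line)
    if name is not None:
        lengths.append((name, size))
    return lengths


def make_batches(lengths, max_bases):
    """Greedily group scaffolds so a batch holds at most max_bases.

    A single scaffold longer than max_bases still gets a batch of its own.
    """
    batches, current, total = [], [], 0
    for name, size in lengths:
        if current and total + size > max_bases:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        batches.append(current)
    return batches


def plan_jobs(genome, bam, workdir, max_bases, jar="pilon.jar", heap="16G"):
    """Write one targets file per batch and return the Pilon commands.

    jar and heap are placeholder values; adjust them to your installation.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    commands = []
    batches = make_batches(read_scaffold_lengths(genome), max_bases)
    for i, batch in enumerate(batches):
        prefix = f"batch_{i:04d}"
        targets = workdir / f"{prefix}.targets"
        targets.write_text("\n".join(batch) + "\n")
        commands.append([
            "java", f"-Xmx{heap}", "-jar", jar,
            "--genome", str(genome),
            "--frags", str(bam),
            "--targets", str(targets),  # assumed: file with one target per line
            "--output", prefix,
            "--outdir", str(workdir),
        ])
    return commands


def concatenate(fasta_paths, merged_path):
    """Join the per-batch polished FASTA files into one assembly file."""
    with open(merged_path, "w") as out:
        for path in fasta_paths:
            with open(path) as handle:
                for line in handle:
                    out.write(line if line.endswith("\n") else line + "\n")
```

Each command returned by plan_jobs could be passed to subprocess.run or written to a job script for a scheduler. Once all jobs finish, concatenate(sorted(Path(workdir).glob("batch_*.fasta")), "polished.fasta") would merge the outputs. Note that Pilon appends _pilon to the sequence names in its output, so the merged headers will differ from the input headers.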

I haven't tried it out myself, so hopefully someone from the Pilon team can share some thoughts on this.

enriquepola1996 commented 3 months ago

Hello @SergeWielhouwer,

Thank you very much for the help; I'm going to try it. At the moment I'm trying Hapo-G, and it seems to be going well.