Closed nhartwic closed 3 years ago
Echoing this request. I am currently trying to polish a 500 Mbp genome with 30x coverage and have run out of memory on a 3000 GB HPC node.
Memory usage is mostly driven by the size of the genome, though it also depends on whether Pilon is trying to do things that require local reassembly (the default). When reassembly is on, it must keep track of all read pairs that aren't aligned near one another, which it calls "strays", and that can create a very large in-memory data structure proportional to coverage; it all depends on how well the alignments match. If you are only doing base polishing (i.e., --fix bases), the memory requirements are far lower.
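To get an intuition for why stray-pair tracking scales with coverage, here is a rough back-of-envelope sketch. Note this is purely illustrative: the per-read bookkeeping cost and the fraction of reads that end up as strays are hypothetical assumptions, not measured Pilon values.

```python
def stray_memory_gb(genome_bp, coverage, read_len_bp, stray_frac, bytes_per_read=1024):
    """Rough estimate of memory needed to track stray read pairs.

    Assumptions (not from Pilon's source): each tracked stray read costs
    ~bytes_per_read of in-memory bookkeeping, and stray_frac is the
    fraction of all reads whose mates align far away.
    """
    n_reads = genome_bp * coverage / read_len_bp   # total reads at given depth
    n_strays = n_reads * stray_frac                # reads kept as "strays"
    return n_strays * bytes_per_read / 1e9         # bytes -> GB

# Example matching the thread: 500 Mbp genome, 30x coverage, 150 bp reads,
# and a hypothetical 10% stray rate.
est = stray_memory_gb(500e6, 30, 150, 0.10)
print(f"~{est:.1f} GB just for stray-pair bookkeeping")
```

The point of the sketch is the shape of the scaling, not the constants: the stray structure grows linearly with genome size and with coverage, so poorly matching alignments (a high stray fraction) can blow memory up well beyond what the genome size alone would suggest.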
Thanks for the information. Makes sense.
Memory usage on the order of 3000 GB+ seems excessive.
Can anyone state the algorithmic memory usage of Pilon? What factors influence memory usage, and to what degree? For the sake of argument, assume Kx coverage and that reads map uniformly and uniquely. Does reference size matter? Does reference contiguity matter? Does the number of threads matter? How does memory usage scale with read length and depth?
Is this information published anywhere?