HeQSun closed this issue 4 years ago.
Hi Hequan,
The high memory usage is probably because you did not chunk the reads into smaller pieces (please see https://github.com/CMU-SAFARI/Apollo#set-of-reads ). If your read set contains very long reads, Apollo may allocate a large amount of memory just to handle those reads. To prevent this, we suggest chunking the reads into smaller pieces and then aligning these pieces to the assembly. We have a very simple script that does almost exactly what I just described:
https://github.com/CMU-SAFARI/Apollo/blob/master/utils/chunk_reads.sh
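For readers who prefer not to use the shell script, the same idea can be sketched in Python. This is not a port of chunk_reads.sh (its exact behavior is not described here); it assumes the reads are in FASTA format, splits each read into consecutive, non-overlapping pieces of at most `size` bases, and names the pieces with a hypothetical `_chunkN` suffix. The default of 1000 is borrowed from the `-c` default mentioned later in this thread.

```python
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; multi-line sequences are joined into one string."""
    name, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            # Keep only the read ID (text before the first whitespace).
            name, parts = (line[1:].split() or [""])[0], []
        else:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def chunk_read(name: str, seq: str, size: int = 1000) -> List[Tuple[str, str]]:
    """Split one read into consecutive, non-overlapping pieces of at most `size` bases.

    size <= 0 leaves the read unsplit (by analogy with -c 0 disabling chunking).
    """
    if size <= 0:
        return [(name, seq)]
    return [(f"{name}_chunk{i // size}", seq[i:i + size])
            for i in range(0, len(seq), size)]


def chunk_fasta(in_handle: TextIO, out_handle: TextIO, size: int = 1000) -> None:
    """Read FASTA records from in_handle and write the chunked reads to out_handle."""
    for name, seq in read_fasta(in_handle):
        for piece_name, piece in chunk_read(name, seq, size):
            out_handle.write(f">{piece_name}\n{piece}\n")
```

For example, opening a (hypothetical) `reads.fasta` for reading and `reads.chunked.fasta` for writing and passing both handles to `chunk_fasta` produces a chunked read set that can then be aligned to the assembly as usual.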
However, we would like to eliminate this requirement and perform the chunking internally. We will soon release an update for this, along with some other feature improvements, so I will leave this issue open for now to let you know when it is available.
Thanks,
Can.
Hi Can,
thanks for pointing out the potential problem and providing a way to solve it. I am trying that now.
Yes, I think it would be more convenient for users if Apollo did the chunking by default (or at least issued a warning when it encounters very long reads on a single line).
Thank you again!
Best, Hequan
Hi Hequan,
You can now use the -c option to perform chunking at runtime. The default chunk size is 1000, and chunking can be disabled by setting -c to 0. Chunking should greatly reduce the memory requirements without noticeably hurting accuracy. I am closing the issue now, but feel free to reopen it if you observe further problems related to high memory usage.
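As a toy illustration of why chunking bounds memory: assuming `-c` splits each read into consecutive, non-overlapping pieces of at most c bases (the comment above does not say whether pieces overlap or how a shorter remainder is handled), the longest sequence processed at once is capped at c. The helper `piece_lengths` is hypothetical and only models that arithmetic; it is not Apollo code.

```python
from typing import List


def piece_lengths(read_length: int, c: int = 1000) -> List[int]:
    """Lengths of the pieces a read is split into under a chunk size of c.

    Assumes non-overlapping pieces of at most c bases; c == 0 disables chunking.
    """
    if read_length < 0 or c < 0:
        raise ValueError("read_length and c must be non-negative")
    if c == 0 or read_length <= c:
        return [read_length]
    full, rest = divmod(read_length, c)
    return [c] * full + ([rest] if rest else [])


# A 50 kb read becomes 50 pieces of 1000 bases with the default setting,
# so the longest sequence handled at once drops from 50000 to 1000 bases.
long_read = 50_000
print(max(piece_lengths(long_read)), max(piece_lengths(long_read, c=0)))  # prints: 1000 50000
```

Under this model the total number of bases is unchanged; only the size of the largest unit of work shrinks, which is why a small c trades a little alignment context for a much lower peak.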
Thanks, Can.
Hi,
thanks for developing this tool. I am using it for PacBio read-based polishing (40x, 240 Mb assembly).
Apollo's memory usage seems to vary during the run; is there a reason for this? At one point in my case, it required more than 256 GB, the total amount available on my node, so the node was swapping constantly.
I think any improvement that reduces this high memory requirement, or an option to control memory usage, would be helpful.
thanks, Hequan