Open avilella opened 5 years ago
No, I have not thought much about memory usage, just done the obvious things to keep it from being a memory hog. But epic is memory-intensive, which is what allows it to create all those nice bigwigs/matrices in the end.
Having two scripts, one to produce the enriched regions and another one (using the original input files and the enriched regions) to produce bigwigs and matrices, would make the first much more memory-efficient and even faster. But as I do not see a paper coming out of epic, I do not have the resources to prioritize it :/
I see. Do you mean that the two-script option would work now, or that it would require some code to get working?
On Tue, Aug 28, 2018 at 9:45 AM Endre Bakken Stovner <notifications@github.com> wrote:
> Having two scripts, one to produce the enriched regions and another one (using the original input files and the enriched regions) to produce bigwigs and matrices, would make the first much more memory-efficient and even faster. But as I do not see a paper coming out of epic, I do not have the resources to prioritize it :/
It would require some modifications to the current code.
Sorry for being unclear. It was halfway a note to self. I could create an epic-light that does not preserve any bin info other than what is required to find the enriched regions. But I do not see that happening anytime soon due to resource constraints.
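To make the epic-light idea concrete, here is a minimal sketch of what "keeping only the bin info needed to find enriched regions" could look like. It is not epic's actual code: the function names, the default `bin_size`, and the fixed `min_reads` cutoff are all hypothetical, and the cutoff is only a placeholder for whatever enrichment test the real tool applies. The point is that reads are streamed and only one integer per occupied bin is kept.

```python
from collections import Counter


def count_reads_per_bin(reads, bin_size=200):
    """Count reads per (chromosome, bin start) without storing the reads.

    `reads` is any iterable of (chromosome, position) pairs, so it can be
    a generator that streams from disk. `bin_size` is a hypothetical default.
    """
    counts = Counter()
    for chromosome, position in reads:
        # Round the position down to the start of its bin.
        bin_start = position - position % bin_size
        counts[(chromosome, bin_start)] += 1
    return counts


def candidate_bins(counts, min_reads):
    """Return the sorted bins holding at least `min_reads` reads.

    The fixed cutoff is an assumption made for illustration; it stands in
    for a proper statistical enrichment test.
    """
    return sorted(key for key, n in counts.items() if n >= min_reads)
```

With this layout, memory grows with the number of occupied bins rather than the number of reads, which is the property the two-script split would rely on.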
I have realized that in order to get epic accepted as an application note I should reduce the memory requirements greatly. This is not possible for the bigwig/matrix-producing parts, but should be possible when just calling islands. My first priority is getting pyranges out there though, since I see it as a foundational library for genomics/bioinformatics, while epic is just a piece of software for a very specific task. It is in my backlog though.
Unfortunately I cannot guarantee that it will ever happen :/
@avilella See SICER2 at https://github.com/endrebak/SICER2
Is it possible to estimate the memory requirements of EPIC from the size or number of reads of the BAM files given as input? This is in line with our efforts to give the job the smallest possible instance that will not run out of memory.
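The thread does not give a formula for this, so one empirical way to approach it is to record peak memory for a few runs of known size and fit a line through them. The sketch below assumes that peak memory grows roughly linearly with read count, which is an unverified assumption about EPIC. The function names and the `safety_factor` default are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; cannot fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_peak_memory(read_count, slope, intercept, safety_factor=1.2):
    """Extrapolate peak memory for a new input, padded by a safety margin.

    `safety_factor` is a hypothetical default; choose it based on how
    costly an out-of-memory failure is for the job.
    """
    return safety_factor * (slope * read_count + intercept)
```

To use it, pass in read counts and measured peak memory from your own runs (for example as reported by the job scheduler), then size the instance from the prediction. If the fit is poor, the linear assumption does not hold for your data and the estimate should not be trusted.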