
biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER

http://bioepic.readthedocs.io

MIT License

31 stars, 6 forks

calculating memory requirements based on bam file size #86

Open avilella opened 5 years ago

avilella commented 5 years ago

Is it possible to roughly estimate epic's memory requirements from the size of the input bam files or the number of reads they contain? The goal is to give each job the smallest possible instance that will not run out of memory.

endrebak commented 5 years ago

No, I have not thought much about memory usage, beyond doing the obvious things to keep it from being a memory hog. But epic is memory-intensive, and that is what allows it to create all those nice bigwigs/matrixes in the end.

endrebak commented 5 years ago

Having two scripts, one to produce the enriched regions and another one (using the original input files and the enriched regions) to produce bigwigs and matrixes would make the first much more memory efficient and even faster. But as I do not see a paper coming out of epic, I do not have the resources to prioritize it :/

avilella commented 5 years ago

I see. Do you mean that the two-script option would work now, or that it would require some new code to get working?

On Tue, Aug 28, 2018 at 9:45 AM Endre Bakken Stovner <notifications@github.com> wrote:

Having two scripts, one to produce the enriched regions and another one (using the original input files and the enriched regions) to produce bigwigs and matrixes would make the first much more memory efficient and even faster. But as I do not see a paper coming out of epic, I do not have the resources to prioritize it :/


endrebak commented 5 years ago

It would require some modifications to the current code.

endrebak commented 5 years ago

Sorry for being unclear. It was half a note to self. I could create an epic-light that does not preserve any bin info other than what is required to find the enriched regions. But I do not see that coming out anytime soon due to resource constraints.

endrebak commented 5 years ago

I have realized that in order to get epic accepted as an application note I should reduce the memory requirements greatly. This is not possible for the bigwig/matrix-producing parts, but should be possible when just calling islands. My first priority is getting pyranges out there though, since I see it as a foundational library for genomics/bioinformatics, while epic is just a piece of software for a very specific task. It is in my backlog though.

Unfortunately I cannot guarantee that it will ever happen :/

endrebak commented 5 years ago

@avilella See SICER2 at https://github.com/endrebak/SICER2

[Figures: memory usage of SICER2 vs. SICER (no bigwig); speed of SICER2 vs. SICER (no bigwig)]
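
The thread gives no formula for the original question, so one option is to measure instead: run the tool on a few subsampled inputs, record peak memory for each, and extrapolate to the full read count. Below is a minimal sketch of that idea. It assumes peak memory grows roughly linearly with the number of reads, which is an assumption and not something confirmed in this thread. The helper names (`peak_rss_mb`, `fit_line`, `estimate_mb`) are hypothetical, and the command passed to `peak_rss_mb` is whatever command line you already use.

```python
import resource  # Unix-only standard-library module
import subprocess
import sys


def peak_rss_mb(cmd):
    """Run cmd (a list of arguments) and return the peak RSS of child processes in MB.

    ru_maxrss is a high-water mark over all waited-for children of this
    process, so use a fresh Python process per measurement.
    """
    subprocess.run(cmd, check=True)
    rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return rss / divisor


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mb(n_reads, slope, intercept, safety=1.2):
    """Extrapolate peak memory for n_reads, padded by a safety factor."""
    return safety * (slope * n_reads + intercept)
```

To use it, collect `(read count, peak MB)` pairs from three or more subsampled runs, pass them to `fit_line`, and feed the resulting slope and intercept to `estimate_mb` with the full read count. The safety factor of 1.2 is an arbitrary placeholder. If the measured points do not fall close to a line, the linear assumption does not hold for your data and the estimate should not be trusted.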