AfshinLab / BLR

MIT License

Fix memory-issue in `buildmolecules.py` #11

Closed: pontushojer closed this 4 years ago

pontushojer commented 4 years ago

@marcelm noted that buildmolecules.py used a considerable amount of memory, so this PR tries to minimise the script's memory usage. I also factored out parts of the build_molecules function and made some other fixes and style changes.

The main improvements are:

This seems to have helped somewhat, as you can see from the figures below (run on chr22). Peak memory use is now down by about 55%.

I have done several test runs on chr22 to confirm that the output is the same. The only difference is the column order in molecule_stats.tsv, where the column "NrMolecules" has moved to the end.
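Since the only expected difference is column order, one way to check equivalence is to key each row by its header name so that column position no longer matters. The sketch below is an assumption about how such a check could look, not the comparison actually used in this PR; the function names and the example columns other than "NrMolecules" are hypothetical.

```python
import csv
import io


def load_tsv(handle):
    """Read a TSV into (set of column names, list of row dicts).

    Rows become dicts keyed by header name, so column order is irrelevant
    when comparing them.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return set(reader.fieldnames or []), rows


def same_content_ignoring_column_order(handle_a, handle_b):
    """True if both TSVs have the same columns and the same rows (in order)."""
    cols_a, rows_a = load_tsv(handle_a)
    cols_b, rows_b = load_tsv(handle_b)
    return cols_a == cols_b and rows_a == rows_b


# Hypothetical example: "NrMolecules" first in the old file, last in the new one.
old = io.StringIO("NrMolecules\tBarcode\n3\tAAAC\n1\tCCGT\n")
new = io.StringIO("Barcode\tNrMolecules\nAAAC\t3\nCCGT\t1\n")
print(same_content_ignoring_column_order(old, new))  # True
```

Row order is still compared, so a change in how molecules are emitted would be caught; only the column permutation is ignored.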

Figure 1: Profile for master, generated using psrecord.

Figure 2: Profile for buildmol-memfix, generated using psrecord.
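The figures above come from psrecord. For a quick single number (peak memory) without plotting, a stdlib-only sketch on Unix could run the command as a child process and read its maximum resident set size. This is an alternative technique, not how psrecord works internally, and the helper name is hypothetical.

```python
import resource
import subprocess
import sys


def peak_child_rss(cmd):
    """Run cmd to completion and return the peak RSS among waited-for children.

    Unix only. ru_maxrss is reported in kilobytes on Linux and in bytes on
    macOS, and it is the maximum over all children waited for so far, so call
    this once per fresh process for a clean measurement.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Example: a child that allocates ~50 MB of touched memory.
peak = peak_child_rss([sys.executable, "-c", "x = b'x' * 50_000_000"])
print(peak)
```

Unlike psrecord, this gives no time profile, so it shows how high memory went but not when.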

pontushojer commented 4 years ago

Thanks for your comments anyway, @marcelm! If there are no other objections, I will merge this.

pontushojer commented 4 years ago

BTW, I also did a test run on the same dataset as @marcelm, and this is the result. The gain here is smaller; the maximum usage is around 20 GB.

I also want to point out that, because of how the code is written, the memory load from each chromosome is more or less stacked on top of the previous ones. Splitting the work over chromosomes would therefore most likely distribute the load according to the read count in each chromosome. For chromosome 1 in this dataset, this would translate to a maximum load below 2 GB, which is enough even when running on a single core on UPPMAX (about 3.6 GB per core).
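To illustrate the idea of not stacking per-chromosome memory, here is a minimal sketch that processes reads one chromosome at a time. It assumes reads arrive sorted by chromosome (as in a coordinate-sorted BAM) as `(chrom, position, barcode)` tuples; `build_molecules_for_chromosome` is a hypothetical stand-in that only counts reads per barcode, not the real molecule-building logic in `buildmolecules.py`.

```python
from itertools import groupby
from operator import itemgetter


def build_molecules_for_chromosome(reads):
    """Hypothetical stand-in for per-chromosome work: count reads per barcode."""
    counts = {}
    for _chrom, _pos, barcode in reads:
        counts[barcode] = counts.get(barcode, 0) + 1
    return counts


def process_by_chromosome(reads):
    """Yield (chromosome, result) one chromosome at a time.

    groupby only merges *consecutive* items with the same key, so input must
    be sorted by chromosome. Each chromosome's working state (the counts
    dict) is built and returned before the next chromosome is read, instead
    of keeping state for every chromosome alive at once.
    """
    for chrom, group in groupby(reads, key=itemgetter(0)):
        yield chrom, build_molecules_for_chromosome(group)


reads = [("chr1", 100, "AAAC"), ("chr1", 250, "AAAC"), ("chr2", 50, "CCGT")]
for chrom, result in process_by_chromosome(iter(reads)):
    print(chrom, result)
# chr1 {'AAAC': 2}
# chr2 {'CCGT': 1}
```

With this structure, peak memory tracks the largest single chromosome rather than the sum over all of them, which matches the per-chromosome estimate above, provided each result is written out and dropped before the next one is built.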


marcelm commented 4 years ago

No objections here :-), looking forward to trying this out when I’m done with the parallel branch.

marcelm commented 4 years ago

By the way, thanks for the pointer to psrecord, looks really useful!

pontushojer commented 4 years ago


Yes, I found it while looking into this issue! It was really easy to use and did everything I wanted.