cositools / cosipy

The COSI high-level data analysis tools
Apache License 2.0

Need to optimize combine_unbinned_data method #119

Open ckarwin opened 7 months ago

ckarwin commented 7 months ago

This method is using more memory than needed and should be optimized.

ckarwin commented 1 month ago

@krishnatejavedula I wanted to try to provide a bit of guidance on this issue.

You'll see that the method reads in one dictionary at a time and concatenates it to the final combined dictionary. In order to concatenate a given dictionary, it needs to be read into memory, but it should be released immediately after it has been concatenated. I'm not sure that is actually happening, because when we were testing it we saw the RAM usage multiplying with the number of combined files (beyond what we'd expect for just the new combined dictionary). So that's one thing to check, although it may have already been resolved since the issue was opened -- I'm not quite sure.
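As a rough illustration of what I mean (the function and loader names below are placeholders, not the actual cosipy API), the per-file dictionary should be explicitly dropped once its arrays have been concatenated, something like:

```python
# Hypothetical sketch of the concatenation loop, assuming each unbinned data
# file loads into a dict of numpy arrays. `read_dict` stands in for however
# the real method reads a single file.
import gc
import numpy as np

def combine_unbinned_data_sketch(file_list, read_dict):
    """Combine per-file dictionaries while releasing each one after use."""
    combined = {}
    for path in file_list:
        current = read_dict(path)  # only one file held in memory at a time
        for key, values in current.items():
            if key in combined:
                combined[key] = np.concatenate([combined[key], values])
            else:
                combined[key] = values
        # Drop the per-file dictionary explicitly so the interpreter can
        # reclaim it before the next file is loaded.
        del current
        gc.collect()
    return combined
```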

At most, I think the RAM will need to hold the data for all combined dictionaries, plus the current dictionary that is being concatenated. One possible option to reduce this might be to try a memory map: https://numpy.org/doc/stable/reference/generated/numpy.memmap.html.
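For what it's worth, a minimal two-pass sketch of that idea could look like the following (again, `read_column` and the single-column layout are just assumptions for illustration): the first pass sums the per-file lengths, and the second pass streams each file into an on-disk array so the full combined column never has to live in RAM.

```python
import numpy as np

def combine_column_memmap(file_list, read_column, out_path, dtype=np.float64):
    """Concatenate one column from many files into a numpy memmap.

    `read_column` is a hypothetical loader returning a 1-D numpy array
    for a single file; it stands in for however cosipy reads the data.
    """
    # Pass 1: total number of events across all files.
    total = sum(read_column(path).size for path in file_list)

    # Pass 2: stream each file's column into the memory-mapped output.
    combined = np.memmap(out_path, dtype=dtype, mode="w+", shape=(total,))
    start = 0
    for path in file_list:
        column = read_column(path)
        combined[start:start + column.size] = column
        start += column.size
        del column  # release the per-file array before loading the next one
    combined.flush()
    return combined
```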

krishnatejavedula commented 1 month ago

@ckarwin Thanks for the guidance. It does seem like the RAM usage is higher than expected; I encountered this issue while working with Data IO during DC2. I'll check on this and see if using NumPy's memmap can help reduce memory consumption.

I appreciate your input and will keep you updated.