fermi-lat / Likelihood


Add ability to NOT cache weight maps to reduce memory usage. #127

Open dagorym opened 3 weeks ago

dagorym commented 3 weeks ago

This originated in the Memory Usage and delete_local_fixed thread, but it is technically a different issue, so I'm opening a new thread for it and closing the old one. Initial discussion is here: https://github.com/fermi-lat/Likelihood/issues/90#issuecomment-2352424048 but I'll repeat the relevant bit here:

From Jean:

However, memory use is nearly as large as before. A few trials led me to the weights (wmap option). When I add the wmap option to a simple BinnedAnalysis call, memory use increases by 264 MB, which is much larger than the data cube size (1 MB) but similar to the original (HEALPix all-sky) weights file. I suspect that the entire weights file is kept in memory, although only the portion corresponding to the data cube is useful. It should be obvious from the code, but if you want to try, the weights file is at /sdf/data/fermi/g/catalog/P8_P305/Weights_12years/DataE2XWgts_3percent_P305_zmax105_PSF3.fits. I don't think this can explain all the memory use I see in the real-size calls, but it should be easy to fix.
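For concreteness, a minimal sketch of the kind of call being described, assuming the pyLikelihood BinnedAnalysis interface and its wmap keyword; the file names other than the weights file are placeholders:

```python
from BinnedAnalysis import BinnedObs, BinnedAnalysis

# Placeholder ROI inputs (the data cube itself is only ~1 MB).
obs = BinnedObs(srcMaps='srcmaps.fits',
                binnedExpMap='bexpmap.fits',
                expCube='ltcube.fits',
                irfs='CALDB')

# Adding wmap= is what triggers the extra ~264 MB: the full HEALPix all-sky
# weights file is read into memory, not just the cutout covering the ROI.
like = BinnedAnalysis(obs, 'model.xml', optimizer='MINUIT',
                      wmap='DataE2XWgts_3percent_P305_zmax105_PSF3.fits')
```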

My analysis:

Looking into this, here is what I've found.

Jean is correct. The code is caching the original weights map in memory. This is how the code is written, and there are functions (on the WCSMapLibrary class) that the user of the code can call to delete the cached maps if they so desire. It was designed to be under user control, not automatic, although I'm not 100% convinced the deletion process works correctly, as it is never actually used anywhere that I can tell.
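To illustrate the pattern, here is a hypothetical Python sketch (not the actual WCSMapLibrary API, which is C++ with different method names): maps are loaded once, keyed by file name, kept in memory indefinitely, and only dropped if the caller explicitly asks.

```python
# Hypothetical illustration of the caching behavior described above.
class MapCache:
    def __init__(self):
        self._maps = {}  # filename -> in-memory all-sky map

    def get(self, filename, loader):
        # The first request pays the full cost (~3 s and ~252 MB for the
        # all-sky weights map); later requests reuse the cached copy.
        if filename not in self._maps:
            self._maps[filename] = loader(filename)
        return self._maps[filename]

    def delete_map(self, filename):
        # Deletion is available but must be triggered by the user; nothing
        # in the current code appears to call it automatically.
        self._maps.pop(filename, None)
```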

As with everything, there is a performance vs. memory tradeoff, but it depends on exactly what you are doing. I explored this with both the Test-2.2.12 and test_memory tests that Jean had provided for earlier issues, along with the weights file he supplied. Here's what is happening:

  1. When a weights map is used to create the likelihood object, the all-sky map is generated and added to the cache. On my computer this takes about 3 seconds, and the in-memory size of that map is 252 MB. From this the code generates the smaller weights map that actually covers the ROI.

  2. The smaller map is used to perform the calculation.

     2a) For a simple test like Test-2.2.12, which benchmarks on my computer at 3.5 seconds and 113 MB, adding the weights map brings that up to 6.8 seconds and 379 MB of memory used. That is basically just the extra time and memory needed to load the full weights map and build the smaller one.

     2b) For the more complex test_memory test, which builds 20 likelihood objects before performing the computation, it still takes about 3 seconds to build the large weights map, but because it is cached that is a one-time cost and each object just uses the cached map to build its own smaller one. Without the weights map, this test normally takes about 19 seconds and uses 910 MB of memory. With the weights map, it takes 23 seconds and uses 1287 MB. That is simply the small amount of extra time and memory needed to create the large cached map and the individual smaller maps for each likelihood object.

  3. It's fairly straightforward to remove that cached large map in the code.

     3a) For the smaller test, the run still reaches the peak memory of 379 MB, since at one point in the program the big map is in memory. But after the deletion it drops back down (for the last 3 seconds of execution) to the smaller memory footprint of a run without the weights maps.

     3b) The larger test, however, takes a major performance hit. It saves a bit of memory, the peak being only about 1117 MB instead of 1287 MB (though not all the way back down to 910 MB), but the execution time is now 86 seconds instead of the original 19, or even the 23 with the map cached. This is because the large map has to be rebuilt for each likelihood object since it wasn't cached, adding about 3 seconds for each of the 20 likelihood objects (20 × 3 s ≈ 60 s on top of the ~23 s cached run, which accounts for essentially all of the jump to 86 s).

This is with the program automatically deleting the cached map as soon as it has been used. The next level up is leaving the deletion of the map as a user step. That could save the extra 252 MB of space without the time penalty, since the user would wait to delete it until after all the likelihood objects are built, but the map would sit in memory all that time until it was deleted.

I'm not sure it's worth making this change. While 252 MB is a lot compared to the smaller amount of memory the smaller analyses use, I don't think it is going to overwhelm any modern computer, unless you are very memory constrained and trying to run a lot of these simultaneously. And on the larger problem it's only about a 25% increase in memory usage. And the 1287 MB of memory usage is still a lot better than the 3138 MB used in this test by the old version of the program, before we fixed the other issues.

Now, if there is a use case where the delta is more than the one-time hit of 252 MB, that would be a good test case to have for investigating this further. And I could look into making the deletion code a bit more robust to allow users to explicitly delete the maps, but I don't think automatic deletion is the way to go.

I could be convinced otherwise, but I don't see a really compelling case for it yet as I think it introduces additional problems.

Jean's Response:

Thank you for checking that, Tom. The test harness is very simplified and meant to illustrate the original memory issue, so it is not representative of a real test case. In the real use case, I have 11 different weight maps, totalling 1.55 GB, for 19 components. Only 4 of them are used (3 times) for several components, so the readout time may be multiplied overall by a factor of 2, but it will not induce a big performance hit. The fitting process and associated steps typically take 15 minutes, so the readout never dominates anyway. As I said above, I don't think it is the only cause of my memory problems, but reducing that one will help. I have no objection, of course, to keeping the current functionality (even as the default), as long as it is optional, for example conditional on the same delete_local_fixed option of BinnedAnalysis, so that the caching mechanism is used only when delete_local_fixed=False.
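A hedged sketch of what this suggestion could look like inside the map-loading step, reusing the hypothetical cache sketched earlier; loader and cutout are placeholder callables, and only the delete_local_fixed flag name comes from the existing BinnedAnalysis options:

```python
# Hypothetical sketch (not the actual Likelihood code) of making the caching
# conditional on delete_local_fixed, as suggested above.
def get_roi_weights(filename, roi, cache, loader, cutout, delete_local_fixed=False):
    # `loader` reads the all-sky weights map, `cutout` extracts the ROI part;
    # both stand in for whatever the library actually uses.
    if delete_local_fixed:
        # Memory-lean path: read, cut out the ROI, and let the big map go.
        return cutout(loader(filename), roi)
    # Current/default path: keep the all-sky map cached for later components.
    return cutout(cache.get(filename, loader), roi)
```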

dagorym commented 1 week ago

I think I've figured out a good way to do this. The question is what should trigger the deletion of the original large map. The options I see are:

  1. Always do it and just incur the cost of regeneration if the same file is requested again
  2. Always do it unless the save_all_srcmaps flag is true
  3. Only delete the maps if delete_local_fixed is true
  4. Create a new configuration parameter (something like no_cached_weightmaps) and only delete if it is true. Start by defaulting to false, which is the current behavior (see the user-level sketch of options 3 and 4 below).

What option makes the most sense for triggering this?
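For reference, a hedged user-level sketch of what options 3 and 4 might look like. delete_local_fixed is the existing BinnedAnalysis option referenced in this thread; no_cached_weightmaps is only the name proposed in option 4 and does not exist yet. Both are shown as constructor keywords purely for illustration.

```python
from BinnedAnalysis import BinnedObs, BinnedAnalysis

# obs built as in the earlier sketch (placeholder file names).
obs = BinnedObs(srcMaps='srcmaps.fits', binnedExpMap='bexpmap.fits',
                expCube='ltcube.fits', irfs='CALDB')

# Option 3: piggyback on the existing delete_local_fixed flag, so the cached
# all-sky weights map is dropped whenever local fixed-source maps are.
like3 = BinnedAnalysis(obs, 'model.xml', wmap='weights.fits',
                       delete_local_fixed=True)

# Option 4: a dedicated switch (proposed name, not yet implemented),
# defaulting to False to preserve the current caching behavior.
like4 = BinnedAnalysis(obs, 'model.xml', wmap='weights.fits',
                       no_cached_weightmaps=True)
```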