actuarialopensource / benchmarks

Some performance tests for actuarial applications
MIT License

Demonstrate improved (O(P) not O(P*T)) memory consumption in write-up on cash value model #34

Closed MatthewCaseres closed 1 year ago

MatthewCaseres commented 1 year ago

This would live in the README file related to the universal life model. Include any graphs, figures, or tables from thorough experimentation in that README, noting device specs. In-depth, higher-performance benchmarks are unlikely to work in GitHub Actions.

Experiments for large models (and any generated figures) should be easily reproducible.
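To illustrate the claim being benchmarked, here is a minimal sketch of the two memory strategies. This is not the repo's actual universal life model; the roll-forward recursion, premiums, and interest rate are made-up placeholders. The point is only that a recursion over time needs a single length-P state vector (O(P)), not the full policies-by-timesteps array (O(P*T)):

```python
import numpy as np

def cash_values_full(premiums, rate, T):
    """O(P*T) memory: materialize the whole policies-by-timesteps array."""
    P = len(premiums)
    cv = np.zeros((P, T + 1))
    for t in range(T):
        cv[:, t + 1] = (cv[:, t] + premiums) * (1 + rate)
    return cv[:, -1]

def cash_values_rolling(premiums, rate, T):
    """O(P) memory: roll a single length-P state vector forward in time."""
    cv = np.zeros(len(premiums))
    for _ in range(T):
        cv = (cv + premiums) * (1 + rate)
    return cv

# Hypothetical inputs: 1000 policies, flat premium, 2% rate, 120 timesteps.
premiums = np.full(1000, 100.0)
full = cash_values_full(premiums, 0.02, 120)
rolling = cash_values_rolling(premiums, 0.02, 120)
assert np.allclose(full, rolling)  # same results, very different footprints
```

Both functions produce identical final cash values; only the live memory differs, which is what the write-up's measurements should reflect.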

serenity4 commented 1 year ago

In theory, this is hard to demonstrate for the Julia implementation: the garbage collector does not eagerly free memory, which throws off any memory measurements. See this Discourse thread for more details.

But I think we can reasonably use the total allocated memory as reported by `@time` (or `@allocated`) and expect it to grow with the theoretical memory complexity; even if the absolute peak memory consumption is impossible to estimate, we can get the scaling right. Experiments show that this works well; I'll provide a few plots for the demonstration.

MatthewCaseres commented 1 year ago

Great. One concrete measurement is how many policies the computer can handle before it maxes out. I'm unsure whether that metric says much about space complexity, but it is of practical concern.

serenity4 commented 1 year ago

> This would live in the README file related to the universal life.

Since we already have a notebook containing benchmarks at https://github.com/actuarialopensource/benchmarks/blob/main/Julia/notebooks/benchmarks.ipynb, should we perhaps add the memory complexity benchmarks there?

serenity4 commented 1 year ago

> one concrete measurement is how many policies until the computer maxes out, unsure if that metric is useful for spatial complexity, but it is one of practical concern.

I don't think a particular number is useful, since it depends on available RAM, which may easily range from 8 GB to 64 or even 128 GB. Instead, we can investigate how long it would take to simulate, say, 10 million model points. With RAM you always have the argument of "if you don't have enough, get more", but not with runtime (unless the implementation is heavily parallelized, in which case you can throw in more cores/nodes; it is not here).
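The kind of scaling experiment described above could look like the following. This is a language-agnostic sketch written in Python with an illustrative roll-forward workload, not the actual benchmark model: time a few increasing policy counts and, since the runtime is roughly linear in P, extrapolate to the 10-million-model-point case.

```python
import time
import numpy as np

def roll_forward(P, T=120, rate=0.02):
    """Illustrative O(P) roll-forward over T timesteps for P policies."""
    cv = np.zeros(P)
    premium = 100.0  # made-up flat premium
    for _ in range(T):
        cv = (cv + premium) * (1 + rate)
    return cv

# Runtime should grow roughly linearly in P, so the cost of e.g.
# 10 million model points can be extrapolated from smaller runs.
for P in (10_000, 100_000, 1_000_000):
    start = time.perf_counter()
    roll_forward(P)
    elapsed = time.perf_counter() - start
    print(f"P={P:>9,}  {elapsed:.3f}s")
```

Reporting a few such timings alongside device specs would make the "how long for 10 million points" question answerable on any reader's hardware.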

MatthewCaseres commented 1 year ago

> https://github.com/actuarialopensource/benchmarks/blob/main/Julia/notebooks/benchmarks.ipynb

Yep, that works. Interesting that we seem to have this graph for Julia but no equivalent for Python? In that case we are relying on a theoretical argument for Python's memory complexity without any experiment to back it up?

> Instead we can investigate how long that would be to say simulate 10 million model points. With RAM you always have the argument of "if you don't have enough, get more", but not with performance (unless the implementation is heavily parallelized - then throw in more cores/nodes -, which it is not here).

I think realistically people are running these models on whatever PC their work gives them, so constrained hardware is a realistic scenario. I was throwing this out there as an example of a concrete effect of the implementation strategy.

serenity4 commented 1 year ago

> Interesting that it seems like we have this graph for Julia but not having some equivalent for Python? In which case we are relying on some theoretical argument for Python's memory complexity and not having any experiment to back it up?

Do you mean in the context of #43? Yes, I only added arguments for the Julia implementation. In general, memory complexity is hard to evaluate correctly in GC languages, as mentioned above (and even harder in interpreted ones), and while we have decent tools for a not-too-bad estimate in Julia, I don't know whether the same exists for Python. We could monitor the memory used by the process, take the maximum, and use that (see https://github.com/pythonprofilers/memory_profiler for example), and hope that it matches the actual memory complexity. On the Julia side, since I was able to reduce temporary allocations to a minimum in this fairly simple implementation, it was more or less guaranteed that we would get coherent results across runs.
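As an alternative to polling process memory, Python's standard-library `tracemalloc` reports peak traced allocations, which (like Julia's `@allocated`) should scale with the theoretical complexity even if it misses the absolute process footprint. A sketch with an illustrative roll-forward workload, not the actual benchmark model (note that NumPy has reported its array allocations to `tracemalloc` since NumPy 1.13):

```python
import tracemalloc
import numpy as np

def roll_forward(P, T=120, rate=0.02):
    """Illustrative O(P) recursion: only length-P vectors are live at once."""
    cv = np.zeros(P)
    for _ in range(T):
        cv = (cv + 100.0) * (1 + rate)  # made-up flat premium
    return cv

def peak_bytes(P):
    """Peak traced allocation while running the model for P policies."""
    tracemalloc.start()
    roll_forward(P)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# Peak traced memory should grow roughly linearly in P (and be
# independent of T), matching the claimed O(P) memory complexity.
for P in (10_000, 100_000, 1_000_000):
    print(f"P={P:>9,}  peak={peak_bytes(P) / 1e6:.1f} MB")
```

This only sees allocations routed through Python's tracked allocators, so it is a scaling estimate rather than a true peak-RSS measurement.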

MatthewCaseres commented 1 year ago

That Python memory-analysis software isn't maintained, and I'd rather not rely on hope. If you see a promising approach for generating some figures, go for it, but the memory complexity seems pretty clearly O(P) based on reading the code and the allocations analysis.