actuarialopensource / benchmarks

Some performance tests for actuarial applications
MIT License

Big-O notation, find a resolution #50

Open MatthewCaseres opened 11 months ago

MatthewCaseres commented 11 months ago

We have an approach for deriving Big-O memory complexity for our iterative Julia models (#34). I wanted a way of doing the same for the lifelib models and had an idea I thought would work, but it doesn't seem like something we should do (#49).

We want to classify common approaches to actuarial modeling by their Big-O complexity.

One option is to simplify the model while maintaining the same fundamental structure, and then make an appeal to logic/math in the way a computer scientist might. That way the argument for the Big-O notation isn't dependent on a specific implementation.
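To sketch what that kind of argument could look like (a hypothetical toy recursion, not one of the actual lifelib or Julia models), take a reserve recursion over $T$ time steps:

$$
V_T = 0, \qquad V_t = \frac{V_{t+1} + CF_{t+1}}{1+i} \quad \text{for } t < T.
$$

Run backwards while keeping only the latest value, this needs $O(1)$ auxiliary memory; caching every $V_t$, as a memoizing implementation does, needs $O(T)$; vectorizing over $N$ policies multiplies either bound by $N$. The bound follows from the recurrence structure rather than from any particular code.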

Anyhow, we don't have an argument that the lifelib models satisfy particular complexity properties. That a seriatim model has different memory complexity from a vectorized one is sort of obvious in some sense, but stating it precisely might be necessary to really do the topic justice, in my opinion.
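To make the seriatim/vectorized point slightly more concrete, here is a toy present-value calculation (illustrative only, not the lifelib implementation). The seriatim version streams one policy at a time and only ever holds O(T) values, while the vectorized version holds the full (N, T) table:

```python
import numpy as np

T = 120     # projection months
N = 10_000  # number of policies

discount = 1.04 ** -(np.arange(T) / 12)   # length-T discount curve

def seriatim_pv(policy_premiums):
    """One policy at a time: only O(T) values are alive at any moment."""
    total = 0.0
    for prem in policy_premiums:          # prem is a length-T array for one policy
        total += float(prem @ discount)
    return total

def vectorized_pv(premium_table):
    """All policies at once: the (N, T) table and its temporaries are O(N * T)."""
    return float((premium_table * discount).sum())

print(seriatim_pv(np.full(T, 100.0) for _ in range(N)))  # policies streamed: O(T) live memory
print(vectorized_pv(np.full((N, T), 100.0)))             # full table held: O(N * T) memory
```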

serenity4 commented 11 months ago

Yeah, I don't know how computer scientists typically go about justifying claims of time or memory complexity. For the iterative Julia implementations it's usually quite obvious and can be inferred from how many for loops you have and what they iterate over. But when caching is involved (which is the case in many real-world domains outside of math-y algorithms), as with the memoization-based implementations, things get more complicated. We could investigate how computer scientists approach the problem, e.g. read research articles on implementation domains that face similar challenges.
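As a toy illustration of why memoization complicates the memory story (not lifelib itself, just the same caching pattern): a memoized backward recursion keeps every intermediate value alive in the cache, so it uses O(T) memory where the equivalent loop uses O(1):

```python
from functools import lru_cache

T = 120  # projection steps

def cashflow(t):
    return 100.0

@lru_cache(maxsize=None)
def reserve(t):
    """Memoized backward recursion: the cache ends up holding ~T entries."""
    if t >= T:
        return 0.0
    return (reserve(t + 1) + cashflow(t + 1)) / 1.04

def reserve_iterative():
    """The same recursion written as a loop: O(1) auxiliary memory."""
    v = 0.0
    for t in reversed(range(T)):
        v = (v + cashflow(t + 1)) / 1.04
    return v

print(reserve(0), reserve_iterative())   # same value either way
print(reserve.cache_info().currsize)     # ~T cached entries kept alive
```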

serenity4 commented 11 months ago

I guess we could either reason it out theoretically, maybe by simplifying the model and deriving a result, or find ways to empirically estimate the complexity.
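If we go the empirical route, one sketch (hypothetical helper names, nothing from the repo) is to measure the quantity of interest at a few problem sizes and fit the growth exponent on a log-log scale; a slope near 1 suggests O(n), near 2 suggests O(n²), and so on. It only resolves polynomial growth, but that probably covers the models we care about:

```python
import time
import numpy as np

def run_time(run, n, repeats=3):
    """Best-of-`repeats` wall-clock time of run(n)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run(n)
        best = min(best, time.perf_counter() - start)
    return best

def fitted_exponent(sizes, values):
    """Slope of log(value) against log(size): an estimate of the growth exponent."""
    slope, _ = np.polyfit(np.log(sizes), np.log(values), 1)
    return slope

sizes = [100_000, 200_000, 400_000, 800_000]
times = [run_time(lambda n: sum(i * i for i in range(n)), n) for n in sizes]
print(fitted_exponent(sizes, times))  # expect roughly 1.0 for this O(n) workload
```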

Time complexity is generally quite easy, since it's just a matter of benchmarking at different parameter values, so the real question is how to derive memory complexity. A typical way to do it is to look at the memory consumption of a given process, which works for languages without a garbage collector; but maybe we can find studies on, e.g., Python algorithms that have to deal with such estimations (or maybe JavaScript?).
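For Python specifically, tracemalloc reports the peak size of Python-level allocations even with the garbage collector in play (NumPy ≥ 1.13 reports its array buffers to it as well). So one sketch, with a toy workload standing in for a real model run, would be to record the peak at a few model sizes and fit those points the same way as the timings:

```python
import tracemalloc
import numpy as np

def peak_memory_bytes(run, n):
    """Peak traced allocation while running run(n), in bytes."""
    tracemalloc.start()
    try:
        run(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def toy_vectorized_model(n):
    """Stand-in workload: builds an (n, 120) array, so peak memory should grow ~O(n)."""
    cashflows = np.ones((n, 120))
    return cashflows.sum()

for n in (1_000, 2_000, 4_000, 8_000):
    print(n, peak_memory_bytes(toy_vectorized_model, n))
```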