Open plajjan opened 10 months ago
@nordlander @sydow I updated this a bit based on my most recent thoughts. Feel free to read & comment.
I suppose I could also mention that the reason this popped up now is that we recently got the PRW exporter in Telemetrify, and I wrote a small program that just generates mock data and exports it to a time series database. It does roughly 20% of the total amount of work that Telemetrify would normally do, so I know Telemetrify will run even slower. My example program is at the upper end of the performance we can expect, but it doesn't run nearly as fast as I want. My expectations were already quite low, even though my hopes were higher, and long term I need to be muuuuch faster. I haven't fully benchmarked things here, but I believe the GC, as always, is the main source of poor performance, so I began thinking about how to improve it... and voilà, this issue :)
We should add escape / lifetime analysis so that we can place each variable in one of several lifetime categories, which in turn allows us to allocate it where it makes the most sense.
The idea is that based on such lifetimes we could place allocations in a memory arena. I suppose it's possible to implement arenas in different ways but I'm thinking something like:
Some lifetimes and how we could allocate objects:
Analyze the scope of variables to find those that we can place in lifetime-limited arenas. The simplest example is a temporary variable that is only used within the scope of one function. The next step is to place return values on the stack/arena of the outer (calling) function.
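To make the two cases above concrete, here is a minimal sketch in C of what the generated code could look like. Everything in it is an assumption for illustration: the `Arena` type, `arena_new`, `arena_alloc`, `arena_free` and `prefix_sums_of_squares` are hypothetical names, not anything that exists in the runtime today, and a simple bump allocator is just one possible way to implement an arena. The scratch buffer never leaves the function, so it goes in a function-local arena that is freed on return; the result escapes through the return value, so it is allocated in an arena owned by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump arena: one malloc'd block that is freed all at once. */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t used;
} Arena;

static Arena arena_new(size_t cap) {
    Arena a = { malloc(cap), cap, 0 };
    return a;
}

/* Bump-allocate n bytes, 16-byte aligned. Returns NULL when the arena is full. */
static void *arena_alloc(Arena *a, size_t n) {
    size_t aligned = (a->used + 15u) & ~(size_t)15u;
    if (a->base == NULL || aligned > a->cap || n > a->cap - aligned)
        return NULL;
    a->used = aligned + n;
    return a->base + aligned;
}

/* Release every object in the arena in a single call. */
static void arena_free(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}

/* sq is only used inside this function, so it lives in a local arena.
   out is returned, so it lives in the arena passed in by the caller. */
static long *prefix_sums_of_squares(Arena *caller, const int *xs, size_t n) {
    Arena scratch = arena_new(n * sizeof(long) + 16);
    long *sq = arena_alloc(&scratch, n * sizeof(long));
    long *out = arena_alloc(caller, n * sizeof(long));
    if (sq == NULL || out == NULL) {
        arena_free(&scratch);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)
        sq[i] = (long)xs[i] * xs[i];
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += sq[i];
        out[i] = acc;
    }
    arena_free(&scratch); /* the temporary dies here, no GC involved */
    return out;
}
```

A caller would create its own arena with `arena_new`, pass it to `prefix_sums_of_squares`, use the result, and then call `arena_free` once when that lifetime ends. The point is only that neither allocation ever touches the GC heap; how the compiler would actually pick and pass arenas is left open here.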
What triggered me to look into this is the bad performance of the GC and wanting to lessen the load on the GC heap, but using arenas is no temporary hack. Even with a better GC, I think we want multiple approaches to memory management, and doing static lifetime analysis is probably the best foundation for that either way. The GC we need for inter-actor messages across a distributed system has fundamentally different requirements than a GC for shorter lifetimes. Even if we find problems with arenas and decide we want a GC for, say, the continuation lifetime, that GC could work differently than the inter-actor GC, doing less work (because no thread synchronization is required!) and thus being faster for that type of work.
I don't see a world in which lifetime analysis would be bad; it is always a good foundation!
I think it's also really encouraging that we can implement lifetime analysis first and then, once we know some of the shorter lifetimes, implement arenas and take on this work in smaller pieces. Our current GC remains the bigger hindrance to performance, so anything we do to lessen the load on it is a very good thing!