ivanlei opened this issue 1 year ago
I've been experimenting with some configurations, mainly under vat 18, one of the vats most affected by #6661. With the default configuration, performance degrades quite quickly: comparing no restart versus a forced reload from snapshot, the old worker soon settles at roughly twice the latency of the reloaded one.
Profiling indicates a lot of the time is spent in GC, as execution quickly hits the allocation maximums, requiring the allocator to make room often:
Increasing the initial amounts, and the incremental amounts in particular, seemed to reduce these contributors to the slowdown:
2x heap amounts and 16x slot amounts (both initial and incremental):
The initial tweaks were promising, and it turns out that with larger amounts the original worker actually starts out faster than the reloaded ones. Afterwards I used the sampled profile to guide the tuning a bit. As suggested by @warner, I looked to reduce time in GC relative to the sampled population. I also noticed how often requesting new slots or chunks resulted in GC, and thrashing in particular, which requires allocating more memory. I only collected profiles of the original worker; the few profiles I took of the reloaded ones showed similar results, just with smaller GC percentages.
With 8x heap and 32x slot amounts:
Note: the spike in the middle was due to attaching a debugger and pausing to find a good entry point for profiling, which threw off the timer while execution was paused. The rest of the graph shows timings returning to more normal levels, and still shows a smaller gap between the original and reloaded workers, even though the recording is short.
At this point we're spending less time in collection:
With 16x heap and 32x slots:
This configuration had an RSS of around 700M, so it may not be ideal, especially since the GC-time improvement over the previous configuration was marginal:
The last experiment reduced the chunk amounts while keeping the larger slot amounts, given the small improvement relative to memory usage.
6x heap and 32x slot amounts (mostly incremental, 4x initial chunk size, and 2x initial slot size):
This was the entire vat replay, sampling around 12k deliveries:
Usage was around 400-500M.
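For concreteness, the final multipliers could be expressed as an XS `xsCreation` initializer along these lines. This is only a sketch: the baseline numbers below are placeholders, not the actual xsnap defaults, and only the multipliers come from the experiment. Field names follow the XS in C `xsCreation` structure; remaining fields (stack, key, and parser settings) are omitted here and left untouched by the tuning.

```c
/* Hypothetical sketch: "6x heap and 32x slot amounts (mostly incremental,
 * 4x initial chunk size, and 2x initial slot size)" mapped onto xsCreation.
 * BASE_* values are assumed placeholders, not real xsnap defaults. */
#define BASE_CHUNK_SIZE (1 * 1024 * 1024) /* assumed baseline chunk bytes */
#define BASE_SLOT_COUNT (64 * 1024)       /* assumed baseline slot count */

static xsCreation creation = {
    4 * BASE_CHUNK_SIZE,  /* initialChunkSize:     4x initial chunk size */
    6 * BASE_CHUNK_SIZE,  /* incrementalChunkSize: 6x incremental amount */
    2 * BASE_SLOT_COUNT,  /* initialHeapCount:     2x initial slot size */
    32 * BASE_SLOT_COUNT, /* incrementalHeapCount: 32x incremental slots */
    /* stackCount, keyCount, nameModulo, symbolModulo, ... unchanged;
     * their exact list depends on the XS version in use. */
};
```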
What is the Problem Being Solved?
Organic GC along allocation paths in vats has high CPU consumption. Increasing incrementalHeapCount (and the related allocation amounts) in xsCreation reduces the rate of GC, at the cost of memory overhead.
Description of the Design
Once JS heap memory usage is flat (Hypothesis 1), we may have a stable rate of fxAllocateSlot calls to base the tuning upon.
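As a back-of-envelope model of that tuning (all numbers here are hypothetical, not measured): if a vat allocates roughly R slots per delivery at steady state, then the heap is exhausted, and a GC plus incremental growth occurs, about every incrementalHeapCount / R deliveries, so scaling incrementalHeapCount stretches the interval between collections proportionally.

```javascript
// Hypothetical tuning arithmetic; neither number is a measured value.
const slotsPerDelivery = 2_000; // assumed steady-state fxAllocateSlot rate
const incrementalHeapCount = 128 * 1024; // assumed baseline slot increment

// Deliveries between heap-exhaustion GCs at the baseline increment:
const baseline = Math.floor(incrementalHeapCount / slotsPerDelivery);
console.log(baseline); // ~65 deliveries per GC/growth event

// A 32x incrementalHeapCount stretches that spacing 32x:
const tuned = Math.floor((32 * incrementalHeapCount) / slotsPerDelivery);
console.log(tuned); // ~2097 deliveries per GC/growth event
```

The same arithmetic run in reverse (observed GC interval times measured slot rate) would give the increment needed to hit a target GC frequency.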
Security Considerations
Scaling Considerations
We need to tune the configuration to reflect our baselines.
Test Plan