@cartermak reported that an expansion run for TT-8 was failing. After further investigation, we found the following:
Memory Leak:
We identified a memory leak: memory was not being released after an expansion run finished. Consider this scenario:
Expansion A finishes, having used 1 GB of memory.
Expansion B starts, but instead of reclaiming the memory used by A, it allocates an additional 1 GB, bringing the total used memory to 2 GB.
This pattern continues with each subsequent expansion, causing a cumulative memory increase.
Eventually, the server runs out of memory and crashes.
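To make the scenario concrete, here is a minimal, hypothetical sketch of one common way this pattern arises in a long-lived Node process: state that outlives a run keeps a reference to that run's output, so the garbage collector can never reclaim it. It is not the actual aerie-user-ts-code-runner code or its two-line fix. `ExpansionResult`, `retainedResults`, and the run functions are invented for illustration. Each retained result stands in for Expansion A's 1 GB.

```typescript
// Hypothetical illustration only: this is NOT the actual
// aerie-user-ts-code-runner code or its two-line fix.

interface ExpansionResult {
  commands: string[];
}

// Leaky pattern: a module-level array keeps a reference to every run's output,
// so that memory stays reachable and the garbage collector can never free it.
const retainedResults: ExpansionResult[] = [];

function buildResult(commandCount: number): ExpansionResult {
  const commands: string[] = [];
  for (let i = 0; i < commandCount; i++) {
    commands.push(`CMD_${i}`);
  }
  return { commands };
}

function runExpansionLeaky(commandCount: number): ExpansionResult {
  const result = buildResult(commandCount);
  retainedResults.push(result); // reference outlives the run
  return result;
}

// Fixed pattern: nothing outside the call holds the result, so once the caller
// drops it, the GC can reclaim that memory before the next run.
function runExpansionFixed(commandCount: number): ExpansionResult {
  return buildResult(commandCount);
}

// Replay "Expansion A, then B, then ..." and record how many results are still
// reachable through the module-level array after each run.
function retainedAfterRuns(runs: number, leaky: boolean): number[] {
  retainedResults.length = 0; // reset between simulations
  const counts: number[] = [];
  for (let i = 0; i < runs; i++) {
    if (leaky) {
      runExpansionLeaky(1000);
    } else {
      runExpansionFixed(1000);
    }
    counts.push(retainedResults.length);
  }
  return counts;
}

console.log(retainedAfterRuns(3, true)); // 1, 2, 3 results retained: grows every run
console.log(retainedAfterRuns(3, false)); // 0, 0, 0 results retained
```

If every retained result were the size of a real expansion output, the leaky counts correspond to the 1 GB, 2 GB, 3 GB growth described above.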
High Memory Usage (RSS):
During testing with the problematic setup, we observed a significant spike in Resident Set Size (RSS), reaching ~11 GB. RSS is the portion of a process's memory currently held in physical RAM. A spike this large points to inefficient memory use on top of the leak itself. A snapshot of the memory statistics captured during testing:
rss: '5602.07 MB -> Resident Set Size - total memory allocated for the process execution',
heapTotal: '2267.14 MB -> total size of the allocated heap',
heapUsed: '2243.09 MB -> actual memory used during the execution',
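The three fields above match the names returned by Node's `process.memoryUsage()`, which reports values in bytes. Assuming that is where these numbers came from, a sketch of producing the same MB formatting is below. The `MemorySnapshot` interface and `formatMemory` helper are hypothetical, and the bytes-to-MB divisor of 1024 × 1024 is an assumption about how the original logging converted units.

```typescript
// Sketch only: assumes the numbers above are Node's process.memoryUsage()
// fields (rss, heapTotal, heapUsed), which Node reports in bytes.

interface MemorySnapshot {
  rss: number; // resident set size: process memory currently held in RAM
  heapTotal: number; // size of the V8 heap allocated so far
  heapUsed: number; // part of that heap occupied by JS objects not yet collected
}

// Assumed conversion factor; the original logging may have used 1e6 instead.
const BYTES_PER_MB = 1024 * 1024;

// Convert each field from bytes to a "1234.56 MB" string.
function formatMemory(snapshot: MemorySnapshot): Record<keyof MemorySnapshot, string> {
  const toMB = (bytes: number): string => `${(bytes / BYTES_PER_MB).toFixed(2)} MB`;
  return {
    rss: toMB(snapshot.rss),
    heapTotal: toMB(snapshot.heapTotal),
    heapUsed: toMB(snapshot.heapUsed),
  };
}

// Illustrative byte counts (not measurements from this PR).
console.log(
  formatMemory({ rss: 3 * 1024 * BYTES_PER_MB, heapTotal: 150 * BYTES_PER_MB, heapUsed: 100 * BYTES_PER_MB }),
);
```

Inside the server this would be called as `formatMemory(process.memoryUsage())`. Comparing `rss` with `heapUsed` is what separates the two problems described here: heap growth across runs suggests the leak, while RSS far above the heap suggests memory held outside the main V8 heap, such as by extra workers.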
Solution:
We fixed the memory leak with a two-line change to the aerie-user-ts-code-runner plugin.
We’re also adding a new configuration option (“knob”) for the sequencing server in the Docker Compose file that sets the maximum number of workers.
Originally, I thought the worker pool was capped at 8 workers. However, it actually starts with 8 and scales up as the workload increases, which causes the RSS spike. In tests, we saw up to 20 workers spawn, leading to a significant increase in memory usage.
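The difference between "starts at 8 and scales up" and "capped at 8" can be shown with a small sizing policy. This is a hypothetical sketch, not the sequencing server's pool implementation: `parseMaxWorkers`, `targetWorkerCount`, and the environment variable name in the usage note are invented for illustration, and the real knob's name and semantics are defined in the Docker Compose file.

```typescript
// Hypothetical sketch of a worker-pool sizing policy with an upper bound.
// Function names and numbers are illustrative, not the server's real config.

// Parse an optional environment-style value, falling back to a default when it
// is missing or not a positive number.
function parseMaxWorkers(raw: string | undefined, fallback: number): number {
  const n = raw === undefined ? NaN : parseInt(raw, 10);
  return n > 0 ? n : fallback; // NaN > 0 is false, so bad input uses the fallback
}

// Decide how many workers the pool should have for a given backlog:
// never fewer than `min`, one worker per queued task, never more than `max`.
function targetWorkerCount(queuedTasks: number, min: number, max: number): number {
  const cap = Math.max(min, max); // a cap below the minimum is treated as the minimum
  return Math.min(cap, Math.max(min, queuedTasks));
}

// Without a cap, a backlog of 20 tasks grows the pool from 8 to 20 workers.
console.log(targetWorkerCount(20, 8, Infinity)); // 20
// With the cap set to 8, the pool stays at 8 and extra tasks wait in the queue.
console.log(targetWorkerCount(20, 8, 8)); // 8
```

In a Node service the cap might be read as `parseMaxWorkers(process.env.MAX_WORKERS, 8)` (hypothetical variable name). Capping trades throughput for a predictable memory ceiling: queued expansions wait instead of each spawning a new worker.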
Verification
After applying these two fixes and testing with a 32-day Clipper plan, RSS stabilized at around 3 GB and the heap stayed in the 100 MB range. Memory usage now drops when expansions are rerun, which indicates that garbage collection is reclaiming memory as expected.
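One way to make the verification step repeatable is to record `heapUsed` after each rerun of the same expansion and check whether it trends upward. The sketch below fits a least-squares slope to those samples. It is a hypothetical heuristic, not the procedure used for this PR; the function names, the tolerance, and the sample numbers are all illustrative.

```typescript
// Hypothetical leak check for repeated expansion runs. Samples are heapUsed
// values in MB, one per run. A clearly positive slope means memory is being
// retained across runs; a slope near zero means it is being reclaimed.

// Least-squares slope of the samples against run index 0..n-1 (MB per run).
function heapGrowthPerRunMB(samplesMB: number[]): number {
  const n = samplesMB.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2; // mean of 0, 1, ..., n-1
  const meanY = samplesMB.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - meanX) * (samplesMB[i] - meanY);
    den += (i - meanX) * (i - meanX);
  }
  return num / den;
}

function looksLikeLeak(samplesMB: number[], toleranceMBPerRun: number): boolean {
  return heapGrowthPerRunMB(samplesMB) > toleranceMBPerRun;
}

// Illustrative numbers only, not measurements from this PR.
const leakingRuns = [1024, 2048, 3072, 4096]; // about 1 GB more after every run
const healthyRuns = [100, 110, 95, 105, 100]; // stays in the ~100 MB range
console.log(looksLikeLeak(leakingRuns, 10)); // true
console.log(looksLikeLeak(healthyRuns, 10)); // false
```

A slope is less noisy than comparing only the first and last samples, since a single GC pause or a large plan in the middle of the series does not decide the result on its own.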