Open parkerabercrombie opened 2 weeks ago
Aerie version 2.11.2 did not use the jemalloc memory allocator and instead relied on Node.js' default memory allocator. This was addressed in Aerie version 2.14, where jemalloc was added to the sequencing server Dockerfile. Here is the PR where this change was added:
https://github.com/NASA-AMMOS/aerie/pull/1487
Any version of Aerie from 2.14 onward includes this update, which fixes the memory leak problem. I verified this with the Clipper data that @parkerabercrombie sent to the Aerie team. After 32 expansion runs, memory usage held steady and was cleaned up at about 4 GB, without any server crashes.
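For anyone who wants to repeat this kind of check on their own deployment, here is a minimal sketch of how a series of memory readings could be tested for a plateau. It is not part of Aerie: `hasPlateaued`, the 256 MiB tolerance, and the sample numbers are all hypothetical, and it assumes you collect one memory reading (in MiB) after each expansion run by whatever means you have available.

```typescript
// Hypothetical helper, not part of Aerie. Input: one memory reading (MiB)
// taken after each expansion run, in order.

/**
 * Returns true when the peak of the second half of the series does not exceed
 * the peak of the first half by more than `toleranceMiB`, i.e. memory has
 * levelled off instead of growing run after run.
 */
function hasPlateaued(samplesMiB: number[], toleranceMiB: number = 256): boolean {
  if (samplesMiB.length < 4) {
    return false; // too few samples to judge either way
  }
  const mid = Math.floor(samplesMiB.length / 2);
  const firstPeak = Math.max(...samplesMiB.slice(0, mid));
  const secondPeak = Math.max(...samplesMiB.slice(mid));
  return secondPeak <= firstPeak + toleranceMiB;
}

// Made-up readings for illustration only (not taken from the attached logs).
const growingSeries: number[] = [1000, 1500, 2000, 2500, 3000, 3500];
const stableSeries: number[] = [1200, 3900, 4000, 3950, 4000, 3980];

const growingResult: boolean = hasPlateaued(growingSeries); // false: still climbing
const stableResult: boolean = hasPlateaued(stableSeries);   // true: levelled off
```

Comparing the peaks of the two halves is a deliberately crude test. It tolerates the sawtooth pattern you get when memory is periodically released, but it would miss a very slow leak, so treat it as a first-pass check only.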
FYI
For an optimal experience with sequence expansion, we recommend upgrading your Virtual Machine (VM) configuration to include:
- A more powerful CPU
- Increased RAM
This will help ensure smoother performance and faster processing times.
Keep in mind that the CPU is a specific bottleneck. From our logs, it appears that you'll need to wait approximately 13 minutes after a server restart before expanding your plan into sequences, because the server needs time to transpile the expansion logic files. For reference, on a Mac M1 or newer, this processing time can be reduced to around 2 minutes.
Thanks @goetzrrGit. We'll try increasing the resources on that server.
Checked for duplicates
No - I haven't checked
Is this a regression?
No - This is a new bug
Version
2.11.2
Describe the bug
The Aerie sequencing service is crashing when attempting to expand one of our plans. We attempted to expand the plan and observed that the request appeared to hang while server memory was pegged at 93%. We restarted the service and resubmitted the expansion request. Again the request appeared to hang, and the service appeared to crash and restart itself. On the third attempt the expansion succeeded.
Reproduction
Has occurred on 2/3 attempts to expand our cruise002 plan.
Logs
Full log Aerie Sequencing Logs-data-2024-11-12 15_31_27.csv
Severity
Critical