mfitz closed this 8 months ago
Cool. I find it a bit funny, since tablesaw tables is what I started with for the Link Log 😅 I will make sure to check how my additional KPIs impact the memory profile 😄
Yeah, the key is to hide the fact that we're using Tablesaw - keep it out of the API.
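One way to keep Tablesaw out of the public API is to put the table behind a small interface, roughly along these lines. The `LinkLog` interface and its method names below are hypothetical illustrations, not the project's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical facade: callers depend only on this interface, so the
// storage backend (Tablesaw or otherwise) can change without touching the API.
interface LinkLog {
    void recordEntry(String linkId, String vehicleId, double entryTime);
    int entryCount();
}

// A plain in-memory implementation; a Tablesaw-backed class could
// implement the same interface without leaking Tablesaw types.
class InMemoryLinkLog implements LinkLog {
    private record Entry(String linkId, String vehicleId, double entryTime) {}

    private final List<Entry> entries = new ArrayList<>();

    @Override
    public void recordEntry(String linkId, String vehicleId, double entryTime) {
        entries.add(new Entry(linkId, vehicleId, entryTime));
    }

    @Override
    public int entryCount() {
        return entries.size();
    }
}
```

Callers only ever see `LinkLog`, so swapping the backing store is invisible to them.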
Summary of Changes
Summary of Results
Memory usage has been significantly reduced, down to about 20% of the previous level in the case of the TE model. Processing times are also shorter thanks to no longer parsing the plans/population files. The log output of `MemoryObserver` from each run is included in full below, but can be summarised thus:

**Paris East Baseline (local)**

| Branch | Used RAM | Total |
|---|---|---|
| `main` | 27.8GB | 28.7GB |
| `reduce_memory_usage` | 12.5GB | 17.2GB |

The `main` run used `-Xmx28g`; with a bigger value, these numbers would be bigger too.

**TE Model (AWS Batch)**

| Branch | Used RAM | Total |
|---|---|---|
| `main` | 165GB | 200GB |
| `reduce_memory_usage` | 35GB | 49GB |
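For context on how "used RAM" vs "total" figures like these can be obtained, an observer in the spirit of `MemoryObserver` might read the JVM's own counters. This is a sketch under that assumption, not the observer's actual code:

```java
import java.util.Locale;

// Sketch: reading used vs total heap from the JVM. A real observer would
// sample this periodically and track the high-water mark across the run.
class MemorySample {
    final long usedBytes;
    final long totalBytes;

    MemorySample(long usedBytes, long totalBytes) {
        this.usedBytes = usedBytes;
        this.totalBytes = totalBytes;
    }

    static MemorySample take() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory();        // heap currently reserved by the JVM
        long used = total - rt.freeMemory();  // reserved minus free = in use
        return new MemorySample(used, total);
    }

    static String asGb(long bytes) {
        return String.format(Locale.ROOT, "%.1fGB", bytes / (1024.0 * 1024 * 1024));
    }
}
```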
Before Fixes
Using Paris East Baseline Locally
The process died before finishing populating the (Tablesaw) Link Log:
Heap dump analysis holds no surprises and shows the packages and classes using the most memory to be:

- `java.util.TreeMap$Entry`: c. 15GB, very likely to be the Guava Table holding the Link Log
- `java.util.TreeMap`: c. 4.5GB

If I give the JVM 28GB of heap space, the process finishes normally, but skates close to the limit with a high-water mark of 27.8GB:
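Each `TreeMap$Entry` carries its key and value plus parent/left/right pointers, so a cell-per-entry structure like Guava's `TreeBasedTable` pays that overhead for every Link Log record. One common way to cut it, shown purely as an illustration and not necessarily what this PR does, is to append records to flat columnar lists instead:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration: columnar storage with one slot per record and no per-cell
// tree nodes, trading TreeMap's sorted lookups for a smaller footprint.
class ColumnarLinkLog {
    private final List<String> linkIds = new ArrayList<>();
    private final List<String> vehicleIds = new ArrayList<>();
    private final List<Double> entryTimes = new ArrayList<>();

    void add(String linkId, String vehicleId, double entryTime) {
        linkIds.add(linkId);
        vehicleIds.add(vehicleId);
        entryTimes.add(entryTime);
    }

    int size() {
        return entryTimes.size();
    }

    String linkIdAt(int i) {
        return linkIds.get(i);
    }
}
```

Going further, primitive arrays (or Tablesaw's own columns) would avoid boxing the doubles as well.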
Using TE Model in AWS Batch
Ran to completion with a high-water mark of around 165GB and 200GB reserved.
After Fixes
Using Paris East Baseline Locally
Ran to completion with a high-water mark of around 12.5GB and 17.2GB reserved (previously 27.8GB and 28.7GB respectively, with a max heap value of 28GB).

Using TE Model in AWS Batch

Ran to completion with a high-water mark of around 35GB and 49GB reserved (previously 165GB and 200GB respectively).