Need
The quick change suggested below may reduce SyncDetails performance for organisations with many addresses, so it is worth picking this up early.
Issue
In the production environment, Avni server becomes unresponsive after 4-5 days: repeated full GCs consume almost all CPU time while trying to bring heap usage down from near its 5 GB limit.
Current issue
As a result, the app periodically becomes unresponsive and the server has to be force-restarted, which is broken behaviour.
Quick change to reduce impact
Reducing 'avni.cache.max.weight' to 1000 may delay the issue considerably, at the cost of possibly reduced SyncDetails performance.
Analysis details
Heap dump analysis showed that VirtualCachmentProjection proxy objects hold a large share of the heap, nearly 50% of heap memory.
'avni.cache.max.weight' is currently set to 3000, which results in a total of 670K cached records instead of the expected 300K.
AC
Reduce the AddressLevelCache footprint in Avni-server so that the overall app memory footprint stays within configured limits.
Possible avenues for improvement:
Ensure that the number of entries stays within bounds in 'getConcurrentMapCacheWithWeightedCapacityForAddressesConfig()', and validate this with unit tests
Experiment with the limits to find the optimal value for the production server config
Log cache size, hit and miss statistics to monitor AddressLevelCache effectiveness
Ensure that cached objects and their proxy objects are also cleaned up during GC
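A minimal sketch of what bounding a cache by total weight, with hit/miss/size stats, could look like. It is not Avni's getConcurrentMapCacheWithWeightedCapacityForAddressesConfig(); the class WeightedLruCache, the weigher (assumed to be "number of records held by a cached value"), and the stats format are all hypothetical. It uses a synchronized, access-ordered java.util.LinkedHashMap rather than whatever cache library Avni actually uses.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.ToIntFunction;

/**
 * Hypothetical sketch (not Avni's code): an LRU cache bounded by total weight,
 * e.g. the number of address records held, rather than by entry count.
 * Null keys and values are not supported.
 */
class WeightedLruCache<K, V> {
    private final long maxWeight;
    private final ToIntFunction<V> weigher;
    // accessOrder = true: iteration starts at the least-recently-used entry
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);
    private long totalWeight, hits, misses, evictions;

    WeightedLruCache(long maxWeight, ToIntFunction<V> weigher) {
        if (maxWeight <= 0) throw new IllegalArgumentException("maxWeight must be positive");
        this.maxWeight = maxWeight;
        this.weigher = weigher;
    }

    synchronized V get(K key) {
        V value = map.get(key); // a hit also marks the entry as most recently used
        if (value == null) misses++; else hits++;
        return value;
    }

    synchronized void put(K key, V value) {
        int weight = weigher.applyAsInt(value);
        if (weight > maxWeight) {
            // Too large to ever fit: drop any stale mapping instead of flushing the cache
            V stale = map.remove(key);
            if (stale != null) totalWeight -= weigher.applyAsInt(stale);
            return;
        }
        V previous = map.put(key, value);
        if (previous != null) totalWeight -= weigher.applyAsInt(previous);
        totalWeight += weight;
        // Evict least-recently-used entries until the total weight is within bounds.
        // The new entry is the most recent and weight <= maxWeight, so it is never evicted here.
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (totalWeight > maxWeight && it.hasNext()) {
            totalWeight -= weigher.applyAsInt(it.next().getValue());
            it.remove();
            evictions++;
        }
    }

    synchronized int size() { return map.size(); }

    synchronized long totalWeight() { return totalWeight; }

    synchronized String stats() {
        long lookups = hits + misses;
        double hitRate = lookups == 0 ? 0.0 : (double) hits / lookups;
        return String.format(Locale.ROOT,
                "size=%d weight=%d/%d hits=%d misses=%d evictions=%d hitRate=%.2f",
                map.size(), totalWeight, maxWeight, hits, misses, evictions, hitRate);
    }

    public static void main(String[] args) {
        // Assumption: a value's weight is the number of records it holds
        WeightedLruCache<String, List<Integer>> cache = new WeightedLruCache<>(5, List::size);
        cache.put("catchment-1", List.of(1, 2, 3));
        cache.put("catchment-2", List.of(4, 5));
        cache.get("catchment-1");                // hit: catchment-1 becomes most recently used
        cache.get("catchment-3");                // miss
        cache.put("catchment-3", List.of(6, 7)); // total would be 7 > 5: evicts catchment-2
        System.out.println(cache.stats());
        // prints: size=2 weight=5/5 hits=1 misses=1 evictions=1 hitRate=0.50
    }
}
```

A unit test in this spirit would insert more records than maxWeight allows and assert that totalWeight() never exceeds it, while logging stats() periodically covers the hit/miss monitoring item. Evicted values, along with any proxy objects reachable only through them, become eligible for garbage collection only once nothing else still references them.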