frankjkelly opened 3 years ago
An update with a little more data: I let the app run for a few hours and then looked at the histogram.
The object type with the largest shallow heap was this:
Class Name               | Objects | Shallow Heap
--------------------------------------------------
java.util.HashMap$Node[] | 10,446  | 107,051,112
--------------------------------------------------
And then, when I run MAT's "Path to GC Roots" on it, I get:
Class Name | Shallow Heap | Retained Heap
-------------------------------------------------------------------------------------------------------------------------------------------------------------
java.util.HashMap$Node[32768] @ 0xeaf6ed48 | 131,088 | 131,088
'- table java.util.LinkedHashMap @ 0xea4b8288 | 56 | 131,232
'- cacheMap sun.security.util.MemoryCache @ 0xea1f6820 | 32 | 131,312
'- sessionCache sun.security.ssl.SSLSessionContextImpl @ 0xea022228 | 32 | 262,800
'- clientCache sun.security.ssl.SSLContextImpl$TLSContext @ 0xe9cf49d0 | 48 | 704
'- sslContext sun.security.ssl.SSLSocketImpl @ 0xe9b5d168 | 64 | 270,328
'- connection org.postgresql.core.PGStream @ 0xe9afacf0 | 88 | 35,456
'- pgStream org.postgresql.core.v3.QueryExecutorImpl @ 0xe9ad2720 | 216 | 40,984
'- queryExecutor org.postgresql.jdbc.PgConnection @ 0xe9ac6850 | 128 | 62,304
'- connection com.zaxxer.hikari.pool.PoolEntry @ 0xe9ac1300 | 56 | 160
|- arg$2 com.zaxxer.hikari.pool.HikariPool$$Lambda$744 @ 0xe9ac45a0 | 24 | 24
| '- task java.util.concurrent.Executors$RunnableAdapter @ 0xe9ac0010 | 24 | 48
| '- callable java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask @ 0xe9ab4200 | 72 | 120
| '- [94] java.util.concurrent.RunnableScheduledFuture[271] @ 0xe1f62120 | 1,104 | 1,104
| '- queue java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue @ 0xe1318f78 | 32 | 1,136
| |- <Java Local> java.lang.Thread @ 0xe1318de8 HikariPool-1 housekeeper Thread | 120 | 592
| |- workQueue java.util.concurrent.ScheduledThreadPoolExecutor @ 0xe1318fa8 | 80 | 384
-------------------------------------------------------------------------------------------------------------------------------------------------------------
And there are over 10,000 of these at ~128 KB each.
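As an aside, the retained path above bottoms out in the JSSE session cache (the sun.security.util.MemoryCache inside SSLSessionContextImpl). If that cache is what's growing, it can be bounded with the standard javax.net.ssl.sessionCacheSize system property; a hedged sketch only, since whether this applies depends on how the driver wires up TLS:

```java
public class CapTlsSessionCache {
    public static void main(String[] args) {
        // Standard JSSE property; must be set before the first TLS handshake.
        // Caps the number of entries in the default SSL session cache.
        System.setProperty("javax.net.ssl.sessionCacheSize", "1000");
        // ... start the connection pool / application after this point
    }
}
```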
It turns out we were running it in Kubernetes with 1 vCPU and 1 GB of RAM assigned to the container, and Java (in all its wisdom) decided to use the SerialGC collector. Once we forced it to use the G1GC collector, this problem went away.
I would have presumed that SerialGC would behave mostly the same, and certainly that it would not result in an OOMKilled error even if it was a sub-optimal collector.
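For what it's worth, here is a quick sketch (standard java.lang.management API) for confirming which collector the JVM actually picked at runtime; the standard flag to force G1 is -XX:+UseG1GC:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class WhichGc {
    public static void main(String[] args) {
        // SerialGC reports beans named "Copy" and "MarkSweepCompact";
        // G1 reports "G1 Young Generation" and "G1 Old Generation".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName());
        }
    }
}
```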
Anyway, feel free to leave this open if it's of interest to debug further, or to close it.
Thanks!
Appreciate the follow-up on how you fixed it! I'm quite impressed that changing the GC actually made a difference; perhaps that's down to Java 11 improvements. If only the JVM picked the best option itself, as you said...
Nevertheless, it should be noted that 1 vCPU and 1 GB RAM is likely not enough for 200 DB connections, let alone the kind of workload that might need 200 connections! The docs recommend about 3 connections for a typical single-CPU server. You might gain significant performance by reducing the number of connections.
https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing#the-formula
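For illustration, a minimal sketch of what sizing the pool down looks like with HikariConfig (the JDBC URL and credentials are hypothetical placeholders):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class SmallPool {
    public static void main(String[] args) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db:5432/app"); // hypothetical URL
        config.setUsername("app");                          // hypothetical
        config.setPassword("secret");                       // hypothetical
        // Wiki formula: connections = (core_count * 2) + effective_spindle_count;
        // with 1 vCPU and an SSD-backed Postgres that suggests ~3, not 200.
        config.setMaximumPoolSize(3);
        HikariDataSource ds = new HikariDataSource(config);
        // hand ds to Spring / the application here
        ds.close();
    }
}
```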
We have 750 MB of RAM allocated to our Spring app, with a pool of 200 connections against a PostgreSQL DB (10.7).
Java Runtime: OpenJDK Runtime Environment 11.0.6+10-post-Debian-1bpo91
Deployment: CentOS Linux
Even without any traffic, heap memory increases and increases, and the app eventually dies by running out of memory.
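(For anyone reproducing the analysis: a minimal sketch of capturing such a heap dump programmatically via the HotSpot diagnostic MXBean. The output path is arbitrary, and `jmap -dump:live,format=b,file=heap.hprof <pid>` produces the equivalent.)

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class DumpHeap {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // true = dump only live objects (runs a full GC first); path is arbitrary
        diag.dumpHeap("/tmp/heap.hprof", true);
    }
}
```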
After taking a heap dump and running it through Eclipse Memory Analyzer, it implicates Hikari (maybe directly, maybe indirectly?), saying there is one problem suspect consuming > 100 MB.
When I click through to the Threads view it shows the following (two screenshots attached in the original issue).
Hope that helps... it's possible the "leak" is downstream, e.g. in the PostgreSQL driver or the Java JRE itself, but I'd appreciate any help. Thanks!