fabric8-services / keycloak

Open Source Identity and Access Management For Modern Applications and Services
http://www.keycloak.org

java.lang.OutOfMemoryError: Java heap space #61

Open alexeykazakov opened 7 years ago

alexeykazakov commented 7 years ago

Today our sso.prod-preview died because of:

16:00:36,097 ERROR [io.undertow.request] (default task-53) UT005023: Exception handling request to /auth/realms/fabric8/.well-known/openid-configuration: java.lang.OutOfMemoryError: Java heap space

The deployment was using 1GB of RAM. As far as I remember the limit is 2GB.

So, we need to check the JVM settings in our WildFly/KC and tune them accordingly. cc: @hectorj2f @bartoszmajsak

alexeykazakov commented 7 years ago

Settings used by default: JAVA_OPTS: -server -Xms64m -Xmx512m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true

alexeykazakov commented 7 years ago

I tuned it for now by setting the JAVA_OPTS env var in the DC: -server -Xms256m -Xmx2048m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=512m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true
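
A minimal sketch (assuming we just want to double-check from inside the container, e.g. with a throwaway class run on the image's JDK) of how to confirm the -Xmx / -XX:MaxMetaspaceSize override actually reaches the JVM; the class name and the MB formatting are illustrative only:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Print the effective heap and metaspace limits so we can verify that the
// JAVA_OPTS override from the DeploymentConfig was picked up by the JVM.
public class JvmLimits {
    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap (-Xmx): %d MB%n", maxHeap / (1024 * 1024));

        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Metaspace")) {
                long max = pool.getUsage().getMax();
                System.out.printf("%s limit: %s%n", pool.getName(),
                        max < 0 ? "unbounded" : (max / (1024 * 1024)) + " MB");
            }
        }
    }
}
```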

bartoszmajsak commented 7 years ago

That should postpone the error. I would suggest taking heap dumps for further analysis. There is an option to produce a heap dump when such an error occurs (-XX:+HeapDumpOnOutOfMemoryError), but we can also take one on demand using JVM tools such as jmap. Then we can use VisualVM, for example, to analyze it further.

I was also thinking during the flight that when doing performance tests using vegeta we are quite blind. We should attach some sort of profiler to really understand what the bottleneck is. I would also suggest talking to the KC guys about our findings.

bartoszmajsak commented 7 years ago

A third option is to get the dump using the JMX console available in WildFly, but that is probably the last resort, as our images would need an admin user configured to get there.
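
A fourth, programmatic option (a sketch, assuming we can run a small piece of code inside the JVM, e.g. from a throwaway admin-only endpoint) is to call the HotSpotDiagnostic platform MXBean directly, which produces the same .hprof as jmap without exposing the admin console. The output path below is a placeholder and should point at a writable, ideally persistent, volume:

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

// Trigger a heap dump on demand from inside the JVM, as an alternative to
// exec'ing jmap in the container or going through the JMX console.
public class HeapDumper {
    public static void main(String[] args) throws IOException {
        // Placeholder path; the target file must not exist yet.
        String file = args.length > 0 ? args[0] : "/tmp/keycloak-heap.hprof";

        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

        // 'true' = dump only live objects (forces a full GC first).
        diag.dumpHeap(file, true);
        System.out.println("Heap dump written to " + file);
    }
}
```

The dump can then be copied off the pod (or written straight to a mounted volume) and opened in VisualVM.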

hectorj2f commented 7 years ago

> I was also thinking during the flight that when doing performance tests using vegeta we are quite blind. We should attach some sort of profiler to really understand what the bottleneck is. I would also suggest talking to the KC guys about our findings.

We have been collecting data from pmcd since Thursday about the performance of the containers in our test cluster. I think the bottleneck is more CPU-bound than memory-bound. But yeah, let's test it with this configuration and ping the KC guys.

hectorj2f commented 7 years ago

For your entertainment: https://osd-monitor-keycloak-cluster-test.b6ff.rh-idev.openshiftapps.com/. You can ask me for the credentials.

alexeykazakov commented 7 years ago

Reopening because we need to investigate this further and tune accordingly.

bartoszmajsak commented 7 years ago

> We have been collecting data from pmcd since Thursday about the performance of the containers in our test cluster. I think the bottleneck is more CPU-bound than memory-bound.

PCP gives you monitoring of the hardware boundaries, but we still have no idea what is really consuming that much memory from the JVM's perspective. That's why a heap dump would definitely help here; worst case, there is a memory leak somewhere.

We should take care of storing the dumps in the PV instead of in the container itself.

hectorj2f commented 7 years ago

We might also think about using GCTimeRatio to release memory. If our pods don't behave in terms of memory usage, they could be killed by OpenShift (due to resource limits).

GCTimeRatio: The ratio flags complement the standard -Xms and -Xmx flags to enable footprint management for the relevant GC. Without them the GC will create an initial heap based on the -Xms figure and add heap as objects are allocated, up to the -Xmx maximum, but will never release any of the vmem pages mapped into the heap range. By default the GCs are space-greedy, i.e. if an app creates lots of objects and runs out of heap space the GC will much prefer mapping in more space over running more frequent GCs. So, most apps -- even those with relatively small working sets -- creep (or sometimes leap) up to the -Xmx limit and then stay there. Footprint management adjusts that tradeoff by i) shifting the balance towards running GCs more often (i.e. compacting the data into less space) and ii) releasing heap when the mapped heap is a lot bigger than the working set.
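
To tell whether footprint management actually kicks in once such flags are set (GCTimeRatio is typically combined with -XX:MinHeapFreeRatio / -XX:MaxHeapFreeRatio, which is an assumption on my side rather than something we have tested here), a small sketch like the following could log used vs. committed heap over time; committed heap shrinking back toward the working set would show that pages are being released. The 10-second interval and stdout logging are illustrative only:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Periodically log used vs. committed heap so we can see whether the GC
// hands memory back once the footprint-management flags are in place.
public class HeapFootprintLogger {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() / (1024 * 1024),
                    heap.getCommitted() / (1024 * 1024),
                    heap.getMax() / (1024 * 1024));
            Thread.sleep(10_000);
        }
    }
}
```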