
JITServer stops compiling after reaching memory limit #11154

Open ninja- opened 3 years ago

ninja- commented 3 years ago

It seems like in the current version of JITServer, when it reaches its memory limit, it just says "out of scratch space" and stops compiling anything until it is restarted. Shouldn't it remove old cached code in that case?

fjeremic commented 3 years ago

@mpirvu FYI.

mpirvu commented 3 years ago

@ninja- I'll try to reproduce it on my end. Could you please let me know the version of the build and the command line options that you've used (for both client and server), if any? Also, if you limited the amount of memory at client/server, let me know those limits as well. Thanks

ninja- commented 3 years ago

@mpirvu

No server options. The server is running in a container with a 2 GB limit, and it reaches 1-2 GB of usage within 2 hours.

Client options: -Xshareclasses, -Xtune:virtualized, -XX:JITServerAddress=..., -XX:+UseJITServer, a JIT scratch space limit of 16 MB (probably ignored), and -Xcodecachetotal of 16 MB (can't spare much more per-JVM at the moment).
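For reference, a plausible full client invocation with options along these lines might look roughly as follows. This is only a sketch: the JITServer host is elided in the original and shown as a placeholder, the application jar is hypothetical, and the exact spelling of the scratch-space suboption is an assumption.

    # sketch only: <jitserver-host> and app.jar are placeholders;
    # scratchSpaceLimit is assumed to be a -Xjit suboption with its value in KB
    java -Xshareclasses -Xtune:virtualized \
         -XX:+UseJITServer -XX:JITServerAddress=<jitserver-host> \
         -Xjit:scratchSpaceLimit=16384 \
         -Xcodecachetotal16m \
         -jar app.jar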

openjdk version "15.0.1" 2020-10-20
OpenJDK Runtime Environment AdoptOpenJDK (build 15.0.1+9)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.23.0, JRE 15 Linux amd64-64-Bit Compressed References 20201022_81 (JIT enabled, AOT enabled)
OpenJ9   - 0394ef754
OMR      - 582366ae5
JCL      - ad583de3b5 based on jdk-15.0.1+9)

mpirvu commented 3 years ago

I sort of reproduced this issue by using 12 client JVMs that attach more or less simultaneously to the server, with the server limited to 1 GB. Since each client JVM launches several compilation threads, the server will use quite a few compilation threads in parallel. The logs show 62 compilation threads active at some point (the maximum value is 63). Given that the scratch memory needs of these compilation threads overlap, at some point the 1 GB memory limit is exhausted and the server starts failing compilations in order to avoid an OOM scenario leading to a crash. If the amount of free memory is very low, the server will suspend compilation threads forever. In my runs I see that some compilation threads are suspended, but not all of them, so the server continues to work, just with fewer compilation threads.
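For context, a memory-capped server like the one described above could be launched along these lines (a sketch only; using Docker here is an assumption, the image name is a placeholder, the jitserver launcher is assumed to be on the PATH in the image, and 38400 is the default JITServer port):

    # sketch: run the JITServer in a container with a 1 GB memory cap
    docker run --memory=1g -p 38400:38400 <openj9-jdk-image> jitserver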

There are two contributors to memory usage in the server:

  1. Scratch memory used by compilation threads during a compilation. This memory is fully released to the OS when the compilation finishes, but if there are many compilation threads running in parallel, their memory needs add up. Each compilation thread starts with a 16 MB segment and grows (in 16 MB increments) up to 512 MB if needed. One way to circumvent this problem is to start the client JVMs in a staggered fashion (with some delay in between), as sketched after this list. I am thinking that we should amend the behavior of the JITServer so that it does not use that many compilation threads when the available memory is low. The downside is that compilation requests will wait in the compilation queue and their processing will be delayed.
  2. Internal caches. In order to minimize the number of messages exchanged between client and server, the server caches runtime information from the client. Many clients connected to the same server mean a potentially large amount of memory devoted to these caches. The cache for a client is purged during the shutdown sequence of the client JVM (the client sends a message to the server informing it that it's going away soon). This can happen only if the JVM is shut down gracefully and is not terminated forcefully, so one way to avoid memory bloat due to caches is to refrain from killing the JVMs. The cache for a client is also purged if the client hasn't requested a compilation for more than 1000 minutes. I am thinking that we should change this policy and be much more aggressive about purging when available memory is running low.
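As a rough illustration of the staggered-start idea from item 1 (a sketch; the delay, the -XcompilationThreads option for limiting client JIT threads, and the scratchSpaceLimit suboption are assumptions made for illustration, not settings taken from this thread):

    # start 12 clients a few seconds apart instead of all at once;
    # -XcompilationThreads1 (cap client JIT threads) and
    # -Xjit:scratchSpaceLimit (per-thread scratch cap, assumed to be in KB)
    # are assumptions, not confirmed settings
    for i in $(seq 1 12); do
      java -XX:+UseJITServer -XX:JITServerAddress=<jitserver-host> \
           -XcompilationThreads1 -Xjit:scratchSpaceLimit=16384 \
           -jar app.jar &
      sleep 5
    done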

ninja- commented 3 years ago

Thanks for the info. It seems like I should be fine after increasing JITServer memory or the number of instances in that case.

mpirvu commented 3 years ago

@ninja- Yes, increasing the number of JITServer instances should improve the situation. If you don't mind, could you please give us more details about your setup? We want to make the JITServer technology better, but for that we also need to understand how people are actually using it. I would be interested to know:

  1. How many JVM clients are connected to one JITServer?
  2. Are the JVMs started all at the same time or do you use a staggered approach?
  3. How long does a client JVM last (some people use ephemeral instances, others, long running instances)?
  4. What are the CPU and memory limits for the client JVMs?
  5. Do you have swap enabled on your machines?
  6. Do you use Kubernetes (which should do automatic scheduling on worker nodes)? Thanks

ninja- commented 3 years ago

@mpirvu

  1. At the moment around ~100 JVMs and 2 JITServer instances. Once I increased the JITServer memory from 1 GB to 2 GB it's been fine, and it usually sits at around ~500 MB of usage.
  2. Sometimes a couple of them start at the same time depending on traffic, or all of them if we do a quick nightly update rollout :)
  3. 1 day maximum, I think; some instances scale down earlier depending on traffic, but I am aiming to restart them even more often in the future. It doesn't matter to me if they take a bit longer to start under heavy pressure.
  4. I think an average heap would be 256 MB, with a CPU limit of 1.5 cores.
  5. Yes, but swap is far from being used.
  6. Yes. We use a k8s Service for JITServer traffic with sticky sessions; otherwise, when traffic from the same JVM was hitting random instances, things were super slow.

mpirvu commented 3 years ago

Thanks for the answers.

100 JVMs and 2 JITServer instances

If all those JVMs start at the same time, the two JITServer instances are unlikely to handle the compilation load properly (each JVM may use up to 7 compilation threads, so you may need hundreds of threads on the server side).
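As a rough illustration: 100 client JVMs with up to 7 compilation threads each is on the order of 700 potential concurrent compilation requests, while a single server tops out at 63 active compilation threads (per the logs above), so a simultaneous cold start of all clients would need either several servers or staggered client starts to avoid long queueing delays.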

CPU limit of 1.5 cores

From what we've seen, JITServer is helpful in constrained environments (0.5-2 vCPUs). Your situation seems to fall in this range. With 4+ vCPUs there is enough computational power at the JVM to handle compilations itself and avoid the network latency.

when traffic from the same JVM was hitting random instances, things were super slow.

All compilation threads from a particular JVM should go to the same server. Multiple client-JVMs going to the same server is perfectly fine. Single client-JVM being served in parallel by several servers is not, for technical reasons related to the caches the server keeps around. We should probably make that clear in the documentation.
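One way to express that pinning in Kubernetes is client-IP session affinity on the Service fronting the JITServer pods. A minimal sketch (the Service name "jitserver" is a placeholder; the 24h timeout mirrors the sticky-session setup described later in this thread):

    # sketch: pin each client JVM to a single JITServer instance via client-IP affinity
    kubectl patch service jitserver --patch \
      '{"spec":{"sessionAffinity":"ClientIP","sessionAffinityConfig":{"clientIP":{"timeoutSeconds":86400}}}}'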

ninja- commented 3 years ago

If all those JVMs start at the same time, the two JITServer instances are unlikely to handle the compilation load properly (each JVM may use up to 7 compilation threads, so you may need hundreds of threads on the server side).

I could consider reducing the number of compilation threads on the client.

From what we've seen, JITServer is helpful in constrained environments (0.5-2 vCPUs). Your situation seems to fall in this range. With 4+ vCPUs there is enough computational power at the JVM to handle compilations itself and avoid the network latency.

Well, it's not like it's a reserved 1.5 cores. It's hugely overallocated (limit > request). Our workloads just have sporadic CPU spikes, so it's hard to do it another way without causing lags. I am already seeing good effects from using JITServer so far, and there's no need for JIT scratch space per JVM, which is quite a nice saving.

All compilation threads from a particular JVM should go to the same server. Multiple client-JVMs going to the same server is perfectly fine. Single client-JVM being served in parallel by several servers is not, for technical reasons related to the caches the server keeps around. We should probably make that clear in the documentation.

Sure I understand. Works fine now with sticky sessions :)

ninja- commented 3 years ago

Is the default scratch space/cache 1 GB? Because it seems like my JITServer crashes/restarts at that point every time. That seems to be a separate value from "out of scratch space", because at that point I've never seen it auto-restart, and now it keeps restarting after reaching 1 GB.

ninja- commented 3 years ago

If I am correct, the JITServer restarting might be at least one cause of random lags in prod; it seems to correlate. Is there any chance of handling such restarts better so they wouldn't lag the client process while it exchanges non-cached data with the JITServer?

mpirvu commented 3 years ago

Is the default scratch space/cache 1 GB?

No. In OpenJ9 each compilation thread has a scratch space limit of 256 MiB and the server doubles that. There is no hard stop at 1 GiB, unless, somehow, Kubernetes still imposes a 1 GiB limit for the server container.
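(Doubling 256 MiB gives up to 512 MiB of scratch space per compilation thread on the server, which matches the 512 MB growth limit described earlier in this thread.)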

Is there any chance of handling such restarts better so they wouldn't lag the client process while it exchanges non-cached data with the JITServer?

If I understand correctly what you are saying, the client connects to the server, at some point the server dies and is restarted, but the new server does not have its caches populated, so there is a lot of communication between the client and the server. This network communication will increase the latency for compilations, delaying the switch to optimized code at the client. Before proposing a solution I would like to make sure that is indeed the case. I would really appreciate some verbose logs obtained with -Xjit:verbose={compilePerformance},vlog=VLOG_PATH both at the client and at the server, so that I can provide better advice. Maybe compiling at higher optimization levels (hot and scorching) is what is causing the memory surge and we can turn that off.
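For reference, collecting those logs might look something like this on each side (a sketch; the paths and host are placeholders, and it is assumed here that the jitserver launcher accepts the same -Xjit options as a regular JVM):

    # client side (paths and host are placeholders)
    java -XX:+UseJITServer -XX:JITServerAddress=<jitserver-host> \
         "-Xjit:verbose={compilePerformance},vlog=/tmp/client-vlog" -jar app.jar

    # server side (assumes the jitserver launcher accepts -Xjit options)
    jitserver "-Xjit:verbose={compilePerformance},vlog=/tmp/server-vlog"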

mpirvu commented 3 years ago

@ninja- I wanted to follow up with you on this issue on memory consumption. OpenJ9 release 0.24 is going to GA soon (Jan 25 according to plan) and we've added a few changes to avoid the scenario you describe here: PRs #11189, #11323 and #11364. We are working on further improvements in PR #11628 and Issue #11162. Meanwhile I would like to reiterate my interest in some verbose logs that would help us better understand the usage pattern of JITServer in your k8s environment. Thank you

ninja- commented 3 years ago

I appreciate the fixes being made, but I had to give up on this feature and remove it from production :( While I haven't tested with compilation threads = 1 yet, it was just too unstable. The JITServer would crash very often even though it had enough memory, with a backtrace I can't decode because the native symbols fix was still not merged... and every time the JITServer crashed it created huge lags for the JVMs. The setup is really hard to test without big, production-like traffic.

Also, I wasn't able to isolate the source of the lags while recording with perf during an actual lag, even though most of the symbols were decoded, I think. (I understand JVM verbose logs may have the answer, but I haven't tried recording them.)

I think what's important here is to create a production-grade way to route traffic to multiple JITServers. I used a K8S Service with 24h sticky sessions based on pod IP, but that's far from perfect: after a sticky session expires, the client switches to a different JITServer, which means lag, and sticky sessions can also mean poor load balancing as pods are created and removed...

Also, the Docker image that hosts the JITServer (for example, for a K8S setup) needs wrappers, for example to cat the crash logs; otherwise they will just be gone when the pod restarts. That's another thing I started working on before I kind of gave up on using the JITServer.
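For what it's worth, a minimal entrypoint wrapper of the kind described might look like the sketch below; the diagnostic file names and locations are assumptions and would need adjusting for the actual image.

    #!/bin/sh
    # sketch of a JITServer container entrypoint that surfaces crash artifacts
    # to stdout (and therefore to `kubectl logs`) before the pod is recycled;
    # file locations are assumptions
    jitserver "$@"
    status=$?
    for f in javacore*.txt jitdump*.dmp core*; do
      [ -f "$f" ] && echo "=== $f ===" && cat "$f"
    done
    exit $status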