ushaazul closed this issue 2 years ago
Excellent! Thanks for the submission. At first glance, it looks good. Two points of feedback.
We cannot accept unless the CLA is signed. You'll need to click the link where it says "Please click here to be authorized" & follow the instructions. It should update when you've got it all correct.
Can you provide some validation instructions? Presumably, I'll need to swap in this JRE in config/components.yml, but is there anything else? Are there public downloads of the JRE available? Or do I need to host those on my own network?
Thanks
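For context, the config/components.yml swap mentioned above would look roughly like this. This is a sketch: the list structure is based on the buildpack's component-list config format, and the `JavaBuildpack::Jre::ZingJRE` class path is an assumption, not confirmed by this thread.

```yaml
# config/components.yml (sketch, assumed structure)
jres:
  # - "JavaBuildpack::Jre::OpenJdkJRE"   # default JRE, commented out
  - "JavaBuildpack::Jre::ZingJRE"        # assumed class name for the new JRE
```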
Hi Daniel,
I totally understand about the CLA. That takes time with companies. When it goes green, I will review & we'll get this merged. Thanks again for the contribution!
Hi Daniel, Our legal team submitted the CLA clearance. I also made some changes after the first pull request. I hope you are able to see both sets of changes and can accept the merge request.
Thanks
Can we get an ETA on when this change will be merged?
Thanks
It may be a couple of weeks still. There are a number of issues we are trying to address in the next release. I am going to try and squeeze this into that release, but my plan is to review & merge this after the other items are done.
Thank you so much. Really appreciate the effort.
OK, so I got some time to test this. Some notes.
The public downloads for Cloud Foundry are present at https://cdn.azul.com/zing/bin
I don't think this matters though. If I'm understanding correctly, the intent is to have this JDK support disabled by default since this software requires licensing. That means users are to download the software directly from your site, host a local buildpack repo with the software and then use that. Is that correct?
-----> Downloading Zing JRE 1.8.0_332 from http://localhost:8000/zing22.06.0.0-3-jdk8.0.332-linux_x64.tar.gz (found in cache) Expanding Zing JRE to .java-buildpack/zing_jre (4.5s)
The basics are there and seem to be working.
https://github.com/cloudfoundry/java-buildpack/blob/main/docs/jre-open_jdk_jre.md#jvmkill
I don't think this is working correctly with the Zing JRE. It detects the OOME when it occurs, but it hangs while trying to dump some final memory statistics about the process (useful for debugging). Based on the logging output at the time of the hang, it seems to be hanging somewhere around here.
I'm not sure there's anything you can do within the JVM kill agent itself, and I don't know offhand why it's failing, so it's hard to say. What I would suggest is to simply not include the JVM kill agent. It is being installed automatically because you are basing your ZingJRE implementation on OpenJDKLike, which installs the kill agent. You could simply remove that line and it'll skip installing the kill agent. You might instead add some JVM flags to tell the JVM to exit on OOME; you can do that with this or that.
Another thing installed by the OpenJDKLike implementation by default is the memory calculator. If that makes sense then you're all set. If it does not make sense for this JVM implementation, then you may want to consider removing that and adding JVM flags to tune the JVM's memory settings as well.
You do need to tune JVM memory settings with care because...
I'm not sure how this particular JVM operates so I can't really tell you what's best for some of these design decisions, however, happy to provide more context or answer any additional questions you might have. Just let me know.
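To make the suggestion above concrete, the override pattern looks like this. This is a self-contained sketch: the class names mirror the buildpack's (OpenJDKLike, ZingJRE), but the bodies are simplified stand-ins using strings, not the buildpack's real component API.

```ruby
# Sketch of the override pattern described above: a subclass drops the
# components it doesn't want from the parent's list. The real buildpack
# returns component objects built from a context; strings stand in here.
class OpenJDKLike
  def sub_components
    %w[jvmkill_agent jre memory_calculator security_providers]
  end
end

class ZingJRE < OpenJDKLike
  def sub_components
    # Skip the jvmkill agent and memory calculator for this JRE;
    # JVM flags added at release time would replace their behavior.
    super - %w[jvmkill_agent memory_calculator]
  end
end

puts ZingJRE.new.sub_components.inspect  # prints ["jre", "security_providers"]
```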
Hi Daniel, Thanks for testing it.
I tried to debug the hang you mentioned in the earlier comment. With a simple Java program, I did not see any errors. I tried to call all the methods similar to the ones being called in `impl super::Action for PoolStats`.
# cat MXBeanMemory.java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class MXBeanMemory {
    private static final int KB = 1024;
    private static final int MB = KB * 1024;
    private static final int GB = MB * 1024;

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            MemoryMXBean memBean = ManagementFactory.getMemoryMXBean();
            MemoryUsage heap = memBean.getHeapMemoryUsage();
            System.err.println(String.format("Heap: Init: %d, Used: %d, Committed: %d, Max: %d",
                    heap.getInit() / MB, heap.getUsed() / MB,
                    heap.getCommitted() / MB, heap.getMax() / MB));
            TimeUnit.SECONDS.sleep(1);
        }

        /* Memory pool names by JDK:
           For zulujdk8:
             Code Cache, Metaspace, Compressed Class Space,
             PS Eden Space, PS Survivor Space, PS Old Gen
           For zulu11:
             CodeHeap 'non-nmethods', Metaspace, CodeHeap 'profiled nmethods',
             Compressed Class Space, G1 Eden Space, G1 Old Gen,
             G1 Survivor Space, CodeHeap 'non-profiled nmethods'
           For zing 8 and 11:
             CodeCache, GenPauseless New Gen, GenPauseless Old Gen,
             GenPauseless Perm Gen, Native Heap
        */
        for (int j = 0; j < 3; j++) {
            System.out.println("\n--------------> ");
            List<MemoryPoolMXBean> memoryPoolMXBeanList = ManagementFactory.getMemoryPoolMXBeans();
            System.out.println("Total number of memory pools : " + memoryPoolMXBeanList.size());
            for (int i = 0; i < memoryPoolMXBeanList.size(); i++) {
                MemoryPoolMXBean memoryPoolMXBean = memoryPoolMXBeanList.get(i);
                System.out.print(memoryPoolMXBean.getName());
                MemoryUsage usage = memoryPoolMXBean.getUsage();
                System.err.println(String.format(" Heap: Init: %d, Used: %d, Committed: %d, Max: %d",
                        usage.getInit() / MB, usage.getUsed() / MB,
                        usage.getCommitted() / MB, usage.getMax() / MB));
                TimeUnit.SECONDS.sleep(1);
            }
        }
    }
}
Tried it with Zing 22.06 jdk8 and jdk11.
However, when I tried the same via Rust/Cargo, I got the following error:
error[E0599]: no method named `call_static_object_method` found for struct `JNIEnv` in the current scope
--> src/lib.rs:9:33
|
9 | let memory_mxbean = jni_env.call_static_object_method("java/lang/management/ManagementFactory", get_memory_mxbean_method_id);
| ^^^^^^^^^^^^^^^^^^^^^^^^^ help: there is an associated function with a similar name: `call_static_method`
For more information about this error, try `rustc --explain E0599`.
My rust file looks like this -
cat src/lib.rs
use jni::JNIEnv;
use jni::sys::jint;
use jni::objects::JObject;

#[no_mangle]
pub extern "system" fn Java_ch_frankel_blog_rust_Main_doubleRust(jni_env: JNIEnv, _obj: JObject, x: jint) -> jint {
    let mf_class = jni_env.find_class("java/lang/management/ManagementFactory");
    let get_memory_mxbean_method_id = jni_env.get_static_method_id("java/lang/management/ManagementFactory", "getMemoryMXBean", "()Ljava/lang/management/MemoryMXBean;");
    let memory_mxbean = jni_env.call_static_object_method("java/lang/management/ManagementFactory", get_memory_mxbean_method_id);
    x * 2
}

#[no_mangle]
pub extern "system" fn Java_ch_frankel_blog_rust_Main_timesRust(env: JNIEnv, obj: JObject, x: jint) -> jint {
    let state = env.get_field(obj, "state", "I");
    state.unwrap().i().unwrap() * x
}
To ensure that we get this included in the next release, we could skip installing the kill agent.
> To ensure that we get this included in the next release, we could skip installing the kill agent.
I'm fine with that. See my previous note; there's info there about how you'd go about skipping it.
Removed installing java kill agent and memory calculator.
Hi Daniel, I built the jvmkill agent to see if I could reproduce the hang that you were talking about in an earlier comment
jvmkill agent sources from https://github.com/cloudfoundry/jvmkill
I was able to print memory usage stats (the printMemoryUsage option) with no issues. Tried it with Azul's Prime JDK 8, 11, 15 and 17. Please let me know if you can provide any additional details.
# /home/ubuntu/zingjdk8/zing22.06.0.0-3-jdk8.0.332-linux_x64/bin/java -agentpath:/home/ubuntu/jvmkill/jvmkill2/jvmkill/target/release/libjvmkill.so=printMemoryUsage=1 GenerateOOM
Resource Exhausted! (1/0)
>>> Memory Pools
Memory usage:
Heap memory: init 536870912, used 987758592, committed 4196401152, max 4196401152
Hint: Heap memory is over 95% full. To increase it, increase the container size.
Non-heap memory: init 183840768, used 10514530304, committed 10514530304, max 16791695360
Memory pool usage:
CodeCache: init 16777216, used 432640, committed 16777216, max 421527552
GenPauseless New Gen: init 2097152, used 2097152, committed 2097152, max 4196401152
GenPauseless Old Gen: init 0, used 811597824, committed 811597824, max 4196401152
GenPauseless Perm Gen: init 2097152, used 8388608, committed 8388608, max 4196401152
Native Heap: init 183840768, used 10514530304, committed 10514530304, max 16791695360
>>> Thread Dump
Full thread dump Zing 64-Bit Tiered VM (1.8.0_332-zing_22.06.0.0-b3-product-linux-X86_64 mixed mode [Thu Jul 14 15:19:56 UTC 2022]):
"CompileBrokerThread0" #643 daemon prio=5 os_prio=0 tid=0x620002800000 nid=0x53a3 [ JVM thread_state=_thread_blocked, locked by VM (w/poll advisory bit) waiting for incoming compile requests, polling bits: safep gpgc_clean_new ]
java.lang.Thread.State: RUNNABLE
"FalconCompThr1" #0 daemon prio=5 os_prio=0 tid=0x620002600000 nid=0x53a2 [ JVM thread_state=_thread_in_vm, locked by VM (w/poll advisory bit) waiting on VM lock 'CompileTask2_lock', polling bits: safep gpgc_clean_new ]
java.lang.Thread.State: RUNNABLE
"FalconCompThr0" #0 daemon prio=5 os_prio=0 tid=0x620002400000 nid=0x53a1 [ JVM thread_state=_thread_in_vm, locked by VM (w/poll advisory bit) waiting on VM lock 'CompileTask2_lock', polling bits: safep gpgc_clean_new ]
java.lang.Thread.State: RUNNABLE
"C1CompThr1" #0 daemon prio=5 os_prio=0 tid=0x620002200000 nid=0x53a0 [ JVM thread_state=_thread_in_vm, locked by VM (w/poll advisory bit) waiting on VM lock 'CompileTask1_lock', polling bits: safep gpgc_clean_new ]
java.lang.Thread.State: RUNNABLE
"C1CompThr0" #0 daemon prio=5 os_prio=0 tid=0x620002000000 nid=0x539f [ JVM thread_state=_thread_in_vm, locked by VM (w/poll advisory bit) waiting on VM lock 'CompileTask1_lock', polling bits: safep gpgc_clean_new ]
java.lang.Thread.State: RUNNABLE
"Service Thread" #483 daemon prio=5 os_prio=0 tid=0x620001e00000 nid=0x539e [ JVM thread_state=_thread_blocked, locked by VM (w/poll advisory bit) waiting on VM lock 'Service_lock', polling bits: safep gpgc_clean_new ]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #451 daemon prio=5 os_prio=0 tid=0x620001c00000 nid=0x539d [ JVM thread_state=_thread_in_vm, locked by VM (w/poll advisory bit) waiting on VM lock 'VMOperationRequest_lock', polling bits: safep ]
java.lang.Thread.State: RUNNABLE
"Object Monitor Thread" #419 daemon prio=5 os_prio=0 tid=0x620001a00000 nid=0x539c [ JVM thread_state=_thread_blocked, locked by VM (w/poll advisory bit) sleeping, polling bits: safep gpgc_clean_new ]
java.lang.Thread.State: RUNNABLE
"Surrogate Locker Thread" #387 daemon prio=5 os_prio=0 tid=0x620001800000 nid=0x539b [ JVM thread_state=_thread_in_vm, locked by VM (w/poll advisory bit) waiting on VM lock 'SLT_lock', polling bits: safep ]
java.lang.Thread.State: RUNNABLE
"Finalizer" #355 daemon prio=5 os_prio=0 tid=0x620001600000 nid=0x539a [ JVM thread_state=_thread_in_vm, locked by VM (w/poll advisory bit) waiting on VM lock 'java.lang.ref.ReferenceQueue$Lock', polling bits: safep gpgc_clean_new ]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000020000c11810> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
- locked <0x0000020000c11810> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)
"Reference Handler" #323 daemon prio=5 os_prio=0 tid=0x620001400000 nid=0x5399 [ JVM thread_state=_thread_in_vm, locked by VM (w/poll advisory bit) waiting on VM lock 'java.lang.ref.Reference$Lock', polling bits: safep gpgc_clean_new ]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000020000c10e70> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.processThroughPending(Reference.java:281)
- locked <0x0000020000c10e70> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference.tryHandlePending(Reference.java:204)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:177)
"main" #5 prio=5 os_prio=0 tid=0x620000000000 nid=0x538f [ JVM thread_state=_thread_in_native, locked by VM (w/poll advisory bit) running, polling bits: safep ]
java.lang.Thread.State: RUNNABLE
at GenerateOOM.generateOOM(GenerateOOM.java:20)
at GenerateOOM.main(GenerateOOM.java:11)
"VM Thread" os_prio=0 tid=0x620001200000 nid=0x5398
"GC task thr#0 (GenPauselessGC)" os_prio=0 tid=0x620000400000 nid=0x5391
"GC task thr#1 (GenPauselessGC)" os_prio=0 tid=0x620000600000 nid=0x5392
"GC task thr#0 (GenPauselessGC)" os_prio=0 tid=0x620000800000 nid=0x5393
"VM Periodic Task Thread" os_prio=0 tid=0x620002a00000 nid=0x53a4
JNI global references: 1123
Heap
GenPauselessHeap
Max 4002M, Capacity 4002M, Used 944M
Virtual Max: New 4002M, Old/Perm 4002M
NewCollector thread: 7 GC cycle: 17-12 state: ConcurrentMarking [0x5]
OldCollector thread: 8 GC cycle: 16-5 state: ConcurrentRefProcessing [0x7]
jvmkill is killing current process
Killed
As far as the killagent goes, that repo is in a weird state. A past developer on the project did a major rewrite on the main branch but that never actually got released. He's no longer working on the project and I don't know the state of that work.
If you want to recompile, you need to checkout the latest tagged release & compile from that. That in itself has problems because some of the dependencies are very old, so it doesn't build as is with recent Rust versions. I'm working on updating that and getting something we can build again. I just haven't had time to finish that work yet.
Just doing a second pass on testing with the kill agent. I'm using zing22.06.0.0-3-jdk8.0.332-linux_x64.tar.gz, which I got from your trial downloads. It installs OK and the app starts OK. I have a demo app that just adds new int[] arrays until it exhausts the heap. This triggers the OOME as expected and the kill agent detects it, but that's where I was getting the hang. Normally the kill agent kills the process, which causes the CF app to be automatically restarted. In this case, it just hangs and never kills the process, so the app never restarts (and the app is no longer in a state where it responds to traffic).
2022-07-19T15:05:15.91-0400 [APP/PROC/WEB/0] OUT 2022-07-19 19:05:15.917 INFO 36 --- [io-8080-exec-10] c.v.m.s.jmw.MemoryWasterAPIController : Going to create 50000 junk of int[1000]...
2022-07-19T15:05:17.35-0400 [APP/PROC/WEB/0] ERR Resource exhaustion event: the JVM was unable to allocate memory from the heap.
2022-07-19T15:05:17.35-0400 [APP/PROC/WEB/0] ERR ResourceExhausted! (1/0)
2022-07-19T15:05:18.95-0400 [APP/PROC/WEB/0] OUT | Instance Count | Total Bytes | Class Name |
2022-07-19T15:05:18.95-0400 [APP/PROC/WEB/0] OUT | 141124 | 559637544 | [I |
2022-07-19T15:05:18.95-0400 [APP/PROC/WEB/0] OUT | 7139 | 6601568 | Ljava/lang/Class; ...
2022-07-19T15:05:19.08-0400 [APP/PROC/WEB/0] OUT | 119 | 6664 | Lorg/thymeleaf/engine/HTMLElementName; |
2022-07-19T15:05:19.08-0400 [APP/PROC/WEB/0] OUT | 166 | 6640 | Lorg/springframework/http/MediaType; |
2022-07-19T15:05:19.08-0400 [APP/PROC/WEB/0] OUT | 166 | 6640 | Lorg/springframework/core/type/classreading/SimpleMethodMetadataReadingVisitor$Source; |
2022-07-19T15:05:19.11-0400 [APP/PROC/WEB/0] OUT Memory usage: <------- ## Hung Here ##
For what it's worth, I'm getting this error also when the app starts up. It seems possibly related.
2022-07-19T15:00:22.13-0400 [APP/PROC/WEB/0] ERR java.lang.NullPointerException
2022-07-19T15:00:22.13-0400 [APP/PROC/WEB/0] ERR at io.micrometer.core.instrument.binder.jvm.JvmGcMetrics.countPoolSizeDelta(JvmGcMetrics.java:199)
2022-07-19T15:00:22.13-0400 [APP/PROC/WEB/0] ERR at io.micrometer.core.instrument.binder.jvm.JvmGcMetrics.lambda$bindTo$4(JvmGcMetrics.java:159)
2022-07-19T15:00:22.13-0400 [APP/PROC/WEB/0] ERR at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:156)
2022-07-19T15:00:22.13-0400 [APP/PROC/WEB/0] ERR at sun.management.GarbageCollectorImpl.createGCNotification(GarbageCollectorImpl.java:143)
2022-07-19T15:00:22.13-0400 [APP/PROC/WEB/0] ERR Exception in thread "Service Thread" java.lang.AssertionError: Error in invoking listener
2022-07-19T15:00:22.13-0400 [APP/PROC/WEB/0] ERR at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:159)
2022-07-19T15:00:22.13-0400 [APP/PROC/WEB/0] ERR at sun.management.GarbageCollectorImpl.createGCNotification(GarbageCollectorImpl.java:143)
I'm certainly not trying to cast any blame here. It's very likely that there is something in the killagent not handling a return value or the state of things correctly, or possibly something specific to the environment on CF where the app runs.
Given that you can support most of the functionality with JVM flags like -XX:+ExitOnOutOfMemoryError and -XX:+HeapDumpOnOutOfMemoryError/-XX:HeapDumpPath=, it may not be worth the effort to debug this failure; instead, go with these JVM flags and document the difference. You can add these flags in your subclass of OpenJDKLike, just like you can remove the killagent there.
Things that are remaining:
Thanks for testing it and for your explanation. Coming back to the remaining changes, I am a little confused. Could you please clarify and let me know if this doesn't look right? I intend to make the following changes in response to the items listed above. For item 1 - lib/java_buildpack/jre/zing_jre.rb
class ZingJRE < OpenJDKLike
  def sub_components(context)
    [
      OpenJDKLikeJre.new(sub_configuration_context(context, 'jre')
                           .merge(component_name: self.class.to_s.space_case)),
      OpenJDKLikeSecurityProviders.new(context)
    ]
  end
end
Item 1, for reference, the existing sub_components being overridden - lib/java_buildpack/jre/open_jdk_like.rb
def sub_components(context)
  [
    JvmkillAgent.new(sub_configuration_context(context, 'jvmkill_agent')),
    OpenJDKLikeJre.new(sub_configuration_context(context, 'jre')
                         .merge(component_name: self.class.to_s.space_case)),
    OpenJDKLikeMemoryCalculator.new(sub_configuration_context(context, 'memory_calculator')),
    OpenJDKLikeSecurityProviders.new(context)
  ]
end
For item 2 - spec/java_buildpack/framework/java_opts_spec.rb
component.release
expect(java_opts).not_to include('$JAVA_OPTS')
expect(java_opts).to include('-XX:OnOutOfMemoryError=kill\ -9\ \%p')
For item 1 - Yes, that looks right.
For item 2 - You need to override release in your class, something like this:
def release
  super
  @droplet.add_preformatted_options '-XX:+ExitOnOutOfMemoryError'
end
Using '-XX:OnOutOfMemoryError' isn't recommended, not for killing the process anyway. It can cause problems when there's little or no memory available in the container, because the JVM can't spawn the kill process. The option in my example is newer but available in Java 1.8+ (recent Java 1.8 builds, anyway) and is meant specifically for this situation.
Again, you could try to add heap dump options, but that's probably better done in a follow-up feature if someone requires it. To do that, you would need to check for a bound volume service and set things up to write the dump to that location. You have to write the heap dump somewhere durable or it'll disappear when the app exits (the file system in CF is ephemeral).
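If someone does pick that up later, the shape of such a follow-up might look like this. This is a self-contained sketch, not the buildpack's real API: StubDroplet stands in for the buildpack's droplet object, the volume-service check is reduced to a boolean, and the dump path is hypothetical.

```ruby
# Sketch: a release override that appends heap-dump flags only when a
# durable volume is available. StubDroplet is a stand-in for the
# buildpack's droplet; the path and check are hypothetical.
class StubDroplet
  attr_reader :java_opts

  def initialize
    @java_opts = []
  end

  def add_preformatted_options(opt)
    @java_opts << opt
  end
end

class ZingJreSketch
  def initialize(droplet, volume_service_bound:)
    @droplet = droplet
    @volume_service_bound = volume_service_bound
  end

  def release
    @droplet.add_preformatted_options '-XX:+ExitOnOutOfMemoryError'
    # Only write dumps somewhere durable; the ephemeral CF filesystem
    # would lose them when the app exits.
    return unless @volume_service_bound

    @droplet.add_preformatted_options '-XX:+HeapDumpOnOutOfMemoryError'
    @droplet.add_preformatted_options '-XX:HeapDumpPath=/var/heap-dumps' # hypothetical mount
  end
end

droplet = StubDroplet.new
ZingJreSketch.new(droplet, volume_service_bound: true).release
puts droplet.java_opts.inspect
```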
I have submitted changes as per your suggestion. Please let me know if we need anything else.
I did some more testing and had a few minor things come up. I had to adjust how the command was being overridden, because the way it was coded resulted in the arguments being duplicated. I also cleaned up the docs & removed the memory calculator & jvm killagent. I'm going to merge your PR and follow that with a small PR to polish things up.
As far as timing, I should be able to get this out very soon. We do have to cut a patch release first with updates for the July 2022 quarterly updates. You'll see that here any time now, and then I'll merge this PR and #957, then I'll cut a new minor release with this functionality. That might be Friday, but more likely will be Monday.
Thanks again for the PR!
Thanks for the update and the ETA for the patch. Can we get more details on the release/patch number once it is available?
Thanks again for the PR! Final update on timing. We have two more PRs to merge & are targeting the release on Friday.