Closed Sidong-Wei closed 9 months ago
fyi @keithc-ca
The behavior is consistent with the specification at [1]:
If the Java Virtual Machine recent CPU usage is not available, the method returns a negative value.
@keithc-ca Thanks for your reply, yeah I understand that the negative return value is allowed, and also we know the first call probably could not retrieve the correct recent cpu usage, so the behavior is expected. I guess my point is that since we already know the first call will always return -1 which is not implying any potential issue with the system, could we sort of differentiating the negative return values between the first time and later calls? i.e., the first call return -1 is normal so we could let it skip, but later calls returning -1 means there really might be something wrong there. If the first -1 could be filtered into a non-negative value (like 0 in OpenJDK), then it probably would not trigger any warnings or errors on the developer side, while the following negative values could keep being monitored. Please let me know whether you think that will make sense, thank you.
@Sidong-Wei It's not clear to me whether you're suggesting changes to your application (ignore the first -1 returned), or suggesting that OpenJ9 return 0 instead of -1 from the first call. The latter does seem reasonable to me, but we'll have to consider existing use cases (and tests) that might be affected by such a change.
@keithc-ca Yeah I also prefer the latter option a bit more. Changes to the application would definitely work and it could be a workaround for the issue, but I think if it could be resolved at the source then it would be easier for downstream applications to just monitor negative returns normally. If you would like to take this latter option into consideration, that will be good enough for me now, since it is not a critical issue and at least there exists a workaround for it. Thanks.
Controlling the behavior via a command line option is one way to avoid compatibility problems.
@pshipton, I assume you mean the option would restore the old (current) behavior, otherwise the default would still be inconsistent with openjdk? Got a proposal for the option name?
The behavior of getSystemCpuLoad()
should mirror that of getProcessCpuLoad()
(either both or neither should return -1 the first time either is called).
For compatibility I suggest the default behavior remain the same, although we can change it on a release boundary (Java 18 or maybe Java 17 if we do it soon) and document the change. -XX:[+/-]CpuLoadCompatibility
I'm wondering if I could get an update on where this issue stands? Is there a plan to implement it in a particular release?
@tajila question about scheduling this.
Just wanted to check in on the status of this issue again.
Is there a plan to implement it in a particular release?
It's in the backlog, so no.
@thallium Please take a look at this
@thallium Please take a look at this
@TobiAjila Does that mean I need to implement the -XX:[+/-]CpuLoadCompatibility
flag?
@TobiAjila Does that mean I need to implement the -XX:[+/-]CpuLoadCompatibility flag?
Yes, and you'll need to change the behaviour of getProcessCpuLoad
@TobiAjila Does that mean I need to implement the -XX:[+/-]CpuLoadCompatibility flag?
Yes, and you'll need to change the behaviour of
getProcessCpuLoad
@TobiAjila As suggested in https://github.com/eclipse-openj9/openj9/issues/13389#issuecomment-920189987, do we also want to change behaviour of getSystemCpuLoad()
? getSystemCpuLoad()
calls omrsysinfo_get_CPU_load()
which returns OMRPORT_ERROR_OPFAILED
on the first call so we can't tell if it's the first call on the OpenJ9 side. Should I change it to return another error code (maybe OMRPORT_ERROR_INVALID
) instead?
@TobiAjila As suggested in https://github.com/eclipse-openj9/openj9/issues/13389#issuecomment-920189987, do we also want to change behaviour of getSystemCpuLoad()?
Yes, both should mirror each other.
so we can't tell if it's the first call on the OpenJ9 side. Should I change it to return another error code (maybe OMRPORT_ERROR_INVALID) instead?
I think its safe to assume that the only the JVM will be calling this. Its possible that someone writes a native and calls before using the MXBean api, but in that case I think we would behave similar to the RI. We can verifiy that to be sure.
In any case, the main change here is that, instead of returning -1, we return zero. We are not tracking how many times the call has been made.
In any case, the main change here is that, instead of returning -1, we return zero. We are not tracking how many times the call has been made.
I'd assume you mean returning 0 from omrsysinfo_get_CPU_load()
, but then what about the compatibility flag? If we return 0, we can't tell if it's the first-time call or the CPU load is actually 0?
I'd assume you mean returning 0 from omrsysinfo_get_CPU_load(), but then what about the compatibility flag? If we return 0, we can't tell if it's the first-time call or the CPU load is actually 0?
omrsysinfo_get_CPU_load()
implicitly tracks if it is being called the first time based on whether oldestCPUTime
is set. We may not need to make any changes to omrsysinfo_get_CPU_load
as long as we can guarantee that the only time OMRPORT_ERROR_OPFAILED
is returned is when omrsysinfo_get_CPU_load
is called for the first time. If that is not the case then we will need a new error code to return instead.
The main changes will be made to Java_com_ibm_lang_management_internal_ExtendedOperatingSystemMXBeanImpl_getProcessVirtualMemorySizeImpl
. If the error code for first call is returned, then we check the option which will tell us whether to return -1 or 0.
Java -version output
Summary of problem
The
getProcessCpuLoad()
method of classcom.sun.management.OperatingSystemMXBean
always return -1 (invaild ret value) in the first call, while in the following calls it will get the correct cpu load value. I tried to call the same function using OpenJDK11 and AdoptOpenJDK11 with Hotspot, and it will return 0 with these JDKs which is within the valid ret value range. So I expect Openj9 should also have the same behavior for the first call. Here is a simple test code that could reproduce the issue:Diagnostic files
The output I got with different JDK variants is:
I also tried the previous stable release (11.0.11+9) and the current
openj9
branch and the output is similar.