eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

getProcessCpuLoad() always return -1 in first call on jdk11 #13389

Closed Sidong-Wei closed 9 months ago

Sidong-Wei commented 3 years ago

Java -version output

openjdk 11.0.12 2021-07-20
IBM Semeru Runtime Open Edition 11.0.12.0 (build 11.0.12+7)
Eclipse OpenJ9 VM 11.0.12.0 (build openj9-0.27.0, JRE 11 Linux amd64-64-Bit Compressed References 20210729_228 (JIT enabled, AOT enabled)
OpenJ9   - 1851b0074
OMR      - 9db1c870d
JCL      - 21849e2ca0 based on jdk-11.0.12+7)

Summary of problem

The getProcessCpuLoad() method of class com.sun.management.OperatingSystemMXBean always return -1 (invaild ret value) in the first call, while in the following calls it will get the correct cpu load value. I tried to call the same function using OpenJDK11 and AdoptOpenJDK11 with Hotspot, and it will return 0 with these JDKs which is within the valid ret value range. So I expect Openj9 should also have the same behavior for the first call. Here is a simple test code that could reproduce the issue:

import java.lang.management.ManagementFactory;
import com.sun.management.OperatingSystemMXBean;

class Test{
  public static void main(String[] args) {
    while (true) {
        try
        {
            Thread.sleep(10000);
        }
        catch(InterruptedException ex)
        {
            Thread.currentThread().interrupt();
        }
        double cpuUtil = ((com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean()).getProcessCpuLoad();
        System.out.println(cpuUtil);
    }
  }
}

Diagnostic files

The output I got with different JDK variants is:

sidong@sidong-u18-x1:~$ java --version
openjdk 11.0.12 2021-07-20
OpenJDK Runtime Environment Temurin-11.0.12+7 (build 11.0.12+7)
OpenJDK 64-Bit Server VM Temurin-11.0.12+7 (build 11.0.12+7, mixed mode)
sidong@sidong-u18-x1:~$ java Test
0.0
0.001002004008016032
5.00751126690035E-4
0.0010035122930255895
5.00751126690035E-4
5.015045135406219E-4
sidong@sidong-u18-x1:~$ java --version
openjdk 11.0.12 2021-07-20
IBM Semeru Runtime Open Edition 11.0.12.0 (build 11.0.12+7)
Eclipse OpenJ9 VM 11.0.12.0 (build openj9-0.27.0, JRE 11 Linux amd64-64-Bit Compressed References 20210729_228 (JIT enabled, AOT enabled)
OpenJ9   - 1851b0074
OMR      - 9db1c870d
JCL      - 21849e2ca0 based on jdk-11.0.12+7)
sidong@sidong-u18-x1:~$ java Test
-1.0
0.0031686181725441823
2.0872289188482128E-4
2.556149850413811E-4
3.1896885482841735E-4
1.4443471779581417E-4
2.252603035480593E-4

I also tried the previous stable release (11.0.11+9) and the current openj9 branch and the output is similar.

DanHeidinga commented 3 years ago

fyi @keithc-ca

keithc-ca commented 3 years ago

The behavior is consistent with the specification at [1]:

If the Java Virtual Machine recent CPU usage is not available, the method returns a negative value.

[1] https://docs.oracle.com/en/java/javase/11/docs/api/jdk.management/com/sun/management/OperatingSystemMXBean.html#getProcessCpuLoad()

Sidong-Wei commented 3 years ago

@keithc-ca Thanks for your reply, yeah I understand that the negative return value is allowed, and also we know the first call probably could not retrieve the correct recent cpu usage, so the behavior is expected. I guess my point is that since we already know the first call will always return -1 which is not implying any potential issue with the system, could we sort of differentiating the negative return values between the first time and later calls? i.e., the first call return -1 is normal so we could let it skip, but later calls returning -1 means there really might be something wrong there. If the first -1 could be filtered into a non-negative value (like 0 in OpenJDK), then it probably would not trigger any warnings or errors on the developer side, while the following negative values could keep being monitored. Please let me know whether you think that will make sense, thank you.

keithc-ca commented 3 years ago

@Sidong-Wei It's not clear to me whether you're suggesting changes to your application (ignore the first -1 returned), or suggesting that OpenJ9 return 0 instead of -1 from the first call. The latter does seem reasonable to me, but we'll have to consider existing use cases (and tests) that might be affected by such a change.

Sidong-Wei commented 3 years ago

@keithc-ca Yeah I also prefer the latter option a bit more. Changes to the application would definitely work and it could be a workaround for the issue, but I think if it could be resolved at the source then it would be easier for downstream applications to just monitor negative returns normally. If you would like to take this latter option into consideration, that will be good enough for me now, since it is not a critical issue and at least there exists a workaround for it. Thanks.

pshipton commented 3 years ago

Controlling the behavior via a command line option is one way to avoid compatibility problems.

keithc-ca commented 3 years ago

@pshipton, I assume you mean the option would restore the old (current) behavior, otherwise the default would still be inconsistent with openjdk? Got a proposal for the option name?

The behavior of getSystemCpuLoad() should mirror that of getProcessCpuLoad() (either both or neither should return -1 the first time either is called).

pshipton commented 3 years ago

For compatibility I suggest the default behavior remain the same, although we can change it on a release boundary (Java 18 or maybe Java 17 if we do it soon) and document the change. -XX:[+/-]CpuLoadCompatibility

jonathan-albrecht-ibm commented 2 years ago

I'm wondering if I could get an update on where this issue stands? Is there a plan to implement it in a particular release?

pshipton commented 2 years ago

@tajila question about scheduling this.

jonathan-albrecht-ibm commented 1 year ago

Just wanted to check in on the status of this issue again.

pshipton commented 1 year ago

Is there a plan to implement it in a particular release?

It's in the backlog, so no.

tajila commented 10 months ago

@thallium Please take a look at this

thallium commented 10 months ago

@thallium Please take a look at this

@TobiAjila Does that mean I need to implement the -XX:[+/-]CpuLoadCompatibility flag?

tajila commented 10 months ago

@TobiAjila Does that mean I need to implement the -XX:[+/-]CpuLoadCompatibility flag?

Yes, and you'll need to change the behaviour of getProcessCpuLoad

thallium commented 10 months ago

@TobiAjila Does that mean I need to implement the -XX:[+/-]CpuLoadCompatibility flag?

Yes, and you'll need to change the behaviour of getProcessCpuLoad

@TobiAjila As suggested in https://github.com/eclipse-openj9/openj9/issues/13389#issuecomment-920189987, do we also want to change behaviour of getSystemCpuLoad()? getSystemCpuLoad() calls omrsysinfo_get_CPU_load() which returns OMRPORT_ERROR_OPFAILED on the first call so we can't tell if it's the first call on the OpenJ9 side. Should I change it to return another error code (maybe OMRPORT_ERROR_INVALID) instead?

tajila commented 10 months ago

@TobiAjila As suggested in https://github.com/eclipse-openj9/openj9/issues/13389#issuecomment-920189987, do we also want to change behaviour of getSystemCpuLoad()?

Yes, both should mirror each other.

so we can't tell if it's the first call on the OpenJ9 side. Should I change it to return another error code (maybe OMRPORT_ERROR_INVALID) instead?

I think its safe to assume that the only the JVM will be calling this. Its possible that someone writes a native and calls before using the MXBean api, but in that case I think we would behave similar to the RI. We can verifiy that to be sure.

In any case, the main change here is that, instead of returning -1, we return zero. We are not tracking how many times the call has been made.

thallium commented 10 months ago

In any case, the main change here is that, instead of returning -1, we return zero. We are not tracking how many times the call has been made.

I'd assume you mean returning 0 from omrsysinfo_get_CPU_load(), but then what about the compatibility flag? If we return 0, we can't tell if it's the first-time call or the CPU load is actually 0?

tajila commented 10 months ago

I'd assume you mean returning 0 from omrsysinfo_get_CPU_load(), but then what about the compatibility flag? If we return 0, we can't tell if it's the first-time call or the CPU load is actually 0?

omrsysinfo_get_CPU_load() implicitly tracks if it is being called the first time based on whether oldestCPUTime is set. We may not need to make any changes to omrsysinfo_get_CPU_load as long as we can guarantee that the only time OMRPORT_ERROR_OPFAILED is returned is when omrsysinfo_get_CPU_load is called for the first time. If that is not the case then we will need a new error code to return instead.

The main changes will be made to Java_com_ibm_lang_management_internal_ExtendedOperatingSystemMXBeanImpl_getProcessVirtualMemorySizeImpl. If the error code for first call is returned, then we check the option which will tell us whether to return -1 or 0.