Implementation of JFR perf data for BSD

bsdkurt commented 5 years ago

I pushed some minor changes from your review and one other duplicated line of code reduced.

battleblow commented 5 years ago

I pushed some minor changes from your review and one other duplicated line of code reduced.

Thanks! I pushed a change so that the file compiles on NetBSD

battleblow commented 5 years ago

I'd hoped to get to this sooner today, but it took a little longer than expected. I've done some more testing, particularly of the context switch and cpu load functionality, on FreeBSD (running on bare metal) and NetBSD (a VM). For that functionality, it looks like the code is doing what it intends to do and seems to match the MacOS X implementation in terms of what it does.

That said, I have some concerns that what the code is doing is fulfilling the contract of CPUPerformance() class definition. Unfortunately that is not documented well (at all) in the class header itself, and the MacOS X implementation looks at odds with the Linux and Solaris implementations. Some code links:

https://github.com/AdoptOpenJDK/openjdk-jdk11u/blob/master/src/hotspot/os/linux/os_perf_linux.cpp https://github.com/AdoptOpenJDK/openjdk-jdk11u/blob/master/src/hotspot/os/solaris/os_perf_solaris.cpp

First off, a minor concern. In the Linux version of cpu_loads_process(), the total load (psystemTotalLoad) is explicitly the sum of the user and kernel JVM load. For our implementation, these are calculated separately and the results I get don't even get close to the total load being the sum of user and kernel load. I think this is partly because we're calculating different values from Linux (see the next section), but also partly this is a matter of scale. Even when the JVM process is the dominant one on the system the user and kernel load are orders of magnitude smaller than the total.

For my major concern, I'll go through the load/context switch methods one by one.

context_switch_rate() - I think we're doing this correctly.
cpu_load_total_process() - On Linux and Solaris I believe the total CPU load generated by the JVM process itself is what the implementation calculates. On *BSD, we're calculating the overall system CPU load.
cpu_loads_process() - On Linux the same calculation is used as for cpu_load_total_process(), but the split user and system (kernel) load is also set as well as the total. On Solaris a different calculation is done, and I'm not sure it is equivalent, but it does again appear to be looking explicitly at the load from the JVM process itself and the total is the sum of user + system for the process. On *BSD, we're using the overall system CPU load and then separately looking at the user and system cpu load for the JVM process.
cpu_load() - Not implemented on *BSD. FWIW, on Solaris and Linux it looks like this does calculate the overall system load on a per CPU basis (i.e., what we're doing in cpu_load_total_process(), but split out on a per CPU basis). FWIW, on FreeBSD this could be done with the kern.cp_times sysctl (note the s on the end), which provides CPU load on a per CPU basis (kern.cp_time is the total over all CPUs). I haven't checked for an equivalent on NetBSD or OpenBSD yet.

I'm going to do some more testing tomorrow on MacOS X and try and do some on a Linux box too (although that will require a bit of set up).

Please let me know if you agree/disagree with the characterisation of what is going on in the various code bases above. If it is accurate, then we have a decision to make in regards of whether we should/can switch from following MacOS X to match Linux/Solaris. FWIW, it looks like the AIX version closely follows Linux. The Windows version I've only just glanced at, but I think its intention is the same as Linux/Solaris.

I do wish that os_perf.hpp documented even briefly the expectations of what each method in CPUPerformance was intended to do.

bsdkurt commented 5 years ago

I agree with your analysis and also agree we should follow the Linux implementation in terms of what the functions return. Here are some ideas for implementation on *BSD:

cpu_loads_process(double* pjvmUserLoad, double* pjvmKernelLoad, double* psystemTotalLoad)

psystemTotalLoad:(user + kernel time (including interrupt time + spin time)) / total time
pjvmUserLoad: user time for jvm process only / total time
pjvmKernelLoad: system time for jvm process only / total time

cpu_load_total_process(double* cpu_load)

cpu_load: (user time for jvm process only + system time for jvm process only) / total time

cpu_load(int which_logical_cpu, double* cpu_load)

cpu_load: same as psystemTotalLoad but for a given cpu

Something along the lines of this I think would be right for *BSD for the basis of psystemTotalLoad:

uint64_t total_ticks = 0;
for (int i=0; i<CPUSTATES;i++)
  total_ticks += cpu_load_info[i];

uint64_t used_ticks = total_ticks - cpu_load_info[CP_IDLE];

For the jvm process values we could use getrusage(RUSAGE_SELF) and convert user time and system time back to ticks so we are in the same units at total_ticks. *BSD has {CTL_KERN, KERN_CLOCKRATE} sysctl with stathz for converting back to ticks.

battleblow commented 5 years ago

I agree with your analysis and also agree we should follow the Linux implementation in terms of what the functions return. Here are some ideas for implementation on *BSD:

cpu_loads_process(double* pjvmUserLoad, double* pjvmKernelLoad, double* psystemTotalLoad)
* `psystemTotalLoad`:(user + kernel time (including interrupt time + spin time)) / total time

* `pjvmUserLoad`: user time for jvm process only / total time

* `pjvmKernelLoad`: system time for jvm process only / total time
cpu_load_total_process(double* cpu_load)
* `cpu_load`: (user time for jvm process only + system time for jvm process only) / total time
cpu_load(int which_logical_cpu, double* cpu_load)
* `cpu_load`: same as `psystemTotalLoad` but for a given cpu
Something along the lines of this I think would be right for *BSD for the basis of psystemTotalLoad:
uint64_t total_ticks = 0;
for (int i=0; i<CPUSTATES;i++)
  total_ticks += cpu_load_info[i];

uint64_t used_ticks = total_ticks - cpu_load_info[CP_IDLE];
For the jvm process values we could use getrusage(RUSAGE_SELF) and convert user time and system time back to ticks so we are in the same units at total_ticks. *BSD has {CTL_KERN, KERN_CLOCKRATE} sysctl with stathz for converting back to ticks.

I think that for psystemTotalLoad we should just sum the user and kernel values the same as Linux does. My read of the Linux code is that this is the total load created by the JVM on the system, not the overall total for the system. The overall totals for the system are what is calculated in cpu_load().

bsdkurt commented 5 years ago

I think that for psystemTotalLoad we should just sum the user and kernel values the same as Linux does. My read of the Linux code is that this is the total load created by the JVM on the system, not the overall total for the system. The overall totals for the system are what is calculated in cpu_load().

I'm not sure if we are saying the same thing but in different ways or if we have drawn different conclusions from reading the Linux implementation. :-)

psystemTotalLoad gets it value from cpu_load(-1, &t); -> get_cpu_load(which_logical_cpu, &_counters, &s, CPU_LOAD_GLOBAL); -> os::Linux::get_tick_information(pticks, which_logical_cpu)

os::Linux::get_tick_information() fills in the values like this:

  pticks->used       = userTicks + niceTicks;
  pticks->usedKernel = systemTicks + irqTicks + sirqTicks;
  pticks->total      = userTicks + niceTicks + systemTicks + idleTicks +
                       iowTicks + irqTicks + sirqTicks + stealTicks + guestNiceTicks;

get_cpu_load(which_logical_cpu, &_counters, &s, CPU_LOAD_GLOBAL); uses these three values to create user_load and pkernelLoad and return them back to cpu_load which adds them together for psystemTotalLoad.

This leads me to believe that we can't just sum up user + system time from KERN_PROC_ALL results to get an equivalent total value since idle and interrupt time will be missing. The total from get_tick_information() is important since it is used as the denominator for psystemTotalLoad, pjvmUserLoad, and pjvmKernelLoad. get_jvm_ticks() calls os::Linux::get_tick_information() and then overwrites used and usedKernel but keeps the total.

I think the closest implementation we can achieve on *BSD is to use kern.cp_time to get the total value that is used as our denominator for psystemTotalLoad, pjvmUserLoad, and pjvmKernelLoad. Which means we need to convert from jvm user time and system time to ticks or vice versa.

Did I make a mistake or are you coming up with different conclusions?

battleblow commented 5 years ago

I think that for psystemTotalLoad we should just sum the user and kernel values the same as Linux does. My read of the Linux code is that this is the total load created by the JVM on the system, not the overall total for the system. The overall totals for the system are what is calculated in cpu_load().

I'm not sure if we are saying the same thing but in different ways or if we have drawn different conclusions from reading the Linux implementation. :-)

psystemTotalLoad gets it value from cpu_load(-1, &t); -> get_cpu_load(which_logical_cpu, &_counters, &s, CPU_LOAD_GLOBAL); -> os::Linux::get_tick_information(pticks, which_logical_cpu)

os::Linux::get_tick_information() fills in the values like this:
  pticks->used       = userTicks + niceTicks;
  pticks->usedKernel = systemTicks + irqTicks + sirqTicks;
  pticks->total      = userTicks + niceTicks + systemTicks + idleTicks +
                       iowTicks + irqTicks + sirqTicks + stealTicks + guestNiceTicks;
get_cpu_load(which_logical_cpu, &_counters, &s, CPU_LOAD_GLOBAL); uses these three values to create user_load and pkernelLoad and return them back to cpu_load which adds them together for psystemTotalLoad.

This leads me to believe that we can't just sum up user + system time from KERN_PROC_ALL results to get an equivalent total value since idle and interrupt time will be missing. The total from get_tick_information() is important since it is used as the denominator for psystemTotalLoad, pjvmUserLoad, and pjvmKernelLoad. get_jvm_ticks() calls os::Linux::get_tick_information() and then overwrites used and usedKernel but keeps the total.

I think the closest implementation we can achieve on *BSD is to use kern.cp_time to get the total value that is used as our denominator for psystemTotalLoad, pjvmUserLoad, and pjvmKernelLoad. Which means we need to convert from jvm user time and system time to ticks or vice versa.

Did I make a mistake or are you coming up with different conclusions?

I made a mistake :)

I skimmed cpu_loads_process() and misinterpreted the portion where the total gets capped at u+s and thought it was doing the same as cpu_load_total_process() rather than just being a cap.

I agree with your implementation proposal.

I'm leaving on a brief trip Sunday through Wednesday, not sure if I'll have a lot of time in between now and when I leave to work on this. I'll be completely cut off from connectivity most of this time. I'm not sure if you had any plans to work on this in that time frame but just wanted to provide a heads up.

bsdkurt commented 5 years ago

No worries. The Linux implementation seems a bit overly complex for what it does. I'm not sure when I'll get back to it as well. I'll pop a comment in here if I start coding so we don't duplicate our work. Have a nice trip.

bsdkurt commented 5 years ago

I reverted cpu_load_total_process() and cpu_loads_process() directly in the bsd-port-os-perf-bsd feature branch in preparation for new implementation discussed here.

bsdkurt commented 5 years ago

@battleblow I believe I incorporated all of your review feedback from #39 in my last commit to bsd-port-os-perf-bsd. github's context for the diff makes it look a bit convoluted. Please take a look the the file when compared to master branch for a more reasonable view of the changes. git diff master -- src/hotspot/os/bsd/os_perf_bsd.cpp

battleblow commented 5 years ago

I had some minor comments, but no objections to merging it

I meant to select approve when I finished the review.

battleblow / openjdk-jdk11u

Implementation of JFR perf data for BSD #37