eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

Display CPU Governor information in javacore #2838

Open bharathappali opened 6 years ago

bharathappali commented 6 years ago

When running in containers with CPU quota set, performance of the application may be adversely affected if CPU frequency governor is set to "powersave".

As an example, compare application running on 4 CPUs with 100% quota vs running it on 8 CPUs with 50% quota. If the CPU frequency governor is set to "powersave", performance in latter case is very poor even though the application has 4 "effective" cpus.

We have observed this during internal testing on DayTrader 3. We ran DayTrader 3 with following cases (Frequency Governor - powersave(default)):

- 4 cpus, 100% quota
- 8 cpus, 50% quota

Throughput comparison for the above cases:

Case            Throughput
4Cpu100Quota        9248.22
8Cpu50Quota         7771.04

We also asked about this on Ubuntu forum here: https://askubuntu.com/questions/1007656/behavior-of-powersave-freq-governor-when-cpu-quota-is-set/1007731#1007731

When we came to know that its due to the powersave frequency governor. we ran again with performance frequency governor and saw the difference in throughput

Case            Throughput
4Cpu100Quota        8255.2
8Cpu50Quota         8355.69

It appears this drop in performance is expected with "powersave" governor. If thats the case then it would be useful to capture the CPU frequency governor being used in javacore which can be helpful in dealing with performance problems.

The Javacore CPU information section could be delivering this extended feature and might look something like :

1CICPUINFO  CPU Information
NULL        ------------------------------------------------------------------------
2CIPHYSCPU      Physical CPUs: 4
2CIONLNCPU      Online CPUs: 4
2CIBOUNDCPU     Bound CPUs: 4
2CIACTIVECPU    Active CPUs: 0
2CITARGETCPU    Target CPUs: 4
2CICPUGVNR      CPU Governors
2CICPUGVNR0     CPU0 : powersave
2CICPUGVNR1             CPU1 : powersave
2CICPUGVNR2             CPU2 : powersave
2CICPUGVNR3             CPU3 : powersave
                    .
                    .
                    .
                    .
...

Please kindly provide your views on it

FYI @ashu-mehra @dinogun @deesebas

bharathappali commented 6 years ago

@pshipton @DanHeidinga Can i get your views on it please ?

DanHeidinga commented 6 years ago

@bharathappali I can see how having this info would be useful. Here's a couple of questions to help me understand the scope of the this change and the broader implications:

bharathappali commented 6 years ago

@DanHeidinga Thank you for posting those questions. Which are quite helpful and important.

How expensive is this information to collect? Is it collected once and stored for the duration of the run?

Its a moderately expensive operation as its processed every time the javacore dump is generated. We read and write to javacore directly when its processed will not be storing it.

How much info is being added to the javacore? 1 line per cpu? This may be unwieldy when there are large numbers of CPUs.

Initially i thought to add 1 line per cpu. But what you said was right it doesn't look good in a case where we have many CPU's. I have come up with different ways listed below for representing them which take less lines to be displayed in javacore.

Array model representation (for each governor) :

2CICPUGVNR         CPU Governors [Governor - CPU's]
2CICPUGVRPS        Powersave - [0,1,2,3,8,9,10,11]
2CICPUGVRPF        Performance - [4,5,6,7,12,13,14,15]

More simplified with ranges it would be something like :

2CICPUGVNR         CPU Governors [Governor - CPU's]
2CICPUGVRPS        Powersave - [(0-3),(8-11)]
2CICPUGVRPF        Performance - [(4-7),(12-15)]

Short form representation with 16 CPU's per line

2CICPUGVNR         CPU Governors [Powersave - PS, Performance - PF]
2CICPUIDS     0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
2CICPUGVRS    PS    PS    PS    PS    PF    PF    PF    PF    PS    PS    PS    PS    PF    PF    PF    PF 

Does this info only get included when run in cgroups or is it more widely applicable?

Its widely applicable irrespective of cgroups. If we run in cgroups with cpu restrictions then will be getting the governor info of those specific CPU's

Are there other governors than powersave?

yes, In linux there are 6 cpu frequency governors.

Note: I have got the info of cpu frequency governors from this link from section 2.

2.   Governors In the Linux Kernel
2.1  Performance
2.2  Powersave
2.3  Userspace
2.4  Ondemand
2.5  Conservative
2.6  Schedutil

Please kindly provide your views on the representation of the CPU Governor info in the javacore.

DanHeidinga commented 6 years ago

Its a moderately expensive operation as its processed every time the javacore dump is generated. We read and write to javacore directly when its processed will not be storing it.

Can the frequency governor change at runtime? Or is it a static assignment? I'd rather cache the info than read it over and over.... depending on how expensive it is to read.

Is there a reason to only record the two kinds (performance / powersave) when there are 4 other kinds?

I think we need to figure out what data we want before we pick the format.

bharathappali commented 5 years ago

@DanHeidinga I'm sorry for delayed reply. I haven't gone through it for a while.

Can the frequency governor change at runtime?

Yeah there is a possibility to change it in runtime.

I'd rather cache the info than read it over and over.... depending on how expensive it is to read.

its just for a one time use when javacore is generated. As there is no other mechanism/service using this info I'm thinking that caching it would not be that useful from a memory allocation perspective. I might be wrong, I'm sorry if i was.

Is there a reason to only record the two kinds (performance / powersave) when there are 4 other kinds?

My system (RHEL 7.5) currently showing only those two in available governors for my cpus (got the info from /sys/devices/system/cpu/cpu<0-3>/cpufreq/scaling_available_governors) so just for representational purposes i have described only two. But yes will be adding all the kinds available in the JAVACORE.

SueChaplain commented 5 years ago

@bharathappali Please add a tag for the doc changes needed (doc:externals) and open an issue at the openj9-docs repository: https://github.com/eclipse/openj9-docs/issues/new?template=new-documentation-change.md

We will then be able to add this to our 0.12.0 release list. Thanks!

bharathappali commented 5 years ago

@SueChaplain Thanks for pointing it. Created the issue openj9-docs/issues/136. Will be updating it accordingly.

pshipton commented 5 years ago

Given where this is at, and the upcoming holidays and absences, I'm guessing this won't make the 0.12 release. I've moved it out of the 0.12 milestone and into the 0.14 milestone to reduce the size of the 0.12 milestone. If it does happen to make it into the 0.12 release, we can fix the target milestone.

bharathappali commented 5 years ago

@DanHeidinga As the OpenJ9 PR #3348 makes changes to add CPU Governors to javacore with a representation [Governor - CPU's]. I would like to know your views on that representation format. Thanks

ashu-mehra commented 5 years ago

@bharathappali Looking at PRs #3348 and #4176 , I am more inclined towards #3348 where cpu governor information is obtained from port library, whereas #4176 just puts up everything in a function and is difficult to comprehend and maintain.

I also suggest to use bits for storing list of cpus for each governor. In fact having port library apis for maintaining cpu sets would be a good addition I feel. For example, I have come up with following set of APIs catering to cpu governor information:

+typedef struct OMRCpuSet {
+   union {
+       cpu_set_t cpuset;
+       struct {
+           cpu_set_t *cpuset;
+           uint64_t size;
+       } d_cpuset;
+   } set;
+   BOOLEAN dynamic;    
+} OMRCpuSet;
+
 +void omrsysinfo_get_cpu_affinity(OMRCpuSet *cpuset) {
+#if defined(LINUX) && !defined(OMRZTPF)
+   int32_t size = sizeof(cpuset->set.cpuset); /* Size in bytes */
+   int32_t rc = sched_getaffinity(0, size, &cpuset->set.cpuset); /* pid = 0 returns mask of calling process */
+   if (rc != 0) {
+       if (EINVAL == errno) {
+           /* Too many CPUs for the fixed cpu_set_t structure */
+           int32_t numCPUs = sysconf(_SC_NPROCESSORS_CONF);
+           cpuset->set.d_cpuset.cpuset = CPU_ALLOC(numCPUs);
+           if (NULL != allocatedCpuSet) {
+               cpuset->set.d_cpuset.size = CPU_ALLOC_SIZE(numCPUs);
+               CPU_ZERO_S(cpuset->set.d_cpuset.size, cpuset->set.d_cpuset.cpuset);
+               rc = sched_getaffinity(mainProcess, cpuset->set.d_cpuset.size, cpuset->set.d_cpuset.cpuset);
+               if (0 != rc) {
+                   return OMRPORT_ERROR_XXX;
+               }
+               cpuset->dynamic = TRUE;
+           }
+       }
+   }
+#else
+
+#endif 
+}
+
+void omrsysinfo_free_cpu_set(OMRCpuSet *cpuset) {
+#if defined(LINUX) && !defined(OMRZTPF)
+   if (cpuset->dynamic) {
+       CPU_FREE(cpuset->set.d_cpuset.cpuset)
+   }
+#endif
+}
+
+void omrsysinfo_add_cpuset(OMRCpuSet *cpuset, int cpu) {
+#if defined(LINUX) && !defined(OMRZTPF)
+   if (cpuset->dynamic) {
+       if (cpu >= (CPU_ALLOC_SIZE(cpuset->set.d_cpuset.cpuset) * 8)) {
+           CPU_FREE(cpuset->set.d_cpuset.cpuset);
+           cpuset->set.d_cpuset.cpuset = CPU_ALLOC(cpu+1);
+       }
+       CPU_SET(cpu, cpuset->set.d_cpuset.cpuset);
+   } else {
+       if (cpu >= sizeof(cpuset->set.cpuset)) {
+           cpuset->set.d_cpuset.cpuset = CPU_ALLOC(cpu+1);
+           cpuset->dynamic = TRUE;
+           CPU_SET(cpu, cpuset->set.d_cpuset.cpuset);
+       } else {
+           CPU_SET(cpu, &cpuset->set.cpuset);
+       }
+   }
+#endif
+}
+
+void omrsysinfo_isset_cpuset(OMRCpuSet *cpuset, int cpu) {
+#if defined(LINUX) && !defined(OMRZTPF)
+   if (cpuset->dynamic) {
+       CPU_ISSET(cpu, cpuset->set.d_cpuset.cpuset);
+   } else {
+       CPU_ISSET(cpu, &cpuset->set.cpuset);
+   }
+#endif
+}
+
+int omrsysinfo_get_cpu_count(OMRCpuSet *cpuset) {
+#if defined(LINUX) && !defined(OMRZTPF)
+   if (cpuset->dynamic) {
+       CPU_COUNT(cpuset->set.d_cpuset.cpuset);
+   } else {
+       CPU_COUNT(&cpuset->set.cpuset);
+   }
+#endif
+}
-- 

Whether these APIs would be useful for OMR or should they be in OpenJ9, or just be internal functions, is open to discussion. With the above functions, the code for CPU governors can be simplified a lot like this:

+typedef struct J9CpuGovernor {
+   char *type;
+   struct OMRCpuSet cpuSet;
+   struct J9CpuGovernor *next;
+} J9CpuGovernor;
+
+
+#if defined(LINUX)
+
+#define CPU_INDEX_LENGTH 5
+
+char const *cpuGovernorPathPattern = "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor";
+#define CPU_GOVERNOR_PATTERN_SIZE (strlen(cpuGovernorPathPattern) + (CPU_INDEX_LENGTH) + 1)
+
+static void
+addCpuToGovernorList(struct J9PortLibrary *portLibrary, J9CpuGovernor **governorList, int cpu, const char *governor)
+{
+   int i = 0;
+   J9CpuGovernor *governorEntry = *governorList;
+   J9CpuGovernor *prevEntry = NULL;
+
+   while (governorEntry != NULL) {
+       if (0 == strcmp(governor, governorEntry->type)) {
+           break;
+       }
+       prevEntry = governorEntry;
+       governorEntry = governorEntry->next;
+   }
+
+   if (governorEntry == NULL) {
+       governorEntry = omrmem_allocate_memory(sizeof(J9CpuGovernor) + strlen(governor) + 1);
+       governorEntry->type = (char *)(governorEntry + 1);
+       strcpy(governorEntry->type, governor);
+       if (prevEntry) {
+           prevEntry->next = governorEntry;
+       }
+   }
+
+   omrsysinfo_add_cpuset(&governorEntry.cpuSet, cpu);
+
+   if (NULL == *governorList) {
+       *governorList = governorEntry;
+   }
+}
+
+static int32_t
+getCpuGovernorDetails(struct J9PortLibrary *portLibrary, struct J9CpuGovernor **governorList, OMRCpuSet *cpuSet)
+{
+   OMRPORT_ACCESS_FROM_J9PORT(portLibrary);
+   int32_t rc = 0;
+   int32_t i = 0;
+   int32_t cpuCount = omrsysinfo_get_cpu_count(cpuSet);
+   int32_t cpuIndex = 0;
+
+   for(i = 0; i < cpuCount; i++) {
+       char pathBuffer[CPU_GOVERNOR_PATTERN_SIZE];
+       int32_t cpuGovernorPathLength = 0;
+       FILE *file = NULL; 
+       char governorName[20];
+
+       while (!omrsysinfo_isset_cpuset(cpuSet, cpuIndex)) {
+           cpuIndex += 1;
+       }
+
+       cpuGovernorPathLength = omrstr_printf(pathBuffer, sizeof(pathBuffer), cpuGovernorPathPattern, i);
+       if (0 == cpuGovernorPathLength) {
+           rc = J9PORT_ERROR_STRING_ILLEGAL_STRING;
+           goto _end;
+       }
+       file = fopen(pathBuffer, "r");
+       if (NULL == file) {
+           rc = J9PORT_ERROR_FILE_OPFAILED;
+           goto _end;
+       }
+       if (NULL == fgets(governorName, 20, file)) {
+           rc = J9PORT_ERROR_FILE_OPFAILED;
+       } else {
+           addCpuToGovernorList(cpuIndex, governorName, governorList);
+       }
+       fclose(file);
+       cpuIndex += 1;
+   }
+_end:
+   return rc;
+}
+#endif
+
+int32_t
+j9sysinfo_get_cpu_governor_info(struct J9PortLibrary *portLibrary, J9CpuGovernor **governorList)
+{
+   int32_t rc = J9PORT_ERROR_SYSINFO_NOT_SUPPORTED;
+   OMRCpuSet cpuSet = {0};
+   omrsysinfo_get_cpu_affinity(&cpuSet);
+   getCpuGovernorDetails(portLibrary, &cpuSet, governorList);
+}
+
+j9sysinfo_release_cpu_governor_info(struct J9PortLibrary *portLibrary, J9CpuGovernor *governorList)
+{
+
+}

This code does not need to have a hard coded list of governors. Note that this is not working code and needs to be tested and polished like adding return values, handling error conditions etc.

pshipton commented 5 years ago

Moving forward until this picks up again.