Closed fmuyassarov closed 1 month ago
@askervin @klihub PTAL. Note that I didn't add a commit for the docs assuming that there might be some changes with the naming or something else. What you guys are happy with the change I will add a commit to update the documentation on this same PR.
LGTM to me. I skimmed through this once, and if I got the naming wrt. config hierarchy right, this looks fine to me.
My only question is that since ATM we can't test this functionality in our e2e test VMs, have you tested this on real HW in practice ?
Yes, I have tested it on my own machine and it works as expected. Fro example, I was checking the updated governor mode of a core belonging to a balloon by simply
cat /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor
I really wish we had some way to test this in our e2e tests, too. For instance, if we have idleCPUClass defined and no balloons (except for the reserved an possibly the default) to start with, we should have CPUs configured with idle class frequencies and governor.
@fmuyassarov, if this runs in a vm, do we get log error messages about failing to set a governor for a CPU? From those errors we could verify that we at least tried to do something sensible.
Another reason why this should have an e2e test is that we wish to see what happens when someone uses a balloons configuration with a governor in a vm. After all, it's very much possible that the policy runs in a vm, and might (possibly accidentally) have CPU classes defined, too.
I really wish we had some way to test this in our e2e tests, too. For instance, if we have idleCPUClass defined and no balloons (except for the reserved an possibly the default) to start with, we should have CPUs configured with idle class frequencies and governor.
@fmuyassarov, if this runs in a vm, do we get log error messages about failing to set a governor for a CPU? From those errors we could verify that we at least tried to do something sensible.
Another reason why this should have an e2e test is that we wish to see what happens when someone uses a balloons configuration with a governor in a vm. After all, it's very much possible that the policy runs in a vm, and might (possibly accidentally) have CPU classes defined, too.
One possibility is to use similar plumbing than we use for testing controller-agnostic hook triggers in balloons/n4c16/test21-controller-check
.
That would require a few changes to the cpu controller when ENABLE_TEST_APIS
is set in the environment:
cpuctl.testing
when ENABLE_TEST_APIS
is setenforce{uncore,Cpufreq,CpufreqGovernor}
/cpu-control-state
HTTP endpoint for querying controller stateIn the handler for the HTTP endpoint you dump something like this as JSON
type state struct {
Classes map[string]Class
Assignments map[string]idset.ID // might be easier to have map[string]string instead
}
Then in the tests, we use this endpoint just like we now do for querying metrics, but we fetch data from /cpu-control-state
instead of /metrics
. Since the exposed data is JSON in a known format controlled by us, we can verify the expected test result with jq
queries.
@fmuyassarov, if this runs in a vm, do we get log error messages about failing to set a governor for a CPU? From those errors we could verify that we at least tried to do something sensible.
I can verify it. My previous tests were on physical machine.
Then in the tests, we use this endpoint just like we now do for querying metrics, but we fetch data from
/cpu-control-state
instead of/metrics
. Since the exposed data is JSON in a known format controlled by us, we can verify the expected test result withjq
queries.
As we discussed offline, let's keep it in mind and I will try to allocate some time for setting up something similar that you described here.
This commit introduces a new field in the balloons configuration CR, enabling users to specify CPU frequency governor mode preferences.