Open robinvanderstraeten-klarrio opened 1 year ago
Thanks for reporting this, @robinvanderstraeten-klarrio! We've seen this behavior internally but didn't get the chance to create a dedicated GitHub issue.
Reading through the Cruise Control issue, it seems that simply removing the cruise.control.metrics.reporter.kubernetes.mode setting would fix this, but I'm not very knowledgeable about Cruise Control in general or the impact this would have on a production deployment.
If this would be a good solution, I'd be happy to contribute it.
I don't think we should remove the cruise.control.metrics.reporter.kubernetes.mode configuration. It was added to resolve a CPU utilization reporting issue; see https://github.com/banzaicloud/koperator/issues/463
Perhaps the best way forward is to wait for upstream CC to fix their issue with cgroups v2 so that we can adapt in Koperator.
Description
Cruise Control currently does not support running on a cluster with cgroup v2 when the configuration cruise.control.metrics.reporter.kubernetes.mode is set to true (see https://github.com/linkedin/cruise-control/issues/1873). Koperator always sets this to true (https://github.com/banzaicloud/koperator/blob/v0.25.1/pkg/resources/kafka/configmap.go#L105) and, AFAIK, there is currently no way to override this configuration.
Expected Behavior
The Cruise Control metrics collector should collect and publish metrics about the Kafka brokers.
Actual Behavior
The Cruise Control metrics collector crashes. The following appears once per minute in the logs of every broker:
This also has a side effect: Cruise Control doesn't seem to be able to deal with the fact that it is not getting these metrics. Its memory usage grows until it is eventually OOM-killed.
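For context, here is a minimal sketch of why code written against cgroup v1 can break on a cgroup v2 node. It is not Cruise Control's code (which is Java), and it assumes the failure comes from reading the v1 CPU quota files, which do not exist under the v2 unified hierarchy. The file names are the standard kernel interfaces; the function names and the /sys/fs/cgroup mount point are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max, formatted as "<quota> <period>".

    Returns the CPU limit in cores, or None when the quota is "max" (no limit).
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_cfs(quota_text: str, period_text: str) -> Optional[float]:
    """Parse cgroup v1 cpu.cfs_quota_us and cpu.cfs_period_us.

    Returns the CPU limit in cores, or None when the quota is negative (no limit).
    """
    quota = int(quota_text)
    if quota < 0:
        return None
    return quota / int(period_text)


def cpu_limit(root: Path = Path("/sys/fs/cgroup")) -> Optional[float]:
    """Return the container CPU limit in cores, handling both cgroup versions."""
    # cgroup.controllers at the mount root indicates the v2 unified hierarchy.
    if (root / "cgroup.controllers").exists():
        cpu_max = root / "cpu.max"
        if not cpu_max.exists():  # e.g. the root cgroup has no cpu.max file
            return None
        return parse_cpu_max(cpu_max.read_text())
    # v1 layout: these files are absent on a v2 node, so v1-only code fails here.
    cpu_dir = root / "cpu"
    return parse_cfs(
        (cpu_dir / "cpu.cfs_quota_us").read_text(),
        (cpu_dir / "cpu.cfs_period_us").read_text(),
    )
```

A reader that only implements the v1 branch raises a file-not-found error on every collection attempt on a v2 node, which would be consistent with an error repeating once per reporting interval.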
Affected Version
Seen on version 0.24.1, though this will be a problem on all versions where cruise.control.metrics.reporter.kubernetes.mode is set to true.
Steps to Reproduce
Checklist