flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0
673 stars 28 forks

[tracking] cAdvisor issues with cgroup2 #591

Open jepio opened 2 years ago

jepio commented 2 years ago

Description

cAdvisor has issues with cgroup2.

Upstream issues:

Impact

Missing node metrics.

Tahvok commented 2 years ago

I'm not sure whether this is related to cgroups v2, since I couldn't verify it, but I also don't see any container percpu metrics. I only see the total. Could you add this to this issue, or should I open a separate one?

jepio commented 2 years ago

Check if there's already an open issue for this in cAdvisor. If there isn't one, then please open one directly upstream. Then we can link it from here for tracking purposes.

Tahvok commented 2 years ago

@jepio I finally got around to testing this and opened an upstream issue: google/cadvisor#3065

jepio commented 2 years ago

Thanks. I looked into this briefly and it appears that cgroup2 simply doesn't expose percpu metrics for CPU time, so there might not be a way to solve this.
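For context, a minimal sketch of what this looks like from userspace. On cgroup v1 the cpuacct controller provided a `cpuacct.usage_percpu` file, whereas the cgroup2 `cpu.stat` file only carries aggregate counters such as `usage_usec`, `user_usec` and `system_usec`. The sample content and the `parse_cpu_stat` helper below are illustrative assumptions, not output captured from a real node and not cAdvisor code.

```python
def parse_cpu_stat(text):
    """Parse the flat "key value" lines of a cgroup2 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


# Made-up sample in the cpu.stat layout; on a real host you would read
# the cpu.stat file of the container's cgroup directory instead.
SAMPLE_CPU_STAT = """\
usage_usec 1234567
user_usec 1000000
system_usec 234567
"""

stats = parse_cpu_stat(SAMPLE_CPU_STAT)

# Only aggregate counters are present, so there is nothing to break
# down per CPU.
has_percpu = any("percpu" in key for key in stats)
print(stats["usage_usec"], has_percpu)
```

Because the kernel interface itself only reports totals, a collector reading this file has no per-CPU breakdown to export.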

Do you have any particular use case that benefits from having the stats percpu?

Tahvok commented 2 years ago

Unfortunately, not all applications are multithreaded. If you have multiple containers running on a multi-core host and one of them is not multithreaded, how would you find out whether one of the cores is running at 100%, which would indicate a problem? A well-known example of a non-multithreaded app is Redis.

Tahvok commented 2 years ago

Another example is simply a badly written application that usually works in a multithreaded way, but one particular method of it does not, so the whole application hangs if that method hogs a single core. Having percpu metrics is important for monitoring such cases.
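As a rough illustration of the kind of check described here, the sketch below computes per-core busy fractions from two samples in the `/proc/stat` layout and flags saturated cores. This is host-level data, not per-container data, so it is not a replacement for the missing container metrics. The sample values, the helper names and the 95% threshold are all assumptions made for the example.

```python
def parse_proc_stat(text):
    """Return {cpuN: (total_ticks, idle_ticks)} from /proc/stat-style text."""
    per_cpu = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + values[4]  # idle + iowait
        per_cpu[fields[0]] = (sum(values), idle)
    return per_cpu


def busy_fraction(before, after):
    """Fraction of time each core was busy between two samples."""
    result = {}
    for cpu, (total1, idle1) in after.items():
        total0, idle0 = before[cpu]
        delta = total1 - total0
        result[cpu] = 0.0 if delta == 0 else 1.0 - (idle1 - idle0) / delta
    return result


# Made-up samples: cpu0 is fully busy in the interval, cpu1 mostly idle.
BEFORE = "cpu0 100 0 50 850 0 0 0 0\ncpu1 100 0 50 850 0 0 0 0\n"
AFTER = "cpu0 200 0 50 850 0 0 0 0\ncpu1 105 0 50 945 0 0 0 0\n"

busy = busy_fraction(parse_proc_stat(BEFORE), parse_proc_stat(AFTER))
saturated = [cpu for cpu, frac in busy.items() if frac >= 0.95]
print(saturated)
```

A check like this can show that some core is pinned, but it cannot attribute the load to a specific container.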