Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent
http://azure.microsoft.com/
Apache License 2.0

[BUG] Cgroup monitoring doesn't support RHEL-9 #2618

Open yuxisun1217 opened 2 years ago

yuxisun1217 commented 2 years ago

Describe the bug: In RHEL-9, the log collector cannot be enabled because cgroup monitoring cannot be enabled. There seem to be two reasons:

  1. The agent reports that cgroup monitoring is not supported on ['rhel', '9.1', 'Plow', 'Red Hat Enterprise Linux']. The distro_name is 'rhel', not 'redhat', so it does not match the condition in the following function:
    class CGroupsApi(object):
    ...
        # The name check below is the part that fails on RHEL-9: 'rhel' is not in the tuple.
        return ((distro_name.lower() == 'ubuntu' and distro_version.major >= 16) or
                (distro_name.lower() in ("centos", "redhat") and
                 ((distro_version.major == 7 and distro_version.minor >= 8) or distro_version.major >= 8)))
  2. RHEL-9 uses cgroups v2, and WALA fails to find the CPU and memory controller paths:
    2022-06-20T10:02:13.301737Z INFO ExtHandler ExtHandler [CGW] The CPU cgroup controller is not mounted
    2022-06-20T10:02:13.304658Z INFO ExtHandler ExtHandler [CGW] The memory cgroup controller is not mounted
    2022-06-20T10:02:13.309621Z INFO ExtHandler ExtHandler [CGI] cgroups v2 mounted at /sys/fs/cgroup.  Controllers: [cpuset cpu io memory hugetlb pids rdma misc
    ]
    2022-06-20T10:02:13.310793Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a CPU cgroup
    2022-06-20T10:02:13.311341Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a memory cgroup
    2022-06-20T10:02:13.311828Z INFO ExtHandler ExtHandler [CGI] Agent cgroups enabled: False
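
The first reason above can be illustrated with a minimal sketch of the same condition with 'rhel' added to the accepted names. This is not the agent's actual code: the function name `cgroup_monitoring_supported`, the `Version` tuple, and the `RHEL_FAMILY` constant are hypothetical names introduced for illustration only.

```python
from collections import namedtuple

# Hypothetical stand-in for the agent's parsed version object.
Version = namedtuple("Version", ["major", "minor"])

# Assumption: 'rhel' is treated the same as 'redhat' and 'centos'.
RHEL_FAMILY = ("centos", "redhat", "rhel")


def cgroup_monitoring_supported(distro_name, distro_version):
    """Mirror the condition quoted above, with 'rhel' added to the name tuple."""
    name = distro_name.lower()
    if name == "ubuntu":
        return distro_version.major >= 16
    if name in RHEL_FAMILY:
        return ((distro_version.major == 7 and distro_version.minor >= 8)
                or distro_version.major >= 8)
    return False
```

With this change, `cgroup_monitoring_supported("rhel", Version(9, 1))` passes the name check. It would address only the first reason. The cgroups v2 controller lookup described in the second reason would still fail.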

Distro and WALinuxAgent details (please complete the following information):

nagworld9 commented 2 years ago

@yuxisun1217 Do you happen to know why cgroup v1 support was removed from these distros? Is this specific to RHEL images, or does every distro after a particular kernel version lack cgroup v1?

nagworld9 commented 2 years ago

@yuxisun1217 Since you pointed out the distro name change, we wonder how the agent gets packaged into the RHEL image, given that the agent setup would need a change to copy the agent unit, config, and other files. Don't you use the agent setup when packaging, or do you modify/customize the agent to include the necessary files? If so, we would like to have the changes you make when building the base image in upstream, so that the agent won't break for customers who build it from source here.

yuxisun1217 commented 2 years ago

Hi @nagworld9, sorry, I don't know about other distros. Starting from RHEL-9.0, the system mounts and uses cgroups v2 by default. I don't think that means cgroups v1 is unsupported, only that cgroups v2 is the default (see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/managing_monitoring_and_updating_the_kernel/setting-limits-for-applications_managing-monitoring-and-updating-the-kernel).
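
To check which cgroup version a given machine has mounted, one option is to look at the filesystem type column in `/proc/mounts`, where v1 hierarchies appear as `cgroup` and the unified v2 hierarchy as `cgroup2`. The sketch below is only an illustration of that check, not how WALA detects cgroups; the function names `cgroup_versions` and `read_cgroup_versions` are hypothetical.

```python
def cgroup_versions(mounts_text):
    """Return the set of cgroup versions ('v1', 'v2') found in /proc/mounts-style text."""
    versions = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        fstype = fields[2]  # columns: device, mount point, fs type, options, ...
        if fstype == "cgroup2":
            versions.add("v2")
        elif fstype == "cgroup":
            versions.add("v1")
    return versions


def read_cgroup_versions(path="/proc/mounts"):
    """Read the mount table from disk and classify it (Linux only)."""
    with open(path) as f:
        return cgroup_versions(f.read())
```

A system that mounts only the unified hierarchy would return `{"v2"}`, while a hybrid setup would return both values.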

yuxisun1217 commented 2 years ago

@nagworld9 About the distro name change: perhaps it's because the Python version in RHEL-9.0 is 3.9, where the "linux_distribution" function has been removed from the platform module. So the agent calls the distro.linux_distribution function instead to get the distro name. Here is the distro ID list:

def id():
    """
    Return the distro ID of the current distribution, as a
    machine-readable string.

    For a number of OS distributions, the returned distro ID value is
    *reliable*, in the sense that it is documented and that it does not change
    across releases of the distribution.

    This package maintains the following reliable distro ID values:

    ==============  =========================================
    Distro ID       Distribution
    ==============  =========================================
    "ubuntu"        Ubuntu
    "debian"        Debian
    "rhel"          RedHat Enterprise Linux
    "centos"        CentOS
    "fedora"        Fedora
    "sles"          SUSE Linux Enterprise Server
    "opensuse"      openSUSE
    "amazon"        Amazon Linux
    "arch"          Arch Linux
    "cloudlinux"    CloudLinux OS
    "exherbo"       Exherbo Linux
    "gentoo"        GenToo Linux
    "ibm_powerkvm"  IBM PowerKVM
    "kvmibm"        KVM for IBM z Systems
    "linuxmint"     Linux Mint
    "mageia"        Mageia
    "mandriva"      Mandriva Linux
    "parallels"     Parallels
    "pidora"        Pidora
    "raspbian"      Raspbian
    "oracle"        Oracle Linux (and Oracle Enterprise Linux)
    "scientific"    Scientific Linux
    "slackware"     Slackware
    "xenserver"     XenServer
    "openbsd"       OpenBSD
    "netbsd"        NetBSD
    "freebsd"       FreeBSD
    "midnightbsd"   MidnightBSD
    ==============  =========================================