Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent
http://azure.microsoft.com/
Apache License 2.0

[BUG] CGroupsException occurs when trying to read 'cpuacct.stat' with cgroupsv2 on 2.9.0.4 #2815

Open cameronmeissner opened 1 year ago

cameronmeissner commented 1 year ago

Describe the bug: A recent regression appears to prevent the agent from loading cgroup information on cgroups v2 systems such as Ubuntu 22.04. This regression prevents us from using the agent's log-collection utility.

Relevant errors/logs:
azureuser@camtest:~$ cat /var/log/waagent.log | grep cgroup
2023-05-03T23:38:09.604535Z INFO Daemon CGroups Status: The cgroup filesystem is ready to use
2023-05-03T23:38:09.613038Z WARNING Daemon Failed to create a cgroup for the VM Agent; resource usage for the Agent will not be tracked. Error: [CGroupsException] Failed to get paths of agent's cgroups. Error: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
2023-05-03T23:38:11.601529Z INFO ExtHandler CGroups Status: The cgroup filesystem is ready to use
2023-05-03T23:38:11.605436Z WARNING ExtHandler Failed to create a cgroup for the VM Agent; resource usage for the Agent will not be tracked. Error: [CGroupsException] Failed to get paths of agent's cgroups. Error: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
2023-05-03T23:38:12.946199Z INFO ExtHandler ExtHandler [CGW] The CPU cgroup controller is not mounted
2023-05-03T23:38:12.946657Z INFO ExtHandler ExtHandler [CGW] The memory cgroup controller is not mounted
2023-05-03T23:38:12.948990Z INFO ExtHandler ExtHandler [CGI] cgroups v2 mounted at /sys/fs/cgroup. Controllers: [cpuset cpu io memory hugetlb pids rdma misc
2023-05-03T23:38:12.949703Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a CPU cgroup
2023-05-03T23:38:12.950125Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a memory cgroup
2023-05-03T23:38:12.950521Z INFO ExtHandler ExtHandler [CGI] Agent cgroups enabled: False
2023-05-03T23:38:12.991837Z INFO ExtHandler ExtHandler Checking if log collection is allowed at this time [False]. All three conditions must be met: configuration enabled [True], cgroups enabled [False], python supported: [True]
cpu_slice_matches = (cgroupconfigurator.LOGCOLLECTOR_SLICE in cpu_cgroup_path)
2023-05-04T16:04:12.279036Z INFO Daemon CGroups Status: The cgroup filesystem is ready to use
2023-05-04T16:04:12.286183Z WARNING Daemon Failed to create a cgroup for the VM Agent; resource usage for the Agent will not be tracked. Error: [CGroupsException] Failed to get paths of agent's cgroups. Error: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
2023-05-04T16:04:13.835128Z INFO ExtHandler ExtHandler [CGW] The CPU cgroup controller is not mounted
2023-05-04T16:04:13.837237Z INFO ExtHandler ExtHandler [CGW] The memory cgroup controller is not mounted
2023-05-04T16:04:13.843040Z INFO ExtHandler ExtHandler [CGI] cgroups v2 mounted at /sys/fs/cgroup. Controllers: [cpuset cpu io memory hugetlb pids rdma misc
2023-05-04T16:04:13.849856Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a CPU cgroup
2023-05-04T16:04:13.851763Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a memory cgroup
2023-05-04T16:04:13.853358Z INFO ExtHandler ExtHandler [CGI] Agent cgroups enabled: False
2023-05-04T16:04:13.947264Z INFO ExtHandler ExtHandler Checking if log collection is allowed at this time [False]. All three conditions must be met: configuration enabled [True], cgroups enabled [False], python supported: [True]

root@camtest:/home/azureuser# python3 /var/lib/waagent/WALinuxAgent-2.9.0.4/bin/WALinuxAgent-2.9.0.4-py2.7.egg -verbose -collect-logs -full
2023-05-04T17:32:30.806097Z INFO MainThread LogCollector Running log collector mode full
2023-05-04T17:32:30.807573Z INFO MainThread LogCollector WireServer endpoint 168.63.129.16 read from file
2023-05-04T17:32:30.807691Z INFO MainThread LogCollector Wire server endpoint:168.63.129.16
2023-05-04T17:32:30.807808Z INFO MainThread LogCollector Forcing an update of the goal state.
2023-05-04T17:32:30.808047Z VERBOSE MainThread LogCollector HTTP connection [GET] [/machine/?comp=goalstate] [None] [{'x-ms-agent-name': 'WALinuxAgent', 'x-ms-version': '2012-11-30', 'Connection': 'close', 'User-Agent': 'WALinuxAgent/2.9.0.4'}]
2023-05-04T17:32:30.811276Z VERBOSE MainThread LogCollector [HTTP Response] Status Code 200
2023-05-04T17:32:30.812795Z INFO MainThread Fetched a new incarnation for the WireServer goal state [incarnation 1]
2023-05-04T17:32:30.813410Z VERBOSE MainThread LogCollector HTTP connection [GET] [/machine/?comp=goalstate] [None] [{'x-ms-agent-name': 'WALinuxAgent', 'x-ms-version': '2012-11-30', 'Connection': 'close', 'User-Agent': 'WALinuxAgent/2.9.0.4'}]
2023-05-04T17:32:30.814870Z VERBOSE MainThread LogCollector [HTTP Response] Status Code 200
2023-05-04T17:32:30.815446Z VERBOSE MainThread LogCollector HTTP connection [GET] [/vmSettings] [None] [{'x-ms-version': '2015-09-01', 'x-ms-containerid': 'fa934f30-8b67-400f-844d-6b2941265f85', 'x-ms-host-config-name': 'db66d74b-6e67-4c35-b570-6b8a09a442a6.0.db66d74b-6e67-4c35-b570-6b8a09a442a6.0._camtest.1.xml', 'x-ms-client-correlationid': 'f57db52e-aea6-4c1c-94ae-4cbdd1c4083a', 'Connection': 'close', 'User-Agent': 'WALinuxAgent/2.9.0.4'}]
2023-05-04T17:32:30.816906Z VERBOSE MainThread LogCollector [HTTP Response] Status Code 200
2023-05-04T17:32:30.817453Z INFO MainThread LogCollector HostGAPlugin version: 1.0.8.139
2023-05-04T17:32:30.818028Z INFO MainThread
2023-05-04T17:32:30.818137Z INFO MainThread Fetched new vmSettings [HostGAPlugin correlation ID: f57db52e-aea6-4c1c-94ae-4cbdd1c4083a eTag: 2395950536767062532 source: Fabric]
2023-05-04T17:32:30.818468Z INFO MainThread The vmSettings originated via Fabric; will ignore them.
2023-05-04T17:32:30.819084Z INFO MainThread
2023-05-04T17:32:30.819176Z INFO MainThread Fetching full goal state from the WireServer [incarnation 1]
2023-05-04T17:32:30.819654Z VERBOSE MainThread LogCollector HTTP connection [GET] [/machine/fa934f30-8b67-400f-844d-6b2941265f85/db66d74b%2D6e67%2D4c35%2Db570%2D6b8a09a442a6.%5Fcamtest?comp=config&type=extensionsConfig&incarnation=1] [None] [{'x-ms-agent-name': 'WALinuxAgent', 'x-ms-version': '2012-11-30', 'Connection': 'close', 'User-Agent': 'WALinuxAgent/2.9.0.4'}]
2023-05-04T17:32:30.821543Z VERBOSE MainThread LogCollector [HTTP Response] Status Code 200
2023-05-04T17:32:30.822538Z VERBOSE MainThread LogCollector Extension config shows status blob type as [PageBlob]
2023-05-04T17:32:30.822712Z VERBOSE MainThread LogCollector Downloading artifacts profile blob
2023-05-04T17:32:30.822831Z VERBOSE MainThread LogCollector Fetch [https://md-k30bv1003kcd.z19.blob.storage.azure.net/$system/camtest.f5a7b046-fddb-48bf-b8f5-7aa54ff9dcde.vmSettings?sv=2018-03-28&sr=b&sk=system-1&sig=hW4vEvhB9Bm2hphDUHxE2dzNnq2YotrRm4c7pE%2bWwzk%3d&se=9999-01-01T00%3a00%3a00Z&sp=r] with headers [None]
2023-05-04T17:32:30.824366Z VERBOSE MainThread LogCollector HTTP connection [GET] [/$system/camtest.f5a7b046-fddb-48bf-b8f5-7aa54ff9dcde.vmSettings?sv=2018-03-28&sr=b&sk=system-1&sig=hW4vEvhB9Bm2hphDUHxE2dzNnq2YotrRm4c7pE%2bWwzk%3d&se=9999-01-01T00%3a00%3a00Z&sp=r] [None] [{'Connection': 'close', 'User-Agent': 'WALinuxAgent/2.9.0.4'}]
2023-05-04T17:32:30.845726Z VERBOSE MainThread LogCollector [HTTP Response] Status Code 200
2023-05-04T17:32:30.846545Z VERBOSE MainThread LogCollector HTTP connection [GET] [/machine/fa934f30-8b67-400f-844d-6b2941265f85/db66d74b%2D6e67%2D4c35%2Db570%2D6b8a09a442a6.%5Fcamtest?comp=config&type=hostingEnvironmentConfig&incarnation=1] [None] [{'x-ms-agent-name': 'WALinuxAgent', 'x-ms-version': '2012-11-30', 'Connection': 'close', 'User-Agent': 'WALinuxAgent/2.9.0.4'}]
2023-05-04T17:32:30.849199Z VERBOSE MainThread LogCollector [HTTP Response] Status Code 200
2023-05-04T17:32:30.849906Z VERBOSE MainThread LogCollector HTTP connection [GET] [/machine/fa934f30-8b67-400f-844d-6b2941265f85/db66d74b%2D6e67%2D4c35%2Db570%2D6b8a09a442a6.%5Fcamtest?comp=config&type=sharedConfig&incarnation=1] [None] [{'x-ms-agent-name': 'WALinuxAgent', 'x-ms-version': '2012-11-30', 'Connection': 'close', 'User-Agent': 'WALinuxAgent/2.9.0.4'}]
2023-05-04T17:32:30.851765Z VERBOSE MainThread LogCollector [HTTP Response] Status Code 200
2023-05-04T17:32:30.852283Z VERBOSE MainThread LogCollector HTTP connection [GET] [/machine/fa934f30-8b67-400f-844d-6b2941265f85/db66d74b%2D6e67%2D4c35%2Db570%2D6b8a09a442a6.%5Fcamtest?comp=certificates&incarnation=1] [None] [{'x-ms-agent-name': 'WALinuxAgent', 'x-ms-version': '2012-11-30', 'x-ms-cipher-name': 'DES_EDE3_CBC', 'x-ms-guest-agent-public-x509-cert': 'MIIDEzCCAfugAwIBAgIUfMbOVIjV56Sxwf8K2L3LZgy1HhkwDQYJKoZIhvcNAQELBQAwGTEXMBUGA1UEAwwOTGludXhUcmFuc3BvcnQwHhcNMjMwNTA0MTYwNDEzWhcNMjUwNTAzMTYwNDEzWjAZMRcwFQYDVQQDDA5MaW51eFRyYW5zcG9ydDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBALRps4Va0RnHRneo+7zCGZ8RkmgmWxbbjRG/Fhoq5Lw7nnuw1gZ1yJPuqVM7dctxA1uXbtbyyOIKwhFXwtul0gOJQdadfW2sWJOO96vhQtavQ9bPuefYIKHxdCcWtfylXrAcGoB2olKb2qAFjdABalImno/ht5ZcY09lySFKN6xPO9meOd0jIeVR71Oy3sX2HWBUAHqDQY3/1+aF3/H0LlNKdTkIXVQgmbvBr9aUTjs8q5ewdAGCAM6s/DcFj4hTAW9A7UtQngKVVgWFtHsAUDyMLip+2yrJAqWZ0MmJf+QRMY1Baiv3YK2EjbWCobnBCdwTju6zRMfvZ9XgLurAFhsCAwEAAaNTMFEwHQYDVR0OBBYEFB8cqNkf5nUI6COjC/aYdGX35Lp5MB8GA1UdIwQYMBaAFB8cqNkf5nUI6COjC/aYdGX35Lp5MA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADggEBAFYxtCFZ1oo/+DHnz8zmXuQN6nr6cheaGrHJBV1+EG08h5E3pxuMuXGsWnqNkmjlnRrv2RYc+KYHLU9FtXaTMdgs9oBSNkfam4tCZNkl/STVIPMhBLN757cC4JsY5GwrJ1qwApMiyATxd3rF8GcQFNgpmwoK26y8nf1Ppj1vrxPJJwco/RC26X97na2o2rsNc+Ojw7//Z2uq6qqJrqEjiLMY3qQwjYb5ZL6IDe2ebY2dVEViPhzNDcgIwcI7QFzOQ8OmztiUHnj/Olt5q/oc9nkSvTfynZfxIKXJy1piGIVmy3TIvyWUTD0FGMEZPlvjBrXe2iMFoOh45nrn6fAn15c=', 'Connection': 'close', 'User-Agent': 'WALinuxAgent/2.9.0.4'}]
2023-05-04T17:32:30.861509Z VERBOSE MainThread LogCollector [HTTP Response] Status Code 200
2023-05-04T17:32:30.894279Z INFO MainThread Downloaded certificate {'thumbprint': '61426980F1FBB23C7A1FA96EF757C805E19E6434', 'hasPrivateKey': True}
2023-05-04T17:32:30.894797Z INFO MainThread Fetch goal state completed
2023-05-04T17:32:30.895565Z VERBOSE MainThread LogCollector HTTP connection [GET] [/metadata/instance/compute?api-version=2018-02-01] [None] [{'User-Agent': 'WALinuxAgent/2.9.0.4', 'Metadata': True, 'Connection': 'close'}]
2023-05-04T17:32:30.905314Z VERBOSE MainThread LogCollector [HTTP Response] Status Code 200
2023-05-04T17:32:30.911593Z ERROR MainThread LogCollector Log collection completed unsuccessfully. Error: [CGroupsException] Failed to read cpuacct.stat: expected str, bytes or os.PathLike object, not NoneType
2023-05-04T17:32:30.911857Z INFO MainThread LogCollector Detailed log output can be found at /var/lib/waagent/logcollector/results.txt
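
Both TypeError messages in the logs are consistent with a None value reaching os.path.join() or a similar path function, which fits the "controller is not mounted" lines. The following is a minimal sketch of that failure mode and of a guard that reports it more clearly. It is not the agent's code: stat_file_path is a hypothetical helper, and the example paths are assumptions for illustration.

```python
import os


def stat_file_path(controller_path, filename="cpuacct.stat"):
    """Hypothetical helper (not the agent's code): build the path to a stat file.

    Raises a descriptive error when the controller has no path, instead of
    letting os.path.join() fail with a TypeError about NoneType.
    """
    if controller_path is None:
        raise ValueError(
            "no cgroup path for this controller; it may not be mounted "
            "(for example on a cgroups v2 host)"
        )
    return os.path.join(controller_path, filename)


if __name__ == "__main__":
    # Unguarded call: a None path component makes os.path.join() raise TypeError.
    try:
        os.path.join(None, "cpuacct.stat")
    except TypeError as exc:
        print("unguarded:", exc)

    # Guarded call: the failure names the actual problem.
    try:
        stat_file_path(None)
    except ValueError as exc:
        print("guarded:", exc)
```

A guard like this does not fix the missing v2 support; it only makes the error easier to trace back to the unmounted controller.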

Repro steps: Spawn the agent on an Ubuntu 22.04 VM. Then manually invoke the log-collection utility with:
python3 /var/lib/waagent/WALinuxAgent-2.9.0.4/bin/WALinuxAgent-2.9.0.4-py2.7.egg -verbose -collect-logs -full
and observe the error output.

Version Info:
WALinuxAgent-2.2.46 running on ubuntu 22.04
Python: 3.10.6
Goal state agent: 2.9.0.4

Additional context: We think this regression could be related to https://github.com/Azure/WALinuxAgent/pull/2783.

We also found that the immediate cause seems to be that the agent attempts to read the file cpuacct.stat (which exists only on cgroups v1), when on cgroups v2 it should read cpu.stat instead.
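
To illustrate the distinction, here is a rough sketch of how a tool could detect the cgroup version and choose the matching CPU statistics file. It is an assumption-laden example, not the agent's implementation: is_cgroup_v2, cpu_stat_file and parse_stat are hypothetical names, and the only detection signal used is the cgroup.controllers file that exists at the root of a v2 (unified) hierarchy.

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # mount point reported in the logs above


def is_cgroup_v2(root=CGROUP_ROOT):
    """Return True if `root` looks like a cgroups v2 (unified) hierarchy.

    The v2 root directory contains a `cgroup.controllers` file; v1 does not.
    """
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


def cpu_stat_file(cgroup_dir, v2):
    """Pick the CPU statistics file for a cgroup directory (hypothetical helper)."""
    return os.path.join(cgroup_dir, "cpu.stat" if v2 else "cpuacct.stat")


def parse_stat(text):
    """Parse 'key value' lines, the layout shared by cpu.stat and cpuacct.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


if __name__ == "__main__":
    v2 = is_cgroup_v2()
    path = cpu_stat_file(CGROUP_ROOT, v2)
    print("cgroups v2:", v2, "-> would read", path)
    # Only read the file if it exists at this location on the current host.
    if os.path.isfile(path):
        with open(path) as f:
            print(parse_stat(f.read()))
```

Note that the two files are not interchangeable beyond their layout: cpuacct.stat reports user and system time in USER_HZ ticks, while cpu.stat reports usage_usec, user_usec and system_usec in microseconds, so any consumer would also need to convert units.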

narrieta commented 1 year ago

@cameronmeissner Yes, currently the agent uses cgroups v1 and the log collector has a dependency on that. We'll add support for v2 at a later point, but at this moment there is not a timeline for the implementation.

AlftioH commented 1 year ago

@narrieta Is this the pull request for this bug? https://github.com/Azure/WALinuxAgent/pull/2866/files (#2866)

nagworld9 commented 1 year ago

@AlftioH No, this PR is different. It migrates existing test scenarios to the new automation. As part of that, I'm migrating the scenarios for agent cgroups, which only support v1 as of now.

AlftioH commented 1 year ago

@nagworld9 Thanks for the clarification. All clear, and waiting for the new PR + ER that can help with the situation on Ubuntu 22 and cgroups v2 😎