Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent
http://azure.microsoft.com/
Apache License 2.0

[BUG] waagent -collect-logs doesn't work in RHEL-9/10 (WALA 2.9.1.1) and the log is confusing #3156

Closed: yuxisun1217 closed this issue 4 months ago

yuxisun1217 commented 4 months ago

Describe the bug: In RHEL-10 the WALA CGroups feature cannot be enabled, so log collection is not allowed. This might be expected because of #2637. However, the error logged when running "waagent -collect-logs" is confusing:

# waagent -collect-logs
...
2024-07-09T03:07:55.659321Z ERROR MainThread LogCollector Log collection completed unsuccessfully. Error: [CGroupsException] Failed to read cpuacct.stat: expected str, bytes or os.PathLike object, not NoneType
...
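The "expected str, bytes or os.PathLike object, not NoneType" text is Python's standard TypeError for passing None where a file path is expected. The sketch below shows how such a message can surface inside a CGroupsException. It assumes, hypothetically, that the cpuacct cgroup path resolves to None when CGroups are disabled and that any read error is wrapped in a CGroupsException. The function names and exception class are illustrative and are not the agent's actual code.

```python
import os


class CGroupsException(Exception):
    """Hypothetical stand-in for the agent's CGroupsException."""


def read_cpuacct_stat(cgroup_path):
    # Assumed failing pattern: no check that the cgroup path was ever resolved.
    try:
        with open(os.path.join(cgroup_path, "cpuacct.stat")) as stat_file:
            return stat_file.read()
    except Exception as e:
        # Wrapping the low-level TypeError hides the real cause (CGroups disabled).
        raise CGroupsException("Failed to read cpuacct.stat: {0}".format(e))


def read_cpuacct_stat_checked(cgroup_path):
    # Hypothetical guard: report the actual condition instead of a TypeError.
    if cgroup_path is None:
        raise CGroupsException(
            "cpuacct cgroup is not available; CGroups are not enabled for the agent")
    return read_cpuacct_stat(cgroup_path)


if __name__ == "__main__":
    for reader in (read_cpuacct_stat, read_cpuacct_stat_checked):
        try:
            reader(None)
        except CGroupsException as e:
            print("{0}: {1}".format(reader.__name__, e))
```

The first reader reproduces the shape of the confusing log line. The second reader shows one way a None path could be reported in terms of the underlying condition instead.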

Steps to reproduce:

  1. In RHEL-9 or 10 with WALA v2.9.1.1 installed
  2. Run "waagent -collect-logs"


Additional context

Log file attached

# waagent -collect-logs
2024-07-09T03:07:55.629123Z INFO MainThread LogCollector Running log collector mode normal
2024-07-09T03:07:55.630477Z INFO MainThread LogCollector WireServer endpoint 168.63.129.16 read from file
2024-07-09T03:07:55.630736Z INFO MainThread LogCollector Wire server endpoint:168.63.129.16
2024-07-09T03:07:55.630962Z INFO MainThread LogCollector Forcing an update of the goal state.
2024-07-09T03:07:55.640532Z INFO MainThread Fetched a new incarnation for the WireServer goal state [incarnation 1]
2024-07-09T03:07:55.641873Z INFO MainThread 
2024-07-09T03:07:55.642163Z INFO MainThread Fetching full goal state from the WireServer [incarnation 1]
2024-07-09T03:07:55.646676Z INFO MainThread Fetch goal state completed
2024-07-09T03:07:55.659321Z ERROR MainThread LogCollector Log collection completed unsuccessfully. Error: [CGroupsException] Failed to read cpuacct.stat: expected str, bytes or os.PathLike object, not NoneType
2024-07-09T03:07:55.659552Z INFO MainThread LogCollector Detailed log output can be found at /var/lib/waagent/logcollector/results.txt

maddieford commented 4 months ago

@yuxisun1217 This is an issue in v2.9.1.1 of the agent (#2929). We added resource monitoring to collect-logs in v2.9.1.1, which broke the command-line option.

The fix was released in version 2.10.0.8 and later.
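The maintainers give 2.10.0.8 as the first release with the fix. To check whether a given agent version string is at or past that release, compare the versions numerically rather than as strings. The sketch below assumes dotted, up-to-four-part version strings like the ones in this thread. The helper names are hypothetical.

```python
FIXED_IN = (2, 10, 0, 8)  # first release the maintainers name as containing the fix


def parse_version(version):
    """Turn a dotted version string such as '2.9.1.1' into a tuple of ints."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (4 - len(parts))  # pad short versions, e.g. '2.10' -> (2, 10, 0, 0)
    return tuple(parts)


def has_collect_logs_fix(version):
    # Compare numerically: as plain strings, '2.10.0.8' < '2.9.1.1' is True.
    return parse_version(version) >= FIXED_IN
```

For example, `has_collect_logs_fix("2.9.1.1")` returns False and `has_collect_logs_fix("2.10.0.8")` returns True.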

maddieford commented 4 months ago

This issue was fixed in 2.10.0.8+

ani-sinha commented 3 months ago

> This issue was fixed in 2.10.0.8+

@maddieford Can you point us to the specific set of patches? Maybe we can try to backport them.

ani-sinha commented 3 months ago


> This issue was fixed in 2.10.0.8+
>
> @maddieford Can you point us to the specific set of patches? Maybe we can try to backport them.

Is it this one? https://github.com/Azure/WALinuxAgent/pull/2939

maddieford commented 3 months ago

This is the PR that fixed issue #2929:

https://github.com/Azure/WALinuxAgent/pull/2939