Azure / azure-linux-extensions

Linux Virtual Machine Extensions for Azure
Apache License 2.0

Stale configchunks make the agent fail #1995

Open jantekb opened 2 weeks ago

jantekb commented 2 weeks ago

If one creates a VM image from a machine that had AMA (1.33.1) installed with a data collection rule for sending text logs to Log Analytics, new instances created from that image will silently fail to publish logs. This is due to stale config chunks in /etc/opt/microsoft/azuremonitoragent/config-cache/configchunks whose tokenEndpointUri contains the name of the source VM image.
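To check whether an instance is affected, one could scan the cached chunks for tokenEndpointUri values that do not mention the current machine. The following is only a rough sketch: the exact JSON layout of the chunk files is not described in this issue, so the code searches for the key at any depth (including inside string-encoded JSON), and `expected_name`, `iter_token_uris`, and `find_stale_chunks` are hypothetical names chosen for illustration.

```python
import json
from pathlib import Path

# Directory named in this issue.
CHUNK_DIR = Path("/etc/opt/microsoft/azuremonitoragent/config-cache/configchunks")


def iter_token_uris(node):
    """Yield every string stored under a 'tokenEndpointUri' key, at any depth.

    Assumption: the chunk schema is unknown here, so we walk the whole
    document and also try to parse strings that look like embedded JSON.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "tokenEndpointUri" and isinstance(value, str):
                yield value
            else:
                yield from iter_token_uris(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_token_uris(item)
    elif isinstance(node, str) and node.lstrip().startswith(("{", "[")):
        try:
            nested = json.loads(node)
        except ValueError:
            return
        yield from iter_token_uris(nested)


def find_stale_chunks(chunk_dir, expected_name):
    """Return chunk files with a tokenEndpointUri that lacks expected_name."""
    stale = []
    for path in sorted(Path(chunk_dir).glob("*.json")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or non-JSON file: skip rather than guess
        uris = list(iter_token_uris(doc))
        if any(expected_name.lower() not in uri.lower() for uri in uris):
            stale.append(path)
    return stale
```

Calling `find_stale_chunks(CHUNK_DIR, "<name of the new VM>")` as root would list the suspicious files. Files without any tokenEndpointUri are not reported, and a match is only a hint that the chunk was inherited from the source image.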

None of the log files under /var/opt/microsoft/azuremonitoragent/log indicate this error, unless the following repeated but cryptic messages are related to it (I can't tell; these log lines are not very meaningful to me):

2024-11-11T13:50:30.1561510Z: [/__w/1/s/external/WindowsAgent/src/shared/mcsmanager/lib/src/RefreshGigToken.cpp:243,RefreshGigToken]
2024-11-11T13:50:30.1767310Z: [/__w/1/s/external/WindowsAgent/src/shared/mcsmanager/lib/src/RefreshGigToken.cpp:243,RefreshGigToken]
2024-11-11T13:51:30.2037210Z: [/__w/1/s/external/WindowsAgent/src/shared/mcsmanager/lib/src/RefreshGigToken.cpp:243,RefreshGigToken]

I would have expected AMA to refresh the config-cache over time or clean it up on reboot, but apparently the only workaround right now is to 1) stop the service, 2) manually delete the config chunk JSON files, and 3) restart the service.
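For anyone who needs to script the three-step workaround (for example on first boot of a new instance), here is a minimal sketch. It assumes the agent runs as a systemd unit named `azuremonitoragent` and that the chunk files end in `.json`; neither is confirmed in this issue, so adjust both for your setup. The function names are hypothetical, and the `run` parameter exists only so the command runner can be swapped out for testing.

```python
import subprocess
from pathlib import Path

# Directory named in this issue.
CHUNK_DIR = Path("/etc/opt/microsoft/azuremonitoragent/config-cache/configchunks")
# Assumption: systemd unit name of the agent; verify on your machine.
SERVICE = "azuremonitoragent"


def clear_config_chunks(chunk_dir=CHUNK_DIR):
    """Delete the cached chunk JSON files and return the removed paths."""
    removed = []
    for path in sorted(Path(chunk_dir).glob("*.json")):
        path.unlink()
        removed.append(path)
    return removed


def reset_agent_cache(chunk_dir=CHUNK_DIR, service=SERVICE, run=subprocess.run):
    """Stop the service, clear the chunk cache, then start the service again."""
    run(["systemctl", "stop", service], check=True)
    try:
        removed = clear_config_chunks(chunk_dir)
    finally:
        # Start the agent again even if deleting a file failed.
        run(["systemctl", "start", service], check=True)
    return removed
```

Running `reset_agent_cache()` as root performs all three steps. Whether the agent then fetches fresh chunks by itself is what was observed in this report, not something the sketch can guarantee.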

I think AMA should be able to recover from this scenario on its own.