Open KuSh opened 1 year ago
@KuSh 2.2.46 is a very old Agent. Current versions should not have this issue. Is agent autoupdate disabled on your VM?
Not that I'm aware of. I'm using the standard 22_04-lts-arm64 plan with unattended-upgrades enabled and haven't configured the agent.
I did find a previous similar issue, but since it had been marked fixed since 2021 and the VM was created in March this year, I didn't think to check which agent version included the fix.
So it seems to be the package provided by jammy:
$ apt-cache policy walinuxagent
walinuxagent:
  Installed: 2.2.46-0ubuntu5.1
  Candidate: 2.2.46-0ubuntu5.1
  Version table:
 *** 2.2.46-0ubuntu5.1 500
        500 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main arm64 Packages
        100 /var/lib/dpkg/status
     2.2.46-0ubuntu5 500
        500 http://ports.ubuntu.com/ubuntu-ports jammy/main arm64 Packages
And your advice is that "Installation via your distribution's package repository is preferred." So what can we do?
I'll try to open a ticket on the Ubuntu side.
I've checked another VM that is identical except that it is amd64-based, and autoupdate did work there. On this arm64-based VM, it sticks to 2.2.46. Neither restarting nor reinstalling fixed the problem. I ended up installing the agent from git to resolve the issue.
AutoUpdate is enabled:
$ waagent -show-configuration 2>/dev/null | grep -i autoupdate
AutoUpdate.Enabled = True
AutoUpdate.GAFamily = Prod
Autoupdate.Frequency = 3600
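For anyone repeating this check across several VMs, it can be scripted. This is only a minimal sketch, assuming the `Key = Value` output format shown above; the function name `autoupdate_enabled` is made up for illustration and is not part of waagent.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of waagent): reads the output of
# `waagent -show-configuration` on stdin and exits 0 only if
# AutoUpdate.Enabled is set to True.
autoupdate_enabled() {
    awk -F' *= *' '
        # Match the key case-insensitively and remember its value.
        tolower($1) == "autoupdate.enabled" {
            value = tolower($2)
            gsub(/[[:space:]]/, "", value)
        }
        END { exit (value == "true" ? 0 : 1) }
    '
}

# Usage on a VM (not executed here):
#   waagent -show-configuration 2>/dev/null | autoupdate_enabled && echo "autoupdate is on"
```

Note that this only confirms the setting. As the rest of this thread shows, the setting can be True while the agent still stays on 2.2.46.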
@KuSh What region is this VM in? We're currently rolling out an arm64 agent which has the fix for this. See deployment timeline here: https://github.com/Azure/WALinuxAgent/releases/tag/pre-v2.9.1.1
France Central
We published 2.9.1.1 to France Central yesterday. Could you please confirm that you have autoupdate enabled and that your machine picks up 2.9.1.1?
We seem to be seeing exactly the same issue on some VMs in the China East 2 region. We're running the same versions of Ubuntu and walinuxagent as above. From my limited testing, the trigger was me trying to run a managed run-command like this:
> az vm run-command create --resource-group RESOURCE-GROUP --vm-name VM-NAME --async-execution false --name keklinke-test-cmd --script "whoami"
(VMExtensionProvisioningError) The requested operation requires features that are not supported by the version of the VM agent running in the VM. Unsupported features: 'MultipleExtensionsPerHandler'.
More information on troubleshooting is available at https://aka.ms/VMExtensionLinuxAgentUpdate
Code: VMExtensionProvisioningError
Message: The requested operation requires features that are not supported by the version of the VM agent running in the VM. Unsupported features: 'MultipleExtensionsPerHandler'.
More information on troubleshooting is available at https://aka.ms/VMExtensionLinuxAgentUpdate
It gives an error about using an old agent version, which makes sense since we're running 2.2.46. But after that fails, the agent stops responding and is no longer reported as ready. /var/log/waagent.log shows this (repeating over and over):
2023-06-08T17:07:14.850357Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:14.936695Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:15.015115Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:15.025121Z WARNING ExtHandler Exception retrieving extension handlers: [ProtocolError] Exceeded max retry updating goal state
2023-06-08T17:07:15.030412Z ERROR ExtHandler Event: name=WALinuxAgent, op=ExtensionProcessing, message=Exception retrieving extension handlers: [ProtocolError] Exceeded max retry updating goal state [<FrameSummary file /usr/lib/python3/dist-packages/azurelinuxagent/ga/exthandlers.py, line 230 in run>, <FrameSummary file /usr/lib/python3/dist-packages/azurelinuxagent/common/protocol/wire.py, line 151 in get_ext_handlers>, <FrameSummary file /usr/lib/python3/dist-packages/azurelinuxagent/common/protocol/wire.py, line 781 in update_goal_state>, <FrameSummary file /usr/lib/python3/dist-packages/azurelinuxagent/common/protocol/wire.py, line 845 in _update_from_goal_state>], duration=0
After that happened, I tried restarting the service by running sudo service walinuxagent restart, and it starts failing with a different error and never really initializes properly:
2023-06-08T17:07:24.880900Z INFO Daemon Agent WALinuxAgent-2.2.46 forwarding signal 15 to WALinuxAgent-2.2.46
2023-06-08T17:07:25.109711Z INFO Daemon Azure Linux Agent Version:2.2.46
2023-06-08T17:07:25.111855Z INFO Daemon OS: ubuntu 22.04
2023-06-08T17:07:25.113736Z INFO Daemon Python: 3.10.6
2023-06-08T17:07:25.114657Z INFO Daemon CGroups Status: The cgroup filesystem is ready to use
2023-06-08T17:07:25.117407Z WARNING Daemon Failed to create a cgroup for the VM Agent; resource usage for the Agent will not be tracked. Error: [CGroupsException] Failed to get paths of agent's cgroups. Error: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
2023-06-08T17:07:25.118744Z INFO Daemon Run daemon
2023-06-08T17:07:25.125028Z INFO Daemon cloud-init is enabled: True
2023-06-08T17:07:25.127181Z INFO Daemon Using cloud-init for provisioning
2023-06-08T17:07:25.128531Z INFO Daemon Clean protocol and wireserver endpoint
2023-06-08T17:07:25.129925Z INFO Daemon Provisioning already completed, skipping.
2023-06-08T17:07:25.130278Z INFO Daemon RDMA capabilities are not enabled, skipping
2023-06-08T17:07:25.136307Z INFO Daemon Installed Agent WALinuxAgent-2.2.46 is the most current agent
2023-06-08T17:07:25.302584Z INFO ExtHandler Agent WALinuxAgent-2.2.46 is running as the goal state agent
2023-06-08T17:07:25.305250Z INFO ExtHandler Distro info: ubuntu 22.04, osutil class being used: UbuntuOSUtil, agent service name: walinuxagent
2023-06-08T17:07:25.308589Z INFO ExtHandler Detect protocol endpoints
2023-06-08T17:07:25.309158Z INFO ExtHandler Clean protocol and wireserver endpoint
2023-06-08T17:07:25.310759Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-08T17:07:25.311769Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-08T17:07:25.312876Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-08T17:07:25.313779Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-08T17:07:27.332689Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-08T17:07:27.334967Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-08T17:07:27.336013Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-08T17:07:27.712377Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:27.793550Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:27.873378Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:27.882818Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-08T17:07:27.885082Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-08T17:07:27.890309Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-08T17:07:27.892297Z INFO ExtHandler Retry detect protocols: retry=0
2023-06-08T17:07:37.904213Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-08T17:07:37.906977Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-08T17:07:37.909119Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-08T17:07:37.910138Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-08T17:07:37.929077Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-08T17:07:37.931572Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-08T17:07:37.933522Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-08T17:07:38.331606Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:38.408878Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:38.486851Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:38.495784Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-08T17:07:38.496105Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-08T17:07:38.516567Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-08T17:07:38.519429Z INFO ExtHandler Retry detect protocols: retry=1
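To get a quick idea of how often this failure repeats, the log can be summarized instead of read line by line. The following is a rough sketch, assuming the log line format shown above (timestamp in the first field); `goal_state_failures` is a hypothetical name chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads waagent log lines on stdin and prints the number
# of "Exceeded max retry updating goal state" lines, followed by the
# timestamp of the most recent one. Prints just 0 if there are none.
goal_state_failures() {
    awk '
        /Exceeded max retry updating goal state/ { n++; last = $1 }
        END {
            if (n) printf "%d %s\n", n, last
            else   print 0
        }
    '
}

# Usage on a VM (not executed here):
#   sudo cat /var/log/waagent.log | goal_state_failures
```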
I confirmed that autoupdate is enabled, but it doesn't seem like it's working since we're still running the old version:
$ waagent -show-configuration 2>/dev/null | grep -i autoupdate
AutoUpdate.Enabled = True
AutoUpdate.GAFamily = Prod
Autoupdate.Frequency = 3600
And for reference, here is the version info:
$ apt-cache policy walinuxagent
walinuxagent:
  Installed: 2.2.46-0ubuntu5.1
  Candidate: 2.2.46-0ubuntu5.1
  Version table:
 *** 2.2.46-0ubuntu5.1 500
        500 http://azure.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.2.46-0ubuntu5 500
        500 http://azure.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
And from here I'm stuck. Any suggestions? Do I need to reinstall the agent from GitHub?
> We published 2.9.1.1 to france central yesterday. Could you please confirm you have autoupdate enabled and your machine takes 2.9.1.1.
I installed the latest release by cloning the repo, so I can't tell.
Quick update on my issue - I tried installing version 2.9.0.4 manually from GitHub and it seems to have worked. When will version 2.9.1.1 be released to China East 2? I can verify that ours auto-updates after that date.
@keklinke We don't have an exact date for when 2.9.1.1 will be released to China East 2. We're currently working through publishing issues there.
For reference, I've just deployed a new Standard_D2pls_v5 VM and the problem persists on arm64:
$ waagent --version
/usr/sbin/waagent:27: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
import imp
WALinuxAgent-2.2.46 running on ubuntu 22.04
Python: 3.10.12
Goal state agent: 2.2.46
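Whether a VM has picked up a fixed agent can also be checked mechanically by comparing the "Goal state agent" line against 2.9.1.1, the version mentioned earlier in this thread as carrying the arm64 fix. This is a sketch under assumptions: the helper names are invented for illustration, and the comparison relies on GNU `sort -V` being available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of waagent); they rely on GNU `sort -V`.

# Extract the "Goal state agent" version from `waagent --version` output on stdin.
goal_state_version() {
    sed -n 's/^Goal state agent:[[:space:]]*//p' | head -n 1
}

# Exit 0 if version $1 is greater than or equal to version $2.
version_at_least() {
    # After a version sort, the smaller version comes first;
    # $1 >= $2 holds exactly when that first entry equals $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a VM (not executed here):
#   v="$(waagent --version 2>/dev/null | goal_state_version)"
#   version_at_least "$v" 2.9.1.1 || echo "agent $v predates 2.9.1.1"
```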
I've automated a switch to the latest git version via cloud-init to avoid the problem:
### Upgrade waagent to the latest version, as the Ubuntu-packaged version can fill up the entire filesystem; ref: https://github.com/Azure/WALinuxAgent/issues/2836
apt remove -y walinuxagent
git clone https://github.com/Azure/WALinuxAgent.git /opt/waagent
cd /opt/waagent
git checkout $(git describe --abbrev=0 --tags)
python3 setup.py install --register-service
Same issue with a fresh Standard_B2pls_v2 in the France Central region.
$ waagent --version
/usr/sbin/waagent:27: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
import imp
WALinuxAgent-2.2.46 running on ubuntu 22.04
Python: 3.10.12
Goal state agent: 2.2.46
$ sudo apt-cache policy walinuxagent
walinuxagent:
  Installed: 2.2.46-0ubuntu5.1
  Candidate: 2.2.46-0ubuntu5.1
  Version table:
 *** 2.2.46-0ubuntu5.1 500
        500 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main arm64 Packages
        100 /var/lib/dpkg/status
     2.2.46-0ubuntu5 500
        500 http://ports.ubuntu.com/ubuntu-ports jammy/main arm64 Packages
$ sudo waagent -show-configuration 2>/dev/null | grep -i autoupdate
AutoUpdate.Enabled = True
AutoUpdate.GAFamily = Prod
Autoupdate.Frequency = 3600
Thanks, @KuSh, for the workaround.
$ sudo waagent --version
WALinuxAgent-2.11.1.4 running on ubuntu 22.04
Python: 3.10.12
Goal state agent: 2.11.1.4
/var/lib/waagent/history takes up disk space. It contains 1263464 files taking up 20GB.
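Figures like these can be gathered with standard tools. The snippet below is a minimal sketch, not something shipped with the agent; `history_usage` is a hypothetical helper name, and it defaults to the directory named above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the number of regular files and the disk usage
# (in KiB) of a directory, defaulting to the agent's history directory.
# The body runs in a subshell so its variables do not leak to the caller.
history_usage() (
    dir="${1:-/var/lib/waagent/history}"
    files="$(find "$dir" -type f | wc -l | tr -d ' ')"
    kib="$(du -sk "$dir" | cut -f1)"
    printf '%s files, %s KiB\n' "$files" "$kib"
)

# Usage on a VM (not executed here):
#   sudo bash -c "$(declare -f history_usage); history_usage"
```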
Distro and WALinuxAgent details (please complete the following information):
Log file attached. The log is full of the same lines: