Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent
http://azure.microsoft.com/
Apache License 2.0
538 stars 373 forks source link

[BUG] CPU credits drain on B1ls after update to 2.10.0.3 #2940

Closed gregolsky closed 11 months ago

gregolsky commented 1 year ago
  1. Log - after update we see CPU usage increase by 2%, which then causes CPU credits drain on small burstable instances e.g. B1ls where 5% is the baseline. It happens on multiple instances since September 27th - day of the update.
Sep 27 10:52:17 vmXXXXXXXXXX python3[4788]: 2023-09-27T10:52:17.140495Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.9.1.1 is running as the goal state agent [DEBUG HeartbeatCounter: 1102;Heart
Sep 27 11:16:49 vmXXXXXXXXXX python3[4788]: 2023-09-27T11:16:49.960035Z INFO EnvHandler ExtHandler Current Firewall rules:
Sep 27 11:16:49 vmXXXXXXXXXX python3[4788]: Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
Sep 27 11:16:49 vmXXXXXXXXXX python3[4788]:     pkts      bytes target     prot opt in     out     source               destination
Sep 27 11:16:49 vmXXXXXXXXXX python3[4788]: Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
Sep 27 11:16:49 vmXXXXXXXXXX python3[4788]:     pkts      bytes target     prot opt in     out     source               destination
Sep 27 11:16:49 vmXXXXXXXXXX python3[4788]: Chain OUTPUT (policy ACCEPT 13440 packets, 1173316 bytes)
Sep 27 11:16:49 vmXXXXXXXXXX python3[4788]:     pkts      bytes target     prot opt in     out     source               destination
Sep 27 11:16:49 vmXXXXXXXXXX python3[4788]:        0        0 ACCEPT     tcp  --  *      *       0.0.0.0/0            168.63.129.16        tcp dpt:53
Sep 27 11:16:49 vmXXXXXXXXXX python3[4788]:     5241  1352079 ACCEPT     tcp  --  *      *       0.0.0.0/0            168.63.129.16        owner UID match 0
Sep 27 11:16:49 vmXXXXXXXXXX python3[4788]:        0        0 DROP       tcp  --  *      *       0.0.0.0/0            168.63.129.16        ctstate INVALID,NEW
Sep 27 11:17:47 vmXXXXXXXXXX python3[4788]: 2023-09-27T11:17:47.057006Z INFO ExtHandler ExtHandler Normal Agent upgrade discovered, updating to WALinuxAgent-2.10.0.3 -- exiting
Sep 27 11:17:47 vmXXXXXXXXXX python3[4766]: 2023-09-27T11:17:47.678238Z INFO Daemon Determined Agent WALinuxAgent-2.10.0.3 to be the latest agent

image

  1. Steps to reproduce the behavior: Update has been installed automatically from 2.9.1.1 to 2.10.0.3. Since then VM uses more CPU and hence is unusable due to lack of CPU credits. Before it was fine at around 3-4% idle.

Distro and WALinuxAgent details (please complete the following information):

Additional context Add any other context about the problem here.

Log file attached I am afraid of sharing the log publicly. Please let me know where I can upload it in a private and secure manner.

Last few days of logs is:

2023-10-08T11:32:36.369162Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T11:35:50.740842Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 528;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T12:05:56.617188Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 529;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T12:32:38.425596Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T12:36:02.670712Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 530;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T13:06:03.610721Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 531;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T13:32:38.725866Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T13:36:09.346990Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 532;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T14:06:13.728629Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 533;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T14:32:43.219347Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T14:36:13.828385Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 534;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T15:06:14.032904Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 535;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T15:32:48.711214Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T15:36:18.982439Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 536;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T16:06:21.213270Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 537;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T16:32:51.351362Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T16:36:22.284754Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 538;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T17:06:25.243201Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 539;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T17:32:55.786503Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T17:36:27.634230Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 540;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T18:06:33.072956Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 541;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T18:32:57.960034Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T18:36:34.510019Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 542;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T19:06:39.803746Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 543;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T19:33:00.249171Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T19:36:42.990300Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 544;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T20:06:47.029235Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 545;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T20:33:05.313353Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T20:36:48.079650Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 546;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T21:06:52.589408Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 547;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T21:33:05.689521Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T21:36:54.603605Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 548;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T22:06:59.405298Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 549;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T22:33:11.580529Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T22:37:00.478582Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 550;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T23:07:01.961344Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 551;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-08T23:33:16.848582Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-08T23:37:05.673724Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 552;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T00:07:05.950004Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 553;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T00:33:19.011348Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-09T00:37:07.994017Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 554;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T01:07:08.592208Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 555;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T01:33:22.137020Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-09T01:37:11.988553Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 556;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T02:07:13.349661Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 557;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T02:33:27.044056Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-09T02:37:15.867939Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 558;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T03:07:20.436134Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 559;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T03:33:29.684036Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-09T03:37:24.180559Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 560;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T04:07:30.297784Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 561;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T04:33:32.103782Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-09T04:37:33.175358Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 562;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T05:07:33.938728Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 563;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T05:33:34.418024Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-09T05:37:35.847327Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 564;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T06:07:38.616661Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 565;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
2023-10-09T06:33:34.784816Z INFO ExtHandler ExtHandler Downloading agent manifest
2023-10-09T06:37:42.313044Z INFO ExtHandler ExtHandler [HEARTBEAT] Agent WALinuxAgent-2.10.0.3 is running as the goal state agent [DEBUG HeartbeatCounter: 566;HeartbeatId: 61482AB8-898A-49F0-BD70-A2CC86857660;DroppedPackets: 0;UpdateGSErrors: 0;AutoUpdate: 1]
nagworld9 commented 1 year ago

@gregolsky can you clarify the impacted vm is running on 2.10.0.3 and not 2.10.0.2. You mentioned this

Distro and Version: Ubuntu 22.04 WALinuxAgent version 2.10.0.2

but logs indicate is 2.10.0.3.

Can you also attach following logs to nnandigam@microsoft.com /var/log/waagent.log /var/log/syslog

or you can run the log collector which will collect our agent logs and below command only works if 2.8.0.11 egg still in the vm

root# python3 -u /var/lib/waagent/WALinuxAgent-2.8.0.11/bin/WALinuxAgent-2.8.0.11-py2.7.egg -collect-logs
Running log collector mode normal
Log collection successfully completed. Archive can be found at /var/lib/waagent/logcollector/logs.zip and detailed log output can be found at /var/lib/waagent/logcollector/results.txt
gregolsky commented 1 year ago

Sorry, I fixed the version - it's 2.10.0.3 for sure. I'll collect the info for you and send it tomorrow.

We have multiple VMs affected this way - for each one CPU increased is correlated with walinuxagent update. Is there a workaround for the time being? Can we downgrade the version to 2.9?

gregolsky commented 1 year ago

@nagworld9 I sent you the files.

nagworld9 commented 1 year ago

[like] Nageswara Nandigam (HE/HIM) reacted to your message:


From: Grzegorz Lachowski @.> Sent: Monday, October 9, 2023 7:36:22 PM To: Azure/WALinuxAgent @.> Cc: Mention @.>; Comment @.> Subject: Re: [Azure/WALinuxAgent] [BUG] CPU credits drain on B1ls after update to 2.10.0.3 (Issue #2940)

@nagworld9https://github.com/nagworld9 I sent you the files.

— Reply to this email directly, view it on GitHubhttps://github.com/Azure/WALinuxAgent/issues/2940#issuecomment-1753599615 or unsubscribehttps://github.com/notifications/unsubscribe-auth/AUERSKTRDRVCOHGFDRIIMF3X6RG3NBFKMF2HI4TJMJ2XIZLTSOBKK5TBNR2WLJDUOJ2WLJDOMFWWLO3UNBZGKYLEL5YGC4TUNFRWS4DBNZ2F6YLDORUXM2LUPGBKK5TBNR2WLJDUOJ2WLJDOMFWWLLTXMF2GG2C7MFRXI2LWNF2HTAVFOZQWY5LFUVUXG43VMWSG4YLNMWVXI2DSMVQWIX3UPFYGLLDTOVRGUZLDORPXI6LQMWWES43TOVSUG33NNVSW45FGORXXA2LDOOJIFJDUPFYGLKTSMVYG643JORXXE6NFOZQWY5LFU42DKNZWGYZTTAVEOR4XAZNFNFZXG5LFUV3GC3DVMWVDCOJTGI2DAMZRHAYKO5DSNFTWOZLSUZRXEZLBORSQ. You are receiving this email because you were mentioned.

Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

nagworld9 commented 1 year ago

@gregolsky I don't see anything different agent doing in 2.10.0.3 from the log compared to 2.9.1.1. Btw, how do you compute the CPU usage for the agent, how the above graph produced. Is there any chance you combing the last computed 2.9.1.1 value to this new value of 2.10.0.3 after update?

do you look at the agent cgroups reported cpu usage. If so, how the output of below command looks like and can also run pstree. I want to check no unknown process in the cgroup

systemd-cgls --unit system.slice --all
pstree -p 

As far as downgrade, today we don't have an option to downgrade.

after update we see CPU usage increase by 2%, which then causes CPU credits drain on small burstable instances e.g. B1ls where 5% is the baseline

In terms of percentage, it's not significant bump. When you say 5% is baseline, how this value calculated and who defines this?

gregolsky commented 1 year ago

Azure defines this. It's the smallest instance type in B-series. B1ls

For that instance 2% is a significant bump.

I just correlated the average CPU usage bump time with the graph on azure. It matches on at multiple instances.

pon., 9 paź 2023, 22:52 użytkownik Nageswara Nandigam < @.***> napisał:

@gregolsky https://github.com/gregolsky I don't see anything different agent doing in 2.10.0.3 from the log compared to 2.9.1.1. Btw, how do you compute the CPU usage for the agent, how the above graph produced. Is there any change combing the last computed 2.9.1.1 value to this new value of 2.10.0.3 after update?

do you look at the agent cgroups reported cpu usage. If so, how the output of below command looks like and can also run pstree. I want to check no unknown process in the cgroup

systemd-cgls --unit system.slice --all pstree -p

As far as downgrade, today we don't have an option to downgrade.

after update we see CPU usage increase by 2%, which then causes CPU credits drain on small burstable instances e.g. B1ls where 5% is the baseline

In terms of percentage, it's not significant bump. When you say 5% is baseline, how this value calculated and who defines this?

— Reply to this email directly, view it on GitHub https://github.com/Azure/WALinuxAgent/issues/2940#issuecomment-1753832658, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALYNLNTO6I2GSMLGW2HBO3X6RPZFAVCNFSM6AAAAAA5YJ2EE2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONJTHAZTENRVHA . You are receiving this because you were mentioned.Message ID: @.***>

gregolsky commented 1 year ago

https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-b-series-burstable

This bug renders B1ls unusable since when it goes over 5% for some time it consumes all the CPU credits. These instances needs to stay below 5% for the most time to be usable.

wt., 10 paź 2023, 05:24 użytkownik Grzegorz Lachowski < @.***> napisał:

Azure defines this. It's the smallest instance type in B-series. B1ls

For that instance 2% is a significant bump.

I just correlated the average CPU usage bump time with the graph on azure. It matches on at multiple instances.

pon., 9 paź 2023, 22:52 użytkownik Nageswara Nandigam < @.***> napisał:

@gregolsky https://github.com/gregolsky I don't see anything different agent doing in 2.10.0.3 from the log compared to 2.9.1.1. Btw, how do you compute the CPU usage for the agent, how the above graph produced. Is there any change combing the last computed 2.9.1.1 value to this new value of 2.10.0.3 after update?

do you look at the agent cgroups reported cpu usage. If so, how the output of below command looks like and can also run pstree. I want to check no unknown process in the cgroup

systemd-cgls --unit system.slice --all pstree -p

As far as downgrade, today we don't have an option to downgrade.

after update we see CPU usage increase by 2%, which then causes CPU credits drain on small burstable instances e.g. B1ls where 5% is the baseline

In terms of percentage, it's not significant bump. When you say 5% is baseline, how this value calculated and who defines this?

— Reply to this email directly, view it on GitHub https://github.com/Azure/WALinuxAgent/issues/2940#issuecomment-1753832658, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALYNLNTO6I2GSMLGW2HBO3X6RPZFAVCNFSM6AAAAAA5YJ2EE2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONJTHAZTENRVHA . You are receiving this because you were mentioned.Message ID: @.***>

nagworld9 commented 1 year ago

@gregolsky do you happen to have other distro type and vm size and having increase in CPU usage after update? is it only on these instances?

gregolsky commented 1 year ago

I have other instances (all Ubuntu 22.04 though), but here it was very easy to notice, because it lost all CPU credits and stopped responding in a timely fashion.

We're using latest Ubuntu 22.04 image on Azure.

wt., 10 paź 2023, 19:55 użytkownik Nageswara Nandigam < @.***> napisał:

@gregolsky https://github.com/gregolsky do you happen to have other distro type and vm size and having increase in CPU usage after update? is it only on these instances?

— Reply to this email directly, view it on GitHub https://github.com/Azure/WALinuxAgent/issues/2940#issuecomment-1755951819, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALYNLLMQ2FHELZF5XNSBSDX6WDZRAVCNFSM6AAAAAA5YJ2EE2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONJVHE2TCOBRHE . You are receiving this because you were mentioned.Message ID: @.***>

gregolsky commented 1 year ago

What were the changes between 2.9 and 2.10 that may have had such an impact on CPU usage?

wt., 10 paź 2023, 20:06 użytkownik Grzegorz Lachowski < @.***> napisał:

I have other instances (all Ubuntu 22.04 though), but here it was very easy to notice, because it lost all CPU credits and stopped responding in a timely fashion.

We're using latest Ubuntu 22.04 image on Azure.

wt., 10 paź 2023, 19:55 użytkownik Nageswara Nandigam < @.***> napisał:

@gregolsky https://github.com/gregolsky do you happen to have other distro type and vm size and having increase in CPU usage after update? is it only on these instances?

— Reply to this email directly, view it on GitHub https://github.com/Azure/WALinuxAgent/issues/2940#issuecomment-1755951819, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALYNLLMQ2FHELZF5XNSBSDX6WDZRAVCNFSM6AAAAAA5YJ2EE2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONJVHE2TCOBRHE . You are receiving this because you were mentioned.Message ID: @.***>

nagworld9 commented 1 year ago

we did rewrite around agent update, but that code path is not enabled in 2.10.0.3. The logs indicate that agent did what was expected.

Is the percentage you have shown for CPU usage is based on 1cpu core as 100% right? Agent taking more than 5% out 100% on 1 core cpu?

gregolsky commented 1 year ago

Not sure how much CPU agent's process alone took before, but our whole workload in idle state (all processes on the VM together) were around 3-5% total and it was below the baseline. After walinuxagent update it's above the baseline.

B1ls has only 1 vcpu and once it's out of CPU credits, its initial baseline is all you can use, so it shows 100% usage at all times then. However I'll try to find a VM still having some credits to check how the CPU usage patterns look. Do you have any preference on how I should collect these metrics or perform further analysis?

I'm also open for a screensharing session if that would help.

wt., 10 paź 2023 o 21:06 Nageswara Nandigam @.***> napisał(a):

we did rewrite around agent update, but that code path is not enabled in 2.10.0.3. The logs indicate that agent did what was expected.

Is the percentage you have shown for CPU usage is based on 1cpu core as 100% right? Agent taking more than 5% out 100% on 1 core cpu?

— Reply to this email directly, view it on GitHub https://github.com/Azure/WALinuxAgent/issues/2940#issuecomment-1756067052, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALYNLPQFTCBV2DJYBH3JGDX6WMDTAVCNFSM6AAAAAA5YJ2EE2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONJWGA3DOMBVGI . You are receiving this because you were mentioned.Message ID: @.***>

nagworld9 commented 1 year ago

@gregolsky We have identified the code line which is taking bit of time, that's causing increase in CPU usage. While we work on fix, we removed the version from artifacts repository, so that new vms won't get it. In your case, in order to rollback to 2.9.1.1, Please run the following commands. Add comments in brackets. Let me know if you need help there, I can shadow you.

systemctl stop walinuxagent (stop the agent service)
cd /var/lib/waagent/  (Go to agent pkg folder)
ls  (you will see 2.10.0.3 packages in zip and folder like WALinuxAgent-2.10.0.3 and WALinuxAgent-2.10.0.3.zip)
rm -rf WALinuxAgent-2.10.*  (remove 2.10 pkgs from the vm)
ls  (make sure 2.10. got deleted)
systemctl restart walinuxagent (restart agent, so that it will pick up 2.9.1.1 as latest)
waagent --version (check the version in the output and something like this "Goal state agent: 2.9.1.1")
gregolsky commented 1 year ago

That's great, I'm going to try it out today and get back to you.

czw., 12 paź 2023, 03:51 użytkownik Nageswara Nandigam < @.***> napisał:

@gregolsky https://github.com/gregolsky We have identified the code line which is taking bit of time, that's causing increase in CPU usage. While we work on fix, we removed the version from artifacts repository, so that new vms won't get it. In your case, in order to rollback to 2.9.1.1, Please run the following commands. Add comments in brackets. Let me know if you need help there, I can shadow you.

systemctl stop walinuxagent (stop the agent service) cd /var/lib/waagent/ (Go to agent pkg folder) ls (you will see 2.10.0.3 packages in zip and folder like WALinuxAgent-2.10.0.3 and WALinuxAgent-2.10.0.3.zip) rm -rf WALinuxAgent-2.10.* (remove 2.10 pkgs from the vm) ls (make sure 2.10. got deleted) systemctl restart walinuxagent (restart agent, so that it will pick up 2.9.1.1 as latest) waagent --version (check the version in the output and something like this "Goal state agent: 2.9.1.1")

— Reply to this email directly, view it on GitHub https://github.com/Azure/WALinuxAgent/issues/2940#issuecomment-1758789691, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALYNLO5TSLH2OFFQWYX5TDX65ELFANCNFSM6AAAAAA5YJ2EEY . You are receiving this because you were mentioned.Message ID: @.***>

gregolsky commented 12 months ago

We applied downgrade instructions and they worked. Thank you.

nagworld9 commented 11 months ago

Fixing in https://github.com/Azure/WALinuxAgent/pull/2967