Closed v-eslop closed 4 years ago
The latest version of the agent is 2.2.25, so if you can reproduce with that version, it would be of interest.
On my Ubuntu 16.04 VM, the agent version 2.2.25 is consuming 239 MB VIRT and 55 MB RES for Python 3.5.2. The VM has been up for > 20 days, and has multiple extensions installed.
Thanks for your response Boumenot, I have another openSUSE VM where I installed the github waagent (2.2.21). It has been running almost 8 days and the memory usage is 636MB VIRT and 400MB RES and it is loading the version 2.2.25:
openSUSE2:/home/ # ps -aux | grep python | grep -v grep root 1335 0.0 0.8 94044 15716 ? Ss Mar26 0:00 /usr/bin/python -u /usr/sbin/waagent -daemon root 1477 1.5 20.5 636864 403140 ? Sl Mar26 169:50 python -u bin/WALinuxAgent-2.2.25-py2.7.egg -run-exthandlers
I have another SLES 12sp3 with 2.2.18 and similar config/resources, the memory usage is 100MB VIRT and 22MB RES.
For the record, I ran into the same issue (on Debian) with 2.2.14 (not sure why the command line says 2.2.22):
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 129607 0.9 82.8 1833120 1630728 ? Sl Feb22 634:08 python3 -u bin/WALinuxAgent-2.2.22-py2.7.egg -run-exthandlers
Upgraded the agent to 2.2.18 to see if it is any better.
On Debian Jessie 2.2.14 is the latest available, i.e. no way to upgrade to a higher version.
ii waagent 2.2.14-1~deb9u1~ all Windows Azure Linux Agent
Same issue on Debian 9, the agent eats 600MB after 3 weeks:
root 842 0.9 31.3 818336 616800 ? Sl Apr12 254:39 python3 -u bin/WALinuxAgent-2.2.25-py2.7.egg -run-exthandlers
I read that as ~80 MB resident and ~600 MB of virtual. Do you run any extensions?
It looks like the number in VSIZE is 818336
, which is somewhat closer to 800MB. 616800
is in RSS, assuming this is ps auwx
(or similar), not ps -ef
You're right, I read that wrong. I read 5 digits, but there were 6.
I didn't install any extension, and the leak is exactly the same on 3 VMs I started at the same time. Note that there is this error in syslog, maybe related to the leak, as it disappears after I restart waagent:
waagent[566]: 2018/05/01 08:40:29.423942 ERROR 'pidof dhclient' raised unexpected exception: '[Errno 12] Cannot allocate memory'
I will spin up a Debian 9 VM, and see if I can reproduce. We are adding support for better telemetry around CPU and memory usage, so we can better track these issues. This is expected to ship in the next version. We need to do more work to better track performance and performance regressions.
I have reproduced the issue on Debian 9. I created a branch to help track down the memory leak.
The top offender after a few days is this line.
Top 50 lines
#1: python3.5/socket.py:647: 13609.4 KiB
self._sock = None
It looks like the agent is failing to cleanup a resource. Unfortunately, I failed to add enough of stacktrace to the call to narrow down where the issue is the agent code. I will fix that and update this comment.
Same issue here on latest Debian, using version 2.2.25. Server uptime is 35 days, and the current memory usage of this process is about 1400MB (virt), and 1200MB (res). That is a lot, and now it is causing other processes to run out of memory (VM has 8GB RAM total) Output of ps aux:
root 873 0.7 14.8 1411424 1212056 ? Sl ápr12 403:50 python3 -u bin/WALinuxAgent-2.2.25-py2.7.egg -run-exthandlers
Maybe there is a quick workaround for this? Is this process safe to restart ie. every day from cron?
The work around is to restart the agent. ☹️ It is safe to restart the agent if extensions are not being processed.
I still have not gotten to the bottom of why it is leaking on Debian.
Hi - I have created a memory dump on a VW which is long running - and the agent now uses about 2.6GB memory. FYI I was using https://github.com/rogerhu/gdb-heap just in case you want to reproduce. Snippet of the output (which seems so be repeated a lot):
51935: 0x000055b938d8f470 -> 0x000055b938d8f48f 32 bytes uncategorized::32 bytes |I....... r.8.U..........!.......|
51936: 0x000055b938d8f490 -> 0x000055b938d8f4af 32 bytes uncategorized::32 bytes |...........8.U..sp.com..a.......|
51937: 0x000055b938d8f4b0 -> 0x000055b938d8f50f 96 bytes C:string data:None |http://mscrl.microsoft.com/pki/m|
51938: 0x000055b938d8f510 -> 0x000055b938d8f56f 96 bytes C:string data:None |http://crl.microsoft.com/pki/msc|
51939: 0x000055b938d8f570 -> 0x000055b938d8f59f 48 bytes uncategorized::48 bytes |.e.8.U..........................|
51940: 0x000055b938d8f5a0 -> 0x000055b938d8f5ff 96 bytes C:string data:None |http://crl.microsoft.com/pki/msc|
51941: 0x000055b938d8f600 -> 0x000055b938d8f61f 32 bytes uncategorized::32 bytes |.........O.8.U..........1.......|
51942: 0x000055b938d8f620 -> 0x000055b938d8f64f 48 bytes uncategorized::48 bytes |...8.U..........................|
51943: 0x000055b938d8f650 -> 0x000055b938d8f67f 48 bytes uncategorized::48 bytes |...8.U.....8.U..................|
51944: 0x000055b938d8f680 -> 0x000055b938d8f69f 32 bytes uncategorized::32 bytes |K..........8.U..........!.......|
51945: 0x000055b938d8f6a0 -> 0x000055b938d8f6bf 32 bytes uncategorized::32 bytes |........Pj.8.U..........!.......|
51946: 0x000055b938d8f6c0 -> 0x000055b938d8f6df 32 bytes uncategorized::32 bytes |K..........8.U..........1.......|
51947: 0x000055b938d8f6e0 -> 0x000055b938d8f70f 48 bytes uncategorized::48 bytes |...........8.U..................|
51948: 0x000055b938d8f710 -> 0x000055b938d8f73f 48 bytes uncategorized::48 bytes |@\.8.U..........................|
51949: 0x000055b938d8f740 -> 0x000055b938d8f76f 48 bytes uncategorized::48 bytes |.3.8.U..........................|
51950: 0x000055b938d8f770 -> 0x000055b938d8f79f 48 bytes uncategorized::48 bytes |...........8.U..................|
51951: 0x000055b938d8f7a0 -> 0x000055b938d8f7df 64 bytes uncategorized::64 bytes |P..8.U.....8.U..................|
What that suggests to me based on the URLs of the GDB output, that the leak may be related to CRL and therefore TLS/HTTPS handshake. If this is debian specific, this may be somehow related to openssl, or the openssl version bundled with python3 on debian.. So that is where I would start investigating the issue.
Thanks very much for this information, @martonw .
@boumenot has been looking at the differences between Ubuntu (no leak) and Debian (leak). He's determined that the default python versions of the two distros differ; in looking into that difference, he noted that different versions of OpenSSL were used. As a result, we've evolved a working hypothesis of a bug in the specific OpenSSL version used by the precise python version Debian puts in its distro and repo, and we're looking more deeply down that path.
Adding your data to our analysis to this point is very indicative, but still not definitive; it sure does point in a direction for deeper investigation. Again, thanks for the effort you've put in here.
tl;dr - consider fixing #1169 for this release by making Host Plugin the default thereby avoid the HTTPS in Python code. The Host Plugin will forward the request using HTTPS, and does not leak unlike in Python.
I used tracemalloc to figure out the issue was somewhere in code that was using socket. The most likely offender was our restutil library. I wrapped all HTTP usage in a context to ensure the connection and response where cleaned up, but that did not address the issue. After this fix I no longer observed an issue with tracemalloc, but the agent process continued to leak.
I isolated the individual HTTP calls for fetching goal state and PUT'ing VM status. The HTTP
calls for fetching goal state did not leak. The HTTPS
calls for PUT'ing VM status did leak. Status is uploaded via HTTPS, and goal state is fetched over HTTP.
I tried to isolate the verison of Python that did not leak (Ubuntu 16.04 3.5.2) vs. the version that did leak (Debian 9 3.5.3). I was going to create a virtual environment for 3.5.2 on Debian 9, but I was unable to build this version of Python on Debian using the packages available in the Debian 9 repo. Python 3.5.2 supports OpenSSL 1.0, and support for OpenSSL 1.1 was added in Python 3.5.3. I believe this is at least one reason why Debian 9 ships Python 3.5.3 (or perhaps vice-versa). (The Python SSL module is linked against 1.1.)
I abandonded trying to get 3.5.2 running on Debian 9 because it would require too much customization to be successful for others to use. Instead, I switched the agent to the Host Plugin. Host Plugin acts as a proxy for requests, and forwards them to the outside for whitelisted links such as status blob and extension zip files. Requests are sent to Host Plugin over HTTP and then forwarded as HTTPS as appropriate. This appears to have resolved the memory leak, but I am continuing to let the change bake. Without this change the leak would monotonically increase and could be readily observed. I am not seeing that any more, but I would like more time to verify the fix.
/cc @waldiTM - are you aware of any issue or memory leaks with regards to Python 3.5.3 and OpenSSL 1.1? Have you seen any bug reports?
Thanks for the quick iteration on this!
One idea I thought of is reimporting the httplib python module from time to time - that could theoretically allow the runtime to free up the memory. Something like that:
import importlib
...
importlib.reload(httplib)
I am not sure though that this would fix the issue.. but if it works, it could be a relatively small code change in the agent.
On Thu, May 24, 2018 at 03:04:38PM -0700, Christopher Boumenot wrote:
/cc @waldiTM - are you aware of any issue or memory leaks with regards to Python 3.5.3 and OpenSSL 1.1? Have you seen any bug reports?
No, I have not. My test machine running on 2.2.18 does not exhibit this behaviour.
root 720 0.0 0.3 67460 23684 ? Ss Mai03 0:00 /usr/bin/python3 /usr/sbin/waagent -daemon
root 726 0.2 0.5 236924 41404 ? Sl Mai03 92:48 python3 -u /usr/sbin/waagent -run-exthandlers
But I see there is a 2.2.26 release already, so let's break it.
-- Extreme feminine beauty is always disturbing. -- Spock, "The Cloud Minders", stardate 5818.4
No, I have not. My test machine running on 2.2.18 does not exhibit this behaviour.
That's a very good point. Let me check that out too.
I have a server exhibiting this problem:
uname: Linux redacted 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1 (2018-04-29) x86_64 GNU/Linux waagent: WALinuxAgent-2.2.25 Python: 3.5.3 OpenSSL: 1.1.0f 25 May 2017
AFAIK I don't have any extensions, but how do I check?
ps -aux | grep python | grep -v grep
shows:
root 9059 0.0 0.5 64648 23352 ? Ss 18:10 0:00 /usr/bin/python3 /usr/sbin/waagent -daemon root 9075 2.0 0.6 226464 26204 ? Sl 18:10 0:36 python3 -u bin/WALinuxAgent-2.2.25-py2.7.egg -run-exthandlers
(plus fail2ban also running)
A variety of openssl-based errors are showing in syslog with being unable to allocate memory. Restarting the waagent service resolved the issue (well, for the moment, I'll see if it returns).
All default repository versions though apt-get
.
A bit more investigation shows the waagent.log with:
2018/05/20 03:33:57.988867 INFO Event: name=WALinuxAgent, op=ProcessGoalState, message=Incarnation 17, duration=6894 2018/05/20 12:01:55.677852 ERROR 'iptables --version' raised unexpected exception: '[Errno 12] Cannot allocate memory' 2018/05/20 12:01:55.678507 WARNING Unable to determine version of iptables 2018/05/20 12:01:55.678687 INFO Unable to establish firewall -- no further attempts will be made: Unable to determine version of iptables 2018/05/20 12:01:55.678893 ERROR Event: name=WALinuxAgent, op=Firewall, message=, duration=0 2018/05/20 12:01:55.680230 ERROR 'pidof dhclient' raised unexpected exception: '[Errno 12] Cannot allocate memory'
Everything fine til 12:01, then the errors start, the last pidof dhclient
error repeats hundreds of times until we get this section on repeat:
2018/05/21 03:52:25.352479 ERROR 'pidof dhclient' raised unexpected exception: '[Errno 12] Cannot allocate memory' 2018/05/21 03:52:30.371337 ERROR 'pidof dhclient' raised unexpected exception: '[Errno 12] Cannot allocate memory' 2018/05/21 03:52:30.536591 ERROR '/usr/bin/openssl cms -decrypt -in /var/lib/waagent/Certificates.p7m -inkey /var/lib/waagent/TransportPrivate.pem -recip /var/lib/waagent/TransportCert.pem | /usr/bin/openssl pkcs12 -nodes redacted -out /var/lib/waagent/Certificates.pem' raise d unexpected exception: '[Errno 12] Cannot allocate memory' 2018/05/21 03:52:30.542365 ERROR '/usr/bin/openssl cms -decrypt -in /var/lib/waagent/Certificates.p7m -inkey /var/lib/waagent/TransportPrivate.pem -recip /var/lib/waagent/TransportCert.pem | /usr/bin/openssl pkcs12 -nodes redacted -out /var/lib/waagent/Certificates.pem' raise d unexpected exception: '[Errno 12] Cannot allocate memory' 2018/05/21 03:52:30.556853 ERROR Failed to decrypt /var/lib/waagent/Certificates.p7m 2018/05/21 03:52:30.568568 ERROR '/usr/bin/openssl rsa -in /var/lib/waagent/0.prv -pubout 2>/dev/null' raised unexpected exception: '[Errno 12] Cannot allocate memory' 2018/05/21 03:52:30.580690 ERROR '/usr/bin/openssl x509 -in /var/lib/waagent/1.crt -pubkey -noout' raised unexpected exception: '[Errno 12] Cannot allocate memory' 2018/05/21 03:52:30.589216 ERROR '/usr/bin/openssl x509 -in /var/lib/waagent/1.crt redacted -noout' raised unexpected exception: '[Errno 12] Cannot allocate memory' 2018/05/21 03:52:30.609054 WARNING Exception processing GoalState-related files: list index out of range
I'm presuming the openssl logging stuff isn't sensitive, I don't think the redacted bits are either; let me know if it is! I'm not sure whether this is of any use, but let me know of any further steps I might take to help.
I compared the changes from 2.2.18 and 2.2.25, and the changes are minimal. Any changes that do exist do not appear to be relevant to this issue.
I downgraded to 2.2.18, and let it run from 25-May-2018. This version also leaks.
$ ps auwx | grep ext
root 18950 12.9 14.7 330696 137720 ? Rl May25 714:57 python3 -u /usr/sbin/waagent -run-exthandlers
I'll continue with the fix I proposed, and address the feedback.
@Worthaboutapig - thanks for helping to debug this. The agent has a leak on Debian 9. We also have reports of a leak on OpenSuSE. We suspect the leak is coming from either Python 3.5.3 or OpenSSL 1.1 or an interaction between the agent and the two, but that has not been root caused. At this point, we have a work around to the issue that appears to fix the leak, so we are going with that fix.
The way you determine if you have extension installed is by looking at your VM in the portal. Locate your VM, and click on the Extensions tab. This tab will list all extensions installed.
@boumenot Thanks, is this the HostPlugin fix? Will take some time to filter through to Debian stable, I suppose... Thanks for the extension info.
@Worthaboutapig - yes.
This was fixed in #1194.
The agent will be moving to HTTPS even when communicating with WireServer, so the current mitigation will not be sufficient.
Experiencing similar issues on Ubuntu walinuxagent 2.2.32-0ubuntu1~16.04.2
it's a production server and has dropped out three times now.
Same pattern every time- CPU usage spikes up, disk usage goes bananas, then it dies- network traffic flatlines.
Will open ticket with Azure support also.
We've had persistently high memory usage on our production Debian server
Consistently has been using around 17% of memory for at least 450 days.
@isdampe Could you please attach /var/log/waagent.log
? And also the container id of your VM? (You can fetch the containerId from the /var/lib/waagent/GoalState.X.xml
)
/var/log/waagent.log:
2019/12/01 07:47:47.229479 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/01 07:48:24.037338 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/01 19:48:24.553589 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/01 19:49:04.639795 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/01 22:02:15.674369 INFO ExtHandler Found private key matching thumbprint A13502F490819D3CBE7D08522E5F4280F0321F32
2019/12/01 22:02:15.775522 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] Target handler state: enabled
2019/12/01 22:02:15.786202 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] [Enable] current handler state is: enabled
2019/12/01 22:02:15.793904 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] Update settings file: 0.settings
2019/12/01 22:02:15.803671 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] Enable extension [bin/run-command-shim enable]
2019/12/01 22:02:16.084282 INFO ExtHandler Added PID 27619 to cgroup /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/01 22:02:16.089735 INFO ExtHandler Added PID 27619 to cgroup /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/01 22:02:16.144754 INFO ExtHandler Started tracking new cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/01 22:02:16.148634 INFO ExtHandler Started tracking new cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/01 22:02:18.225641 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] Target handler state: enabled
2019/12/01 22:02:18.287245 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] [Enable] current handler state is: enabled
2019/12/01 22:02:18.290806 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] Update settings file: 471.settings
2019/12/01 22:02:18.295608 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] Enable extension [main/handle.sh enable]
2019/12/01 22:02:18.342824 INFO ExtHandler Added PID 27646 to cgroup /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/01 22:02:18.346526 INFO ExtHandler Added PID 27646 to cgroup /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/01 22:02:18.380967 INFO ExtHandler Started tracking new cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/01 22:02:18.382118 INFO ExtHandler Started tracking new cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/01 22:02:23.435564 INFO ExtHandler ProcessGoalState completed [incarnation 477; 8096 ms]
2019/12/01 22:06:31.663779 INFO ExtHandler Stopped tracking cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/01 22:06:31.793282 INFO ExtHandler Stopped tracking cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/01 22:06:31.926996 INFO ExtHandler Stopped tracking cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/01 22:06:32.001535 INFO ExtHandler Stopped tracking cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/02 07:49:04.819816 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/02 07:49:41.707715 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/02 19:49:43.508952 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/02 19:50:38.635802 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/02 22:12:23.419070 INFO ExtHandler Found private key matching thumbprint A13502F490819D3CBE7D08522E5F4280F0321F32
2019/12/02 22:12:23.521458 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] Target handler state: enabled
2019/12/02 22:12:23.603455 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] [Enable] current handler state is: enabled
2019/12/02 22:12:23.687103 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] Update settings file: 0.settings
2019/12/02 22:12:23.769522 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] Enable extension [bin/run-command-shim enable]
2019/12/02 22:12:23.887087 INFO ExtHandler Added PID 119495 to cgroup /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/02 22:12:23.944433 INFO ExtHandler Added PID 119495 to cgroup /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/02 22:12:24.024840 INFO ExtHandler Started tracking new cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/02 22:12:24.116637 INFO ExtHandler Started tracking new cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/02 22:12:26.300241 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] Target handler state: enabled
2019/12/02 22:12:26.367050 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] [Enable] current handler state is: enabled
2019/12/02 22:12:26.371050 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] Update settings file: 472.settings
2019/12/02 22:12:26.375904 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] Enable extension [main/handle.sh enable]
2019/12/02 22:12:26.426064 INFO ExtHandler Added PID 119526 to cgroup /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/02 22:12:26.427094 INFO ExtHandler Added PID 119526 to cgroup /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/02 22:12:26.476019 INFO ExtHandler Started tracking new cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/02 22:12:26.478458 INFO ExtHandler Started tracking new cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/02 22:12:30.681592 INFO ExtHandler ProcessGoalState completed [incarnation 478; 7539 ms]
2019/12/02 22:17:11.911486 INFO ExtHandler Stopped tracking cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/02 22:17:12.048477 INFO ExtHandler Stopped tracking cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/02 22:17:12.181547 INFO ExtHandler Stopped tracking cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/02 22:17:12.333576 INFO ExtHandler Stopped tracking cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/03 07:50:39.098089 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/03 07:51:28.152739 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/03 18:11:51.402049 INFO ExtHandler Stopped tracking cgroup: walinuxagent.service, path: /sys/fs/cgroup/cpu/walinuxagent.service
2019/12/03 18:11:51.416574 INFO ExtHandler Stopped tracking cgroup: walinuxagent.service, path: /sys/fs/cgroup/memory/walinuxagent.service
2019/12/03 19:51:30.492503 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/03 19:52:16.423152 INFO ExtHandler [PERIODIC] Using host plugin as default channel.
2019/12/03 22:07:37.282047 INFO ExtHandler Found private key matching thumbprint A13502F490819D3CBE7D08522E5F4280F0321F32
2019/12/03 22:07:37.386913 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] Target handler state: enabled
2019/12/03 22:07:37.465239 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] [Enable] current handler state is: enabled
2019/12/03 22:07:37.501136 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] Update settings file: 0.settings
2019/12/03 22:07:37.501713 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.1] Enable extension [bin/run-command-shim enable]
2019/12/03 22:07:37.742872 INFO ExtHandler Added PID 96608 to cgroup /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/03 22:07:37.792522 INFO ExtHandler Added PID 96608 to cgroup /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/03 22:07:37.880080 INFO ExtHandler Started tracking new cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/03 22:07:37.885266 INFO ExtHandler Started tracking new cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/03 22:07:39.909207 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] Target handler state: enabled
2019/12/03 22:07:39.917685 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] [Enable] current handler state is: enabled
2019/12/03 22:07:39.921576 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] Update settings file: 473.settings
2019/12/03 22:07:39.926297 INFO [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0] Enable extension [main/handle.sh enable]
2019/12/03 22:07:39.983903 INFO ExtHandler Added PID 96639 to cgroup /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/03 22:07:39.991565 INFO ExtHandler Added PID 96639 to cgroup /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/03 22:07:40.406113 INFO ExtHandler Started tracking new cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/03 22:07:40.410532 INFO ExtHandler Started tracking new cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/03 22:07:45.042060 INFO ExtHandler Stopped tracking cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/03 22:07:45.059376 INFO ExtHandler Stopped tracking cgroup: Microsoft.CPlat.Core.RunCommandLinux-1.0.1, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.CPlat.Core.RunCommandLinux_1.0.1
2019/12/03 22:07:45.087597 INFO ExtHandler ProcessGoalState completed [incarnation 479; 8070 ms]
2019/12/03 22:12:46.387379 INFO ExtHandler Stopped tracking cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/cpu/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
2019/12/03 22:12:46.405444 INFO ExtHandler Stopped tracking cgroup: Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9156.0, path: /sys/fs/cgroup/memory/walinuxagent.extensions/Microsoft.Azure.RecoveryServices.VMSnapshotLinux_1.0.9156.0
There's no containerId key in GoalState.X.xml
, is InstanceId
the same thing?
You should see something like
<ContainerId>8895da0d-xxxx-xxxx-xxxx-4e001d82c638</ContainerId>
That would help.
Re-open if still active.
I have a new openSUSE 42.3 VM in Azure with 2GB RAM, we noticed that the memory usage has been increasing with the days, after 9 days it is consuming 21.94% (400M).
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 16296 root 20 0 658972 429484 7508 S 0.667 21.94 224:32.85 python
openSUSE1:/home/ # ps -aux | grep python root 15392 0.0 0.8 99536 17052 ? Ss Mar23 0:00 /usr/bin/python -u /usr/sbin/waagent -daemon root 16296 1.4 21.9 658972 429484 ? Sl Mar23 224:35 python -u /usr/sbin/waagent -run-exthandlers
openSUSE1:/home/ # waagent --version WALinuxAgent-2.2.18 running on suse 42.3 Python: 2.7.13 Goal state agent: 2.2.18
It is running with python 2.7, the waagent.log* only show HeartBeats and the machine does not have any extension installed. Any help on this will be appreciate.