bus1 / dbus-broker

Linux D-Bus Message Broker
https://github.com/bus1/dbus-broker/wiki
Apache License 2.0
682 stars 81 forks source link

dbus-broker leaks memory over time #201

Closed Valmar33 closed 4 years ago

Valmar33 commented 5 years ago

Distro is Arch Linux.

dbus-broker version is 20, and is from the offical repo.

Usually, it's very slow and steady.

A couple of instances, I've left my machine overnight, come back, and I've only noticed when I try starting Steam, with it crashing due to a DBus error I can't recall, and then seeing that dbus-broker was consuming many GiB of RAM.

I'll try and grab screenshots next time, if necessary.

I did grab the log from journalctl, however:

dbus-broker.tar.gz

Halfway, the log suffers ridiculous spamming which is probably what resulted in the event of dbus-broker consuming GiBs of RAM.

Current dbus-broker RAM usage is 43KiB. Was certainly a lot smaller earlier today.

Screenshot_20190421_181339

dvdhrm commented 5 years ago

Thanks a lot for the report! I have some more questions, though:

Distro is Arch Linux.

dbus-broker version is 20, and is from the offical repo.

Usually, it's very slow and steady.

What do you mean with it is very slow and steady? I am not sure what to make of this. Can you elaborate?

A couple of instances, I've left my machine overnight, come back, and I've only noticed when I try starting Steam, with it crashing due to a DBus error I can't recall, and then seeing that dbus-broker was consuming many GiB of RAM.

I'll try and grab screenshots next time, if necessary.

I did grab the log from journalctl, however:

dbus-broker.tar.gz

Halfway, the log suffers ridiculous spamming which is probably what resulted in the event of dbus-broker consuming GiBs of RAM.

Current dbus-broker RAM usage is 43KiB. Was certainly a lot smaller earlier today.

Yeah, the logs are quite expressive. First of all, since dbus-broker-20, the log-spam with the following message is now suppressed:

`dbus-broker-launch[590]: Activation request for 'org.freedesktop.resolve1' failed: Unit dbus-org.freedesktop.resolve1.service not found.`

So the first 13k lines of that log can be ignored.

Starting at line 13165 you have a reboot where the excessive memory consumption happens. The relevant log-messages are:

dbus-broker[573]: UID 1000 exceeded its 'bytes' quota on UID 1000.
dbus-broker[573]: Peer :1.95 is being disconnected as it does not have the resources to receive a signal it subscribed to.
dbus-broker[573]: Peer :1.166367 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
dbus-broker[573]: Peer :1.43 is being disconnected as it does not have the resources to receive a signal it subscribed to.
dbus-broker[573]: Peer :1.166391 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
dbus-broker[573]: Peer :1.166392 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.

...and this then continues with many more messages just like the last one.

What is going on here is that your UID (which I assume is 1000) exceeded its memory quota on the system bus. The D-Bus message bus allows all users on your system to send messages, and as long as a message was not yet received/dequeued by its receiver, it must be buffered somewhere. In the case of D-Bus the messages must be buffered in the message bus instance. To make sure a single UID on the system cannot exploit the entire system message bus, the allowed number of messages (more precisely: the number of bytes) is limited per user. Furthermore, there are quotas applied if a message is sent from one UID to a different UID, to protect the UIDs from each other.

The first log-message in that log-excerpt tells you that the user with UID 1000 has so many pending messages (or other D-Bus objects) that it exceeded its own allowance. Hence, further attempts by that user to send more messages are rejected.

Of course, I cannot tell for sure, but my assumption based on these log-messages is that dbus-broker does exactly what it is told: It tries to transmit the messages it is given by the clients. The problem is, there seems to be some client on your system that runs under your UID which either does not dequeue its messages, or exhibits some other buggy client-behavior.

In my opinion, there is little dbus-broker can do here. The memory-limits dbus-broker has for clients is configurable, so you can reduce it if you want dbus-broker to reduce the memory consumption (Note that these limits are defined by the dbus-daemon project, and dbus-broker has a compatibility layer to parse them). However, this will not fix your bug, since there will still be a buggy client application that consumes all resources of your UID. What dbus-broker definitely should do, though, is to print more elaborate diagnostics and some heuristics to try to find the buggy client.

Anyway, maybe you have a suspicion about which application triggers this behavior? In this case it should be a lot easier to track down.

Thanks David

Valmar33 commented 5 years ago

What do you mean with it is very slow and steady? I am not sure what to make of this. Can you elaborate?

I just notice it has increased by small amounts whenever I occasionally check ksysguard. I don't know what's causing the leak, though.

In my opinion, there is little dbus-broker can do here. The memory-limits dbus-broker has for clients is configurable, so you can reduce it if you want dbus-broker to reduce the memory consumption (Note that these limits are defined by the dbus-daemon project, and dbus-broker has a compatibility layer to parse them). However, this will not fix your bug, since there will still be a buggy client application that consumes all resources of your UID. What dbus-broker definitely should do, though, is to print more elaborate diagnostics and some heuristics to try to find the buggy client.

This would be super helpful for tracking down whatever application is going nuts.

Anyway, maybe you have a suspicion about which application triggers this behavior? In this case it should be a lot easier to track down.

No, I don't, sorry. I'm not even sure where to start. :/

hit0ri commented 5 years ago

I think I'm having the same issue.
I left my laptop turned on for the weekend and this morning my system was totally borked.
What can I do or what information can I provide to help resolving this issue?

...
dbus-broker[1138]: UID 0 exceeded its 'bytes' quota on UID 0.
abrt-dbus[7902]: The name 'org.freedesktop.problems' has been lost, please check if other service owning the name is not running.
dbus-broker[1138]: Peer :1.562 is being disconnected as it does not have the resources to receive a signal it subscribed to.
systemd[1]: dbus-:1.7-org.freedesktop.problems@0.service: Succeeded.
audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dbus-:1.7-org.freedesktop.problems@0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
dbus-broker[1138]: Peer :1.53 is being disconnected as it does not have the resources to receive a signal it subscribed to.
dbus-broker[1138]: Peer :1.166464 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
cupsd[1262]: Unable to communicate with avahi-daemon: Daemon connection failed
dbus-broker[1138]: Peer :1.51 is being disconnected as it does not have the resources to receive a signal it subscribed to.
dbus-broker[1138]: Peer :1.30 is being disconnected as it does not have the resources to receive a reply it requested.
dbus-broker[1138]: Peer :1.26 is being disconnected as it does not have the resources to receive a signal it subscribed to.
dbus-broker[1138]: Peer :1.6 is being disconnected as it does not have the resources to perform an operation.
dbus-broker[1138]: Peer :1.166559 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
systemd[1]: Failed to get initial list of names: Connection reset by peer
/usr/libexec/gdm-x-session[7390]: (EE)
/usr/libexec/gdm-x-session[7390]: Fatal server error:
/usr/libexec/gdm-x-session[7390]: (EE) systemd-logind disappeared (stopped/restarted?)
...

dbus-broker memory

Fedora 30
dbus-broker-21-3.fc30.x86_64

teg commented 5 years ago

What is hitori? Am I reading correctly, that you have a process (hitori) that is consuming huge amounts of memory, and while it is dbus-broker is steadily consuming more. Then hitori stops consuming memory and dbus stops increasing its usage (but does not go down either)? What UID is the other process running under?

teg commented 5 years ago

I notice that in both the reports the number of clients exceed 166k (and the error seems to trigger with about the same peer id). That seems like something worth investigating. Could either of you do a busctl call on a misbehaving system to ascertain whether or not you have that many active peers, or if they have mostly connected and disconnected again?

hit0ri commented 5 years ago

users.mean_mem.* displays the amount of memory consumed by a user. So you can see that while my user session (users.mean_mem.hitori) is running - dbus-broker (users.mean_mem.dbus) memory usage increases up untill it reaches 1.7GB.

Unfortunately I can't provide you a busctl output since I performed a clean reinstall of Fedora 30 and no longer see such behavior.

hit0ri commented 4 years ago

It's back after Fedora 31 upgrade.

busctl --no-pager --verbose

``` NAME PID PROCESS USER CONNECTION UNIT SESSION DESCRIPTION :1.0 950 bluetoothd root :1.0 bluetooth.service - - :1.1 958 rtkit-daemon root :1.1 rtkit-daemon.service - - :1.100557 213236 busctl hitori :1.100557 user@1400000098.service - - :1.112 1205 pulseaudio hitori :1.112 user@1400000098.service - - :1.12 1075 accounts-daemon root :1.12 accounts-daemon.service - - :1.14 1065 polkitd polkitd :1.14 polkit.service - - :1.145 1869 upowerd root :1.145 upower.service - - :1.15 1095 systemd-logind root :1.15 systemd-logind.service - - :1.15844 38922 gdm-session-wor root :1.15844 session-3.scope 3 - :1.16 952 firewalld root :1.16 firewalld.service - - :1.17 1111 NetworkManager root :1.17 NetworkManager.service - - :1.2 947 ModemManager root :1.2 ModemManager.service - - :1.22 1122 cupsd root :1.22 cups.service - - :1.232 2129 dockerd root :1.232 docker.service - - :1.243 2508 gdm-session-wor root :1.243 session-3.scope 3 - :1.25 1127 oddjobd root :1.25 oddjobd.service - - :1.268 2539 gnome-keyring-d hitori :1.268 session-3.scope 3 - :1.27 1111 NetworkManager root :1.27 NetworkManager.service - - :1.28 1127 oddjobd root :1.28 oddjobd.service - - :1.280 2547 Xorg hitori :1.280 session-3.scope 3 - :1.2880 10154 gvfsd-network hitori :1.2880 user@1400000098.service - - :1.2887 10173 gvfsd-dnssd hitori :1.2887 user@1400000098.service - - :1.29 1141 gdm root :1.29 gdm.service - - :1.341 2545 gdm-x-session hitori :1.341 session-3.scope 3 - :1.383 2674 at-spi-bus-laun hitori :1.383 user@1400000098.service - - :1.39 1139 colord colord :1.39 colord.service - - :1.391 2587 gnome-session-b hitori :1.391 session-3.scope 3 - :1.399 2700 gnome-session-b hitori :1.399 user@1400000098.service - - :1.4 964 udisksd root :1.4 udisks2.service - - :1.40 1053 sssd_be root :1.40 sssd.service - - :1.417 2727 gnome-shell hitori :1.417 user@1400000098.service - - :1.427 2794 evolution-sourc hitori :1.427 user@1400000098.service - - :1.428 2790 gnome-shell-cal hitori :1.428 user@1400000098.service - - :1.439 2800 goa-daemon hitori :1.439 user@1400000098.service - - :1.440 2827 mission-control hitori :1.440 user@1400000098.service - - :1.442 2831 gvfs-udisks2-vo hitori :1.442 user@1400000098.service - - :1.477 2893 ibus-dconf hitori :1.477 user@1400000098.service - - :1.478 2894 ibus-extension- hitori :1.478 user@1400000098.service - - :1.483 2914 evolution-calen hitori :1.483 user@1400000098.service - - :1.484 2928 evolution-addre hitori :1.484 user@1400000098.service - - :1.489 2945 gsd-a11y-settin hitori :1.489 user@1400000098.service - - :1.491 2948 gsd-housekeepin hitori :1.491 user@1400000098.service - - :1.499 2949 gsd-keyboard hitori :1.499 user@1400000098.service - - :1.5 963 systemd-machine root :1.5 systemd-machined.service - - :1.502 2957 gsd-sharing hitori :1.502 user@1400000098.service - - :1.503 2995 gsd-disk-utilit hitori :1.503 user@1400000098.service - - :1.506 2946 gsd-color hitori :1.506 user@1400000098.service - - :1.509 2950 gsd-media-keys hitori :1.509 user@1400000098.service - - :1.514 2951 gsd-power hitori :1.514 user@1400000098.service - - :1.516 2953 gsd-print-notif hitori :1.516 user@1400000098.service - - :1.517 2961 gsd-wwan hitori :1.517 user@1400000098.service - - :1.519 2954 gsd-rfkill hitori :1.519 user@1400000098.service - - :1.520 2962 gsd-xsettings hitori :1.520 user@1400000098.service - - :1.521 2959 gsd-sound hitori :1.521 user@1400000098.service - - :1.523 2947 gsd-datetime hitori :1.523 user@1400000098.service - - :1.527 2958 gsd-smartcard hitori :1.527 user@1400000098.service - - :1.530 3018 tracker-miner-f hitori :1.530 user@1400000098.service - - :1.541 3083 gsd-printer hitori :1.541 user@1400000098.service - - :1.548 3032 abrt-applet hitori :1.548 user@1400000098.service - - :1.55 1144 systemd hitori :1.55 user@1400000098.service - - :1.565 3035 gnome-software hitori :1.565 user@1400000098.service - - :1.571 3011 evolution-alarm hitori :1.571 user@1400000098.service - - :1.594 3194 abrt-dbus root :1.594 dbus-:1.8-org.freedesktop.problems@0.service - - :1.6 1 systemd root :1.6 init.scope - - :1.631 3284 boltd root :1.631 bolt.service - - :1.636 3298 slack hitori :1.636 user@1400000098.service - - :1.637 3298 slack hitori :1.637 user@1400000098.service - - :1.642 3426 chrome hitori :1.642 user@1400000098.service - - :1.643 3426 chrome hitori :1.643 user@1400000098.service - - :1.644 3482 gnome-terminal- hitori :1.644 user@1400000098.service - - :1.648 3426 chrome hitori :1.648 user@1400000098.service - - :1.69 1246 wpa_supplicant root :1.69 wpa_supplicant.service - - :1.7 998 abrtd root :1.7 abrtd.service - - :1.8 985 dbus-broker-lau root :1.8 dbus-broker.service - - :1.97496 205190 gdm-session-wor root :1.97496 session-3.scope 3 - com.intel.tss2.Tabrmd - - - (activatable) - - - com.redhat.NewPrinterNotification 3083 gsd-printer hitori :1.541 user@1400000098.service - - com.redhat.PrinterDriversInstaller 3083 gsd-printer hitori :1.541 user@1400000098.service - - com.redhat.ifcfgrh1 1111 NetworkManager root :1.27 NetworkManager.service - - com.redhat.oddjob 1127 oddjobd root :1.25 oddjobd.service - - com.redhat.oddjob_mkhomedir 1127 oddjobd root :1.28 oddjobd.service - - fi.w1.wpa_supplicant1 1246 wpa_supplicant root :1.69 wpa_supplicant.service - - net.reactivated.Fprint - - - (activatable) - - - org.bluez 950 bluetoothd root :1.0 bluetooth.service - - org.fedorahosted.certmonger - - - (activatable) - - - org.fedoraproject.FirewallD1 952 firewalld root :1.16 firewalld.service - - org.freedesktop.Accounts 1075 accounts-daemon root :1.12 accounts-daemon.service - - org.freedesktop.ColorManager 1139 colord colord :1.39 colord.service - - org.freedesktop.DBus 1 systemd root - init.scope - - org.freedesktop.Flatpak.SystemHelper - - - (activatable) - - - org.freedesktop.GeoClue2 - - - (activatable) - - - org.freedesktop.ModemManager1 947 ModemManager root :1.2 ModemManager.service - - org.freedesktop.NetworkManager 1111 NetworkManager root :1.17 NetworkManager.service - - org.freedesktop.PackageKit - - - (activatable) - - - org.freedesktop.PolicyKit1 1065 polkitd polkitd :1.14 polkit.service - - org.freedesktop.RealtimeKit1 958 rtkit-daemon root :1.1 rtkit-daemon.service - - org.freedesktop.UDisks2 964 udisksd root :1.4 udisks2.service - - org.freedesktop.UPower 1869 upowerd root :1.145 upower.service - - org.freedesktop.bolt 3284 boltd root :1.631 bolt.service - - org.freedesktop.fwupd - - - (activatable) - - - org.freedesktop.hostname1 - - - (activatable) - - - org.freedesktop.import1 - - - (activatable) - - - org.freedesktop.locale1 - - - (activatable) - - - org.freedesktop.login1 1095 systemd-logind root :1.15 systemd-logind.service - - org.freedesktop.machine1 963 systemd-machine root :1.5 systemd-machined.service - - org.freedesktop.network1 - - - (activatable) - - - org.freedesktop.nm_dispatcher - - - (activatable) - - - org.freedesktop.portable1 - - - (activatable) - - - org.freedesktop.problems 3194 abrt-dbus root :1.594 dbus-:1.8-org.freedesktop.problems@0.service - - org.freedesktop.problems.daemon 998 abrtd root :1.7 abrtd.service - - org.freedesktop.ratbag1 - - - (activatable) - - - org.freedesktop.resolve1 - - - (activatable) - - - org.freedesktop.systemd1 1 systemd root :1.6 init.scope - - org.freedesktop.timedate1 - - - (activatable) - - - org.freedesktop.timesync1 - - - (activatable) - - - org.gnome.DisplayManager 1141 gdm root :1.29 gdm.service - - org.gnome.GConf.Defaults - - - (activatable) - - - org.opensuse.CupsPkHelper.Mechanism - - - (activatable) - - - ```

brianjmurrell commented 4 years ago

My dbus-broker is currently at 1.3GiB/530.2MiB Virtual/Resident memory. That seems like a lot of memory for a broker. Is that expected?

brianjmurrell commented 4 years ago

My dbus-broker has now grown to 2.0/1.7GB Virtual/Resident memory and is the second biggest consumer of RAM on my system only just a hair smaller than evolution!

Gobbling up almost 2GB of RAM seems pretty hoggish and unreasonable for a broker.

Is anything being done about this?

As an aside, can I restart it on a running system, with desktop users logged in and it will regain it's state or will restarting it wreak havoc?

hit0ri commented 4 years ago

I've upgraded another laptop to Fedora 31 (from Fedora 29) and it has the same issue.

dvdhrm commented 4 years ago

The new dbus-broker-v22 release now contains a debugging mode for this. In case you see this issue on a machine with v22 or newer, please run this:

sudo dbus-send --system --dest=org.freedesktop.DBus --type=method_call --print-reply /org/freedesktop/DBus org.freedesktop.DBus.Debug.Stats.GetStats

This dumps the entire accounting-information of dbus-broker. This will allow us to debug the situation. Don't forget to dump the process-list as well:

ps auxf
dvdhrm commented 4 years ago

I am closing this, as there seems to be no progress on this. dbus-broker-22 provides debugging interfaces for this. If anyone can reproduce this, please see my previous comment for details on extracting the required information.

braveheuel commented 4 years ago

What would be the way to interprete the org.freedesktop.DBus.Debug.Stats.GetStats output in conjunctions with the process table? I could not find any information on that (or I didn't search correctly).

dvdhrm commented 4 years ago

What would be the way to interprete the org.freedesktop.DBus.Debug.Stats.GetStats output in conjunctions with the process table? I could not find any information on that (or I didn't search correctly).

The GetStats() function returns internal state of dbus-broker. There is no intention to use this data for anything but debugging. Hence, there is no documentation other than the code exposing it.

In other words: it is a much lighter version of a core-dump, only containing the data that we need. It should be treated with the same confidentiality as a core-dump.

vosian commented 3 years ago

I'm on version v24 and seeing the same issue, dbus-broker is currently consuming a bit over 270mb, I'm annexing the output of sudo dbus-send --system --dest=org.freedesktop.DBus --type=method_call --print-reply /org/freedesktop/DBus org.freedesktop.DBus.Debug.Stats.GetStats ps-log.txt dbus-log.txt

dvdhrm commented 3 years ago

Thanks for the data and report! Your process log says that the user instance is apparently at 270MiB, but your dbus query was sent to the system instance. Can you try this again, but using sudo dbus-send --user ... instead?

In ps you can see whether the dbus-broker instance is a system or user process by checking the --scope argument.

vosian commented 3 years ago

sudo dbus-send --user --dest=org.freedesktop.DBus --type=method_call --print-reply /org/freedesktop/DBus org.freedesktop.DBus.Debug.Stats.GetStats unfortunately this brings up "usage" instructions, so there's something wrong with this command, I don't really know much of DBUS so I don't know how to correct it. Unfortunately, I rebooted since my last message the log I could get right now wouldn't be useful as broker is taking ~3mb of ram at the moment

dvdhrm commented 3 years ago

sudo dbus-send --user --dest=org.freedesktop.DBus --type=method_call --print-reply /org/freedesktop/DBus org.freedesktop.DBus.Debug.Stats.GetStats unfortunately this brings up "usage" instructions, so there's something wrong with this command, I don't really know much of DBUS so I don't know how to correct it. Unfortunately, I rebooted since my last message the log I could get right now wouldn't be useful as broker is taking ~3mb of ram at the moment

Right, it is dbus-send --session rather than --user.

AdamWill commented 3 years ago

@dvdhrm sorry for the ping, but I'm seeing similar behaviour with openQA (an automated test system we use extensively in Fedora) causing tests to fail occasionally. I've filed an issue at https://progress.opensuse.org/issues/90872 with the dbus state dump and ps output attached, though not during the error state as it's not currently in it (I figured the state might still show the problem building up, though). If you have any tips on how we could further debug it, that'd be great :)

dvdhrm commented 3 years ago

@AdamWill Sure! Lets move discussion there!

dvdhrm commented 3 years ago

@AdamWill FYI, I summarized how we found the resource leak in openQA and fixed it. Maybe this serves as help to others facing similar problems and trying to make sense of it:

https://dvdhrm.github.io/2021/04/14/locating-dbus-resource-leaks/