bus1 / dbus-broker

Linux D-Bus Message Broker
https://github.com/bus1/dbus-broker/wiki
Apache License 2.0
685 stars 84 forks source link

RFE: add destination/unit name to error messages #333

Open jakedane opened 10 months ago

jakedane commented 10 months ago

I'm using dbus-broker 35 on Arch Linux. I think it will be useful for troubleshooting if error messages include the destination/unit name the error is for. Currently dbus-broker does not include that.

For example if a destination is not found (https://github.com/bus1/dbus-broker/blob/main/src/bus/driver.c#L2794) dbus-broker has a Destination does not exist error message that does not include what the destination was. It's not logged elsewhere in the journal either. When using dbus-daemon the error message does include what the destination was:

# dbus-broker
gnome-shell[1216]: Failed to make thread 'KMS thread' realtime scheduled: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: Destination does not exist

# dbus-daemon
gnome-shell[1229]: Failed to make thread 'KMS thread' realtime scheduled: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: Name "org.freedesktop.RealtimeKit1" does not exist

Or for example when an activation fails because a unit is unknown (https://github.com/bus1/dbus-broker/blob/main/src/launch/service.c#L367) dbus-broker has a activation request failed: unknown unit error message that does not include the unit name. It is logged elsewhere in the journal (2 seconds earlier in my example) but there's no way to relate the journal entries that I can see. When using dbus-daemon the error message does include what the unit name was, compare the pulseaudio line for dbus-broker and dbus-daemon:

# dbus-broker
dbus-broker-launch[1108]: Activation request for 'org.bluez' failed: The systemd unit 'dbus-org.bluez.service' could not be found.
[+2s]
pulseaudio[2354]: GetManagedObjects() failed: org.freedesktop.DBus.Error.NameHasNoOwner: Could not activate remote peer: activation request failed: unknown unit.

# dbus-daemon
dbus-daemon[1126]: [system] Activating via systemd: service name='org.bluez' unit='dbus-org.bluez.service' requested by ':1.91' (uid=1000 pid=2369 comm="/usr/bin/pulseaudio --daemonize=no --log-target=jo")
dbus-daemon[1126]: [system] Activation via systemd failed for unit 'dbus-org.bluez.service': Unit dbus-org.bluez.service not found.
pulseaudio[2369]: GetManagedObjects() failed: org.freedesktop.systemd1.NoSuchUnit: Unit dbus-org.bluez.service not found.

If the error message sent back would include the destination/unit name similar to how dbus-daemon does that, I think that helps users trying to troubleshoot an issue on their system.

dvdhrm commented 10 months ago

We generally try to avoid ASCII error messages on the wire in dbus-broker. We try to send error-codes that are as specific as possible, and we ensure that the caller has all required information to print suitable log messages. There are many reasons for our design choice:

Our recommendation is for clients to print suitable error messages when transactions fail, rather than just printing the D-Bus error of the first failing message. D-Bus does not consider failing transactions as a fatal error, but can deal gracefully with it. So it is the client where the error originates, and thus it knows best what information to include in debugging information.

Whenever an actual error originates in dbus-broker, you will see a proper journal entry with additional structured information (see journalctl -b --output=verbose for examples).

We do recognize that error handling is often quite lacking in clients and we have included more information in the error-messages we send. We are not necessarily opposed to extending them, yet we place little urgency on extending the errors given the rationale explained above. I really really would prefer for clients to put more effort into error-handling, rather than dumping random D-Bus transactions onto the unsuspecting user.

jakedane commented 10 months ago

Thank you for the detailed response. I understand the reasons for the design choice and yes it would be nicer if clients could have more informative log messages. Using dbus-broker instead of dbus-daemon makes more apparent error handling in clients can be lacking.

The log messages (above and others) I found aren't a problem in my case probably because I deliberately didn't install some services. But still. After a system upgrade I typically take a quick look through the journal for any new errors or warnings. After the change to dbus-broker this week I find it harder to relate errors and warnings to things I already know about.

If the destination/unit name is known at the point in the code where the error message is made, could it be an idea to append that to those messages? Like this:

# for 1st example change from:
org.freedesktop.DBus.Error.NameHasNoOwner: Destination does not exist
# to:
org.freedesktop.DBus.Error.NameHasNoOwner: Destination does not exist: org.freedesktop.RealtimeKit1

# for 2nd example change from:
org.freedesktop.DBus.Error.NameHasNoOwner: Could not activate remote peer: activation request failed: unknown unit
# to:
org.freedesktop.DBus.Error.NameHasNoOwner: Could not activate remote peer: activation request failed: unknown unit: dbus-org.bluez.service

The journalctl -b --output=verbose is somewhat useful as for the 2nd example that includes where in the pulseaudio code it happens which has a clue in the filename it is about bluez. For the 1st example it doesn't provide more context. I'll see if I can do something for those clients.

dvdhrm commented 10 months ago

To be honest, the unit-name is one of the instances where I would like to avoid it. When names are activated, dbus-broker triggers the activation in dbus-broker-launch via an API, which then triggers the activation in systemd via an API. None of these APIs transmit ASCII error-strings as part of the failed transaction, so we would have to funnel that through separately. Certainly doable, but would mean carrying this information along a chain just to allow dbus-broker to return it to the sender, which already knows the name of the target to activate.

Also note that missing unit-files are system mis-configurations and should thus already trigger an elaborate journal-entry from dbus-broker with information about this.

The much easier amendment would be to do missing systemd-unit for name "org.example.Foobar". This would not tell you the exact name that was configured in the configuration file, but would provide a visual indication that this is related to the same error-message dbus-broker printed shortly before?

jakedane commented 10 months ago

The much easier amendment would be to do missing systemd-unit for name "org.example.Foobar". This would not tell you the exact name that was configured in the configuration file, but would provide a visual indication that this is related to the same error-message dbus-broker printed shortly before?

Perhaps I'm misunderstanding but is this part of the code not doing that?

                if (strcmp(error->name, "org.freedesktop.systemd1.NoSuchUnit") == 0) {
                        if (!service->n_missing_unit++) {
                                log_append_here(&launcher->log, LOG_WARNING, 0, DBUS_BROKER_CATALOG_ACTIVATE_NO_UNIT);
                                log_append_bus_error(&launcher->log, error);
                                log_append_service(&launcher->log, service);

                                r = log_commitf(&launcher->log,
                                                "Activation request for '%s' failed: The systemd unit '%s' could not be found.\n",
                                                service->name,
                                                service->active_unit);
                                if (r)
                                        return error_fold(r);
                        }

                        name_error = CONTROLLER_NAME_ERROR_UNKNOWN_UNIT;

Except if I read that correctly it has logic to only log it once for each service, which would be why I saw a 2 second delay between the below messages I think (some other client had tried to activate org.bluez before thus it was not logged again when the pulseaudio client tried it 2 seconds later):

dbus-broker-launch[1108]: Activation request for 'org.bluez' failed: The systemd unit 'dbus-org.bluez.service' could not be found.
[+2s]
pulseaudio[2354]: GetManagedObjects() failed: org.freedesktop.DBus.Error.NameHasNoOwner: Could not activate remote peer: activation request failed: unknown unit.
dvdhrm commented 10 months ago

Yes, this code logs to the journal. However, it runs in the launcher and thus is not involved in replying to the activation message. Instead, it sends a message back to the broker, which then replies to all pending activation messages of that name.