death / dbus

A D-BUS client library for Common Lisp
BSD 2-Clause "Simplified" License

wait-for-incoming-message can hang on event-dispatch #4

Open jollm opened 12 years ago

jollm commented 12 years ago

Update: Here is a standalone test case:

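;; Assumes the DBUS package's symbols are accessible in the current package.
(loop
  ;; Open a fresh connection to the session bus on every iteration.
  (with-open-bus (bus (session-server-addresses))
    ;; with-introspected-object itself issues an "Introspect" call; the hang
    ;; was observed while waiting for the reply to that call.
    (with-introspected-object (notify bus "/org/freedesktop/Notifications" "org.freedesktop.Notifications")
      (notify "org.freedesktop.DBus.Introspectable" "Introspect")
      (notify "org.freedesktop.Notifications" "GetCapabilities")
      (notify "org.freedesktop.Notifications" "GetServerInformation"))
    (format t "."))
  ;; Pause for 0.5 s between iterations.
  (iolib.syscalls:usleep 500000))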

This hung after about 10 minutes on my machine.

The thread hangs after an event-dispatch waiting for the reply to an "Introspect" call generated from with-introspected-object. Also, as you might expect, pending-messages for the connection does not contain a method-return-message, only a signal-message. The complete data for the method-return message is present in the socket's input buffer.

jollm commented 12 years ago

After a trip down the rabbit hole, so to speak, I was at least able to confirm that the issue resides somewhere between epoll, iolib, and this library. The hang always occurs with an open file descriptor and a readable input buffer, always on the "Introspect" call, and with all of the data present in the buffer (confirmed by reading it out manually after interrupting the thread). For some reason the event never fires and the handler never runs. I tried several variations using one-shot on the event base and timeouts, always with the same result. After examining this thread, I suspect that this may be the same kernel epoll bug. As such, my local branch now provides an option to choose the multiplexer backend for the event base. If the issue does not recur when testing with the select multiplexer, I will be more inclined to suspect kernel epoll.

jollm commented 12 years ago

The problem did recur after about 15 hours of running with select. I am working on a test case for reproduction.
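
Since the hang shows up with select as well as epoll, the multiplexer may not be the only suspect. One general way to end up with complete data sitting in an input buffer while no read event ever fires is user-space buffering: a buffered read pulls two messages out of the kernel at once, and a later readiness wait on the descriptor sees nothing left. This has not been confirmed as the cause here. The sketch below only illustrates that mechanism, in Python with the standard library so that it is self-contained; the socket pair and the newline-delimited "messages" are hypothetical stand-ins, not D-Bus or iolib code.

```python
import select
import socket

# A connected pair of stream sockets stands in for the bus connection
# (hypothetical stand-in; no D-Bus traffic is involved).
server, client = socket.socketpair()

# Buffered reader over the client socket, playing the role of a
# buffered socket stream in a client library.
reader = client.makefile("rb")

# The peer sends a signal and a method reply back to back.
server.sendall(b"signal\nreply\n")

# Reading the first message moves both messages from the kernel
# socket buffer into the reader's user-space buffer.
first = reader.readline()

# The kernel socket buffer is now empty, so waiting for readability
# on the descriptor times out and reports nothing.
readable, _, _ = select.select([client], [], [], 0.2)

# The reply is nevertheless available without any further I/O.
second = reader.readline()

print(first, readable, second)

reader.close()
client.close()
server.close()
```

If something like this is happening, checking the stream's own buffer for a complete message before blocking on the descriptor would avoid the wait, whichever multiplexer is in use.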