Closed daleknauss closed 3 years ago
Hm. That looks like Mir querying details of a udev device that has gone away. But I can't think of any relevant code that has changed. Perhaps unrelated changes have changed the timing and exposed a race?
@RAOF as part of the update, the device would go away and reappear so I think your assessment is correct. Please let me know if there are any other details I can provide that might help with this issue.
I concur with @RAOF: there are no recent changes in this code, so I suspect the change is unrelated to the 2.4 release.
The root cause of the crash would appear to be that the exception shown is not being handled. Unfortunately, the above log snippet doesn't indicate the context in which the "Udev device does not exist". Is there anything that might be interesting in the log immediately before or after this?
Here are the rest of the logs I could pull out starting from when the update happens. Those unbind and bind failures are fairly common even before this issue.
A few other thoughts: I've reverted to a snap version that was heavily tested and working fine before the latest mir-kisok snap was released. However, I haven't yet done the work to check if one of our Canonical maintained snaps has been updated in the same window. It's possible that a change to another snap might be the root cause. I will be exploring that further.
Jun 23 00:48:52 FMTRMRD2A000045 kernel: uvcvideo: Found UVC 1.10 device (3301:1000)
Jun 23 00:48:52 FMTRMRD2A000045 kernel: input: Device: Device as /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0/input/input20
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper unbind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper unbind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper unbind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper bind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper bind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper bind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper unbind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper unbind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper unbind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 kernel: usb 1-6: USB disconnect, device number 3
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper bind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper bind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper bind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-logind[1902]: Failed to open /dev/input/event8: No such device
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6732]: Process '/usr/lib/snapd/snap-device-helper unbind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6732]: Process '/usr/lib/snapd/snap-device-helper unbind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6732]: Process '/usr/lib/snapd/snap-device-helper unbind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6 189:2' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6 189:2' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6 189:2' failed with exit code 1.
Jun 23 00:48:53 FMTRMRD2A000045 mir-kiosk.daemon[2132]: [2021-06-23 00:48:53.170991] < - debug - > mirserver: Handling Terminated from pid=2132
Jun 23 00:48:53 FMTRMRD2A000045 mir-kiosk.daemon[2132]: ERROR: /build/mir-iJR6Nn/mir-2.4.0/src/platform/udev/udev_wrapper.cpp(60): Throw in function {anonymous}::DeviceImpl::DeviceImpl(udev_device*)
Jun 23 00:48:53 FMTRMRD2A000045 mir-kiosk.daemon[2132]: Dynamic exception type: boost::wrapexcept<std::runtime_error>
Jun 23 00:48:53 FMTRMRD2A000045 mir-kiosk.daemon[2132]: std::exception::what: Udev device does not exist
Jun 23 00:48:53 FMTRMRD2A000045 systemd[1]: snap.mir-kiosk.daemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 23 00:48:53 FMTRMRD2A000045 systemd[1]: snap.mir-kiosk.daemon.service: Failed with result 'exit-code'.
Jun 23 00:48:54 FMTRMRD2A000045 kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
@daleknauss @RAOF came up with a likely fix that I've incorporated into a new "2.4.1~rc2-snap124" build on the beta
channel.
If you have got an environment where you can reproduce the problem as it happens for you, please give this version a try.
It's not yet available for all architectures, but it should be there soon. I'm reaching EOD so letting you know now.
@AlanGriffiths I'm still seeing that crash on 2.4.1~rc2-snap124 (7239):
2021-06-29T18:50:05Z mir-kiosk.daemon[7811]: [2021-06-29 18:50:05.313501] < -warning- > evdev-input: Attempted to remove unmanaged fd 43
2021-06-29T18:50:09Z mir-kiosk.daemon[7811]: [2021-06-29 18:50:09.377665] < - debug - > mirserver: Handling Terminated from pid=7811
2021-06-29T18:50:09Z mir-kiosk.daemon[7811]: ERROR: /build/mir-ES5QiO/mir-2.4.1~rc2-g1b8f979fcb/src/platform/udev/udev_wrapper.cpp(60): Throw in function {anonymous}::DeviceImpl::DeviceImpl(udev_device*)
2021-06-29T18:50:09Z mir-kiosk.daemon[7811]: Dynamic exception type: boost::wrapexcept<std::runtime_error>
2021-06-29T18:50:09Z mir-kiosk.daemon[7811]: std::exception::what: Udev device does not exist
2021-06-29T18:50:09Z systemd[1]: snap.mir-kiosk.daemon.service: Main process exited, code=exited, status=1/FAILURE
@AlanGriffiths I'm still seeing that crash on 2.4.1~rc2-snap124
Thanks. I'll take another look armed with that additional information. (I'm pretty sure we fixed a problem, a shame it wasn't the problem you're seeing.)
Oops: Overzealous automated close!
@daleknauss there's now an ~rc3
release with an additional fix.
I'm still failing to reproduce the actual problem you're seeing. Any hints you might have on how to reproduce would be welcome.
@AlanGriffiths 2.4.1~rc3-snap124 appears to have resolved the issue. Thank you for resolving this so quickly.
Roughly how long should we expect these fixes to make it onto stable?
@AlanGriffiths 2.4.1~rc3-snap124 appears to have resolved the issue.
Thanks @daleknauss, I need to complete some tests and then I'll tag for release. The time to stable then depends on schedules I don't control, but should be of the order of two weeks.
Meanwhile, I'd like to confirm our understanding of what went wrong. You should be getting some "warning" messages in the log (instead of a crash) with the text "Failed to handle UDev". Can you confirm that and share the log for a few lines before and after?
@daleknauss one further thought: were you aware of the "Call for testing"?
@daleknauss FYI I have reproduced. As a result, I found the log warning message leaves something to be desired (https://github.com/MirServer/mir/pull/2095), but fixing that can wait for the next full release.
Thanks Alan. Definitely aware of calls for testing but don't always catch them when they're happening. Thank you. I'll try to see if I can get any more details on our end.
Hello,
As part of our system, we apply firmware updates to a number of attached devices. Since around the time of the 2.4.0-snap124 snap release, we have started to see mir-kiosk crash when applying an update. Here are the relevant logs:
Any ideas?