NeonGeckoCom / neon_debos

Debos build files for Neon OS
Apache License 2.0

[BUG] Qt not detecting display on boot #22

Open NeonDaniel opened 1 year ago

NeonDaniel commented 1 year ago

Description

Occasionally, the GUI fails to load and even local tty sessions are not rendered on screen. fbi calls work normally, and all fb and dri devices exist as expected.

Steps to Reproduce

Relevant Code

The shutdown service does explicitly blank the screen; perhaps power on needs to explicitly power on the screen?

Other Notes

NeonDaniel commented 5 months ago

Permissions on /dev/dri and /dev/render* have been observed as a possible cause of GUI errors (I was reminded of this on Matrix).

In a recent failure, the devices look normal:

(venv) neon@neon:~$ ll /dev/dr*
total 0
drwxr-xr-x  3 root root        120 Jan 26 13:48 ./
drwxr-xr-x 17 root root       3980 Jan 26 13:48 ../
drwxr-xr-x  2 root root        100 Jan 26 13:48 by-path/
crw-rw----  1 root video  226,   0 Jan 26 13:48 card0
crw-rw----  1 root video  226,   1 Jan 26 13:48 card1
crw-rw----  1 root render 226, 128 Jan 26 13:48 renderD128
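
The listing above shows mode and group, but not whether the user running the GUI can actually open the nodes. A minimal Python sketch of that check follows. The helper name describe_device is hypothetical, and the sketch assumes it is run as the same user that runs the GUI service.

```python
import grp
import os
import stat


def describe_device(path):
    """Return (mode string, group name, usable by this process) for a node."""
    info = os.stat(path)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        # The gid has no entry in the group database; fall back to the number.
        group = str(info.st_gid)
    # os.access() uses the real uid/gid and the supplementary groups.
    usable = os.access(path, os.R_OK | os.W_OK)
    return stat.filemode(info.st_mode), group, usable


if __name__ == "__main__":
    dri_dir = "/dev/dri"
    if os.path.isdir(dri_dir):
        for name in sorted(os.listdir(dri_dir)):
            node = os.path.join(dri_dir, name)
            # Only report character devices such as card0 or renderD128.
            if stat.S_ISCHR(os.stat(node).st_mode):
                print(name, *describe_device(node))
```

If the last field is False for card0, card1, or renderD128, the user is probably missing from the video or render group.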
NeonDaniel commented 5 months ago

Example of gui-shell logs in a failure case. The first and last errors (both with code -13) are seen every time the GUI fails to launch even though the screen is properly initialized and /dev/dri is populated.

Jan 26 13:48:44 neon ovos-shell[554]: Failed to move cursor on screen DSI1: -13
Jan 26 13:48:44 neon ovos-shell[554]: Failed to move cursor on screen DSI1: -13
Jan 26 13:48:46 neon ovos-shell[554]: kf.kirigami: The style does not provide a C++ Units implementation. QML Units implementations are no longer suppor>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/main.qml:334:9: QML FastBlur: Cannot anchor to an item that isn't a parent or sibling.
Jan 26 13:48:46 neon ovos-shell[554]: QMetaProperty::notifySignal: cannot find the NOTIFY signal usePTTClient in class GlobalSettings for property 'useP>
Jan 26 13:48:46 neon ovos-shell[554]: mycroft connection not open!
Jan 26 13:48:46 neon ovos-shell[554]: mycroft connection not open!
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/panel/quicksettings/MuteDelegate.qml:55:5: QML Connections: Implicitly defined onFoo properties in Connection>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/panel/quicksettings/VolumeSlider.qml:61:5: QML Connections: Implicitly defined onFoo properties in Connection>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/panel/quicksettings/BrightnessSlider.qml:37:5: QML Connections: Implicitly defined onFoo properties in Connec>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/panel/SlidingPanel.qml:47:9: QML Connections: Implicitly defined onFoo properties in Connections are deprecat>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/NotificationsSystem.qml:45:5: QML Connections: Implicitly defined onFoo properties in Connections are depreca>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/NotificationsSystem.qml:22:5: QML Connections: Implicitly defined onFoo properties in Connections are depreca>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/ListenerAnimation.qml:18:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecate>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/qml/SkillView.qml:63:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. U>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/osd/VolumeOSD.qml:40:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. U>
Jan 26 13:48:47 neon ovos-shell[554]: qrc:/StatusIndicator.qml:165:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated>
Jan 26 13:48:47 neon ovos-shell[554]: qrc:/ServiceWatcher.qml:35:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. >
Jan 26 13:48:47 neon ovos-shell[554]: error activating kdeconnectd: QDBusError("org.freedesktop.DBus.Error.Disconnected", "Not connected to D-Bus server>
Jan 26 13:48:47 neon ovos-shell[554]: error activating kdeconnectd: QDBusError("org.freedesktop.DBus.Error.Disconnected", "Not connected to D-Bus server>
Jan 26 13:48:47 neon ovos-shell[554]: kdeconnect.interfaces: dbus interface not valid
Jan 26 13:48:47 neon ovos-shell[554]: file:///usr/lib/aarch64-linux-gnu/qt5/qml/QMLTermWidget/QMLTermScrollbar.qml:29:5: QML Connections: Implicitly def>
Jan 26 13:48:47 neon ovos-shell[554]: qrc:/main.qml:88:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this s>
Jan 26 13:48:47 neon ovos-shell[554]: qrc:/main.qml:72:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this s>
Jan 26 13:48:47 neon ovos-shell[554]: qrc:/main.qml:62:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this s>
Jan 26 13:48:47 neon ovos-shell[554]: Failed to commit atomic request (code=-13)
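
The -13 in the first and last lines looks like a negated errno value, which is the usual convention for DRM calls; that reading is an assumption, not something the log confirms. Under that assumption, a small Python sketch can turn the number into a name. The helper decode_negative_errno is hypothetical.

```python
import errno
import os
import re

# Matches a trailing negative integer, optionally followed by ")".
_TRAILING_CODE = re.compile(r"-(\d+)\)?\s*$")


def decode_negative_errno(log_line):
    """Return (number, symbolic name, message) or None if no code is found."""
    match = _TRAILING_CODE.search(log_line)
    if match is None:
        return None
    number = int(match.group(1))
    name = errno.errorcode.get(number, "unknown")
    return number, name, os.strerror(number)


if __name__ == "__main__":
    for line in (
        "Failed to move cursor on screen DSI1: -13",
        "Failed to commit atomic request (code=-13)",
    ):
        print(decode_negative_errno(line))
```

On Linux, errno 13 is EACCES (permission denied), which would be consistent with an access problem on the DRM device rather than a missing device.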
NeonDaniel commented 5 months ago

A different error:

Mar 20 11:45:40 neon systemd[1]: Started gui-shell.service - Neon GUI.
Mar 20 11:45:41 neon ovos-shell[4137]: drmModeGetResources failed (Operation not supported)
Mar 20 11:45:41 neon ovos-shell[4137]: no screens available, assuming 24-bit color
Mar 20 11:45:41 neon ovos-shell[4137]: Cannot create window: no screens available
Mar 20 11:45:41 neon systemd[1]: gui-shell.service: Main process exited, code=killed, status=6/ABRT
Mar 20 11:45:41 neon systemd[1]: gui-shell.service: Failed with result 'signal'.
NeonDaniel commented 5 months ago

When the GUI service fails to launch, it appears that tty sessions also fail. Ctrl+Alt+F2 appears to change to a new session, but there is no prompt (the screen is completely black). Ctrl+Alt+F1 returns to an active but static screen.

Looking at systemd logs, typed input is being handled, just not rendered on-screen.

NeonDaniel commented 5 months ago

With debug set in cmdline.txt, the following lines are present in the dmesg output of a WORKING boot but not in that of a broken one:

[   19.166018] (udev-worker)[282]: drm: Processing device (SEQNUM=1759, ACTION=add)
[   19.173827] (udev-worker)[281]: 8250: Processing device (SEQNUM=1760, ACTION=add)

Update: these lines were also present in a subsequent broken boot.

This may be related to scripts/init-bottom/udev in the initramfs
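
Comparing dmesg output between boots by eye is error-prone because timestamps, PIDs, and SEQNUM values differ on every boot. A minimal Python sketch of a normalized comparison is below. The helper names are hypothetical, and the set of fields treated as volatile is an assumption based on the two lines quoted above.

```python
import re

# Leading kernel timestamp such as "[   19.166018] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Fields that change on every boot: "[282]" style PIDs and udev SEQNUM values.
_VOLATILE = re.compile(r"\[\d+\]|SEQNUM=\d+")


def normalize(line):
    """Strip the timestamp and per-boot identifiers from one dmesg line."""
    line = _TIMESTAMP.sub("", line.strip())
    return _VOLATILE.sub("", line)


def only_in_first(first_log, second_log):
    """Return the lines of first_log with no normalized match in second_log."""
    seen = {normalize(l) for l in second_log.splitlines() if l.strip()}
    return [
        l for l in first_log.splitlines()
        if l.strip() and normalize(l) not in seen
    ]


if __name__ == "__main__":
    import sys

    # Usage: script.py working-dmesg.txt broken-dmesg.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
            for line in only_in_first(a.read(), b.read()):
                print(line)
```

Running it over several working and broken captures would show whether a difference is consistent or, as with the two lines above, only incidental.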

NeonDaniel commented 5 months ago

Working udev has additional:

S: disk/by-path/platform-fd500000.pcie-pci-0000:01:00.0-usb-0:1:1.0-scsi-0:0:0:0-part1

and working initramfs has additional:

brcm-pcie fd500000.pcie: clkreq control enabled

Broken udev has additional:

│ │   ├─gpio/gpio22
│ │   │ ┆ P: /devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio22
│ │   │ ┆ M: gpio22
│ │   │ ┆ R: 22
│ │   │ ┆ U: gpio
│ │   │ ┆ E: DEVPATH=/devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio22
│ │   │ ┆ E: SUBSYSTEM=gpio
│ │   ├─gpio/gpio23
│ │   │ ┆ P: /devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio23
│ │   │ ┆ M: gpio23
│ │   │ ┆ R: 23
│ │   │ ┆ U: gpio
│ │   │ ┆ E: DEVPATH=/devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio23
│ │   │ ┆ E: SUBSYSTEM=gpio
│ │   ├─gpio/gpio24
│ │   │ ┆ P: /devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio24
│ │   │ ┆ M: gpio24
│ │   │ ┆ R: 24
│ │   │ ┆ U: gpio
│ │   │ ┆ E: DEVPATH=/devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio24
│ │   │ ┆ E: SUBSYSTEM=gpio
│ │   └─gpio/gpio25
│ │     ┆ P: /devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio25
│ │     ┆ M: gpio25
│ │     ┆ R: 25
│ │     ┆ U: gpio
│ │     ┆ E: DEVPATH=/devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio25
│ │     ┆ E: SUBSYSTEM=gpio
NeonDaniel commented 5 months ago

Regarding the error posted earlier:

Mar 20 11:45:40 neon systemd[1]: Started gui-shell.service - Neon GUI.
Mar 20 11:45:41 neon ovos-shell[4137]: drmModeGetResources failed (Operation not supported)
Mar 20 11:45:41 neon ovos-shell[4137]: no screens available, assuming 24-bit color
Mar 20 11:45:41 neon ovos-shell[4137]: Cannot create window: no screens available
Mar 20 11:45:41 neon systemd[1]: gui-shell.service: Main process exited, code=killed, status=6/ABRT
Mar 20 11:45:41 neon systemd[1]: gui-shell.service: Failed with result 'signal'.

This one appears to be because the device behind /dev/dri/card0 is not deterministic: it is sometimes the one linked as platform-gpu-card instead of platform-fec00000.v3d.card. I will remove QT_QPA_EGLFS_KMS_CONFIG to allow auto-detection.
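
To confirm which card node each by-path link resolves to on a given boot, the links can be dumped with a short Python sketch. The helper map_by_path_links is hypothetical, and the sketch assumes the usual /dev/dri/by-path layout shown in the earlier listing.

```python
import os


def map_by_path_links(dri_dir="/dev/dri"):
    """Map each by-path link name to the device node name it resolves to."""
    by_path = os.path.join(dri_dir, "by-path")
    mapping = {}
    if not os.path.isdir(by_path):
        return mapping
    for name in sorted(os.listdir(by_path)):
        link = os.path.join(by_path, name)
        # realpath() follows the symlink to the actual cardN / renderDN node.
        mapping[name] = os.path.basename(os.path.realpath(link))
    return mapping


if __name__ == "__main__":
    for link_name, node in map_by_path_links().items():
        print(f"{link_name} -> {node}")
```

Logging this output on every boot would show whether failed launches line up with a particular card0 assignment.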

NeonDaniel commented 5 months ago

This has also been reported on the forum as affecting HDMI displays.

NeonDaniel commented 5 months ago

As suggested by Claude, I refactored service dependencies to start gui-shell after the modprobe@drm service completed, but the issue persists. Comparing outputs from systemd-analyze plot, there does not appear to be any pattern common to failed boots.

Anecdotally, systemd-binfmt.service appears to consistently take longer (~1s vs ~94ms) in the failure cases, but it exits successfully in both cases.
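
Timing differences like this are easier to spot in text than in the SVG plots. The Python sketch below assumes that the output of systemd-analyze blame (not plot) was saved for a working and a failed boot, and that each line has the form "<time> <unit>" with time tokens such as 94ms, 1.002s, or 1min. The function names and the 2x threshold are hypothetical.

```python
import re

# Seconds per unit suffix for the time tokens this sketch understands.
_UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}
_TOKEN = re.compile(r"^(\d+(?:\.\d+)?)(min|ms|us|s)$")


def parse_blame(text):
    """Parse blame-style lines into {unit name: seconds}."""
    times = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        total = 0.0
        for token in parts[:-1]:
            match = _TOKEN.match(token)
            if match is None:
                break  # unrecognized token: skip this line
            total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        else:
            times[parts[-1]] = total
    return times


def slower_units(good, bad, factor=2.0):
    """Return units that took more than `factor` times longer in `bad`."""
    return sorted(
        unit for unit, seconds in bad.items()
        if unit in good and good[unit] > 0 and seconds > factor * good[unit]
    )
```

Applied to several pairs of boots, slower_units(parse_blame(good_text), parse_blame(bad_text)) would show whether systemd-binfmt.service is the only unit that consistently slows down.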