armbian / linux-rockchip

Improved Rockchip Linux

6.1: Issue with `serial-tty-ttyS2.device`: Timed out waiting for device dev-ttyS2.device - /dev/ttyS2 #156

Closed ColorfulRhino closed 4 months ago

ColorfulRhino commented 4 months ago

Hello!

I'm having a problem with the new 6.1 vendor kernel. The issue was not present in 5.10 legacy, which is why I'm posting it in this repo. Please let me know if it would fit better in the armbian/build repo.

Device: FriendlyElec CM3588 NAS, running Armbian Bookworm (trunk)

For some reason, ttyS2 is trying to load but can't. This is holding up the first boot to Armbian unnecessarily for 90 seconds:

systemctl status dev-ttyS2.device
○ dev-ttyS2.device - /dev/ttyS2
     Loaded: loaded
     Active: inactive (dead)

Mar 13 16:30:29 nanopc-cm3588-nas systemd[1]: dev-ttyS2.device: Job dev-ttyS2.device/start timed out.
Mar 13 16:30:29 nanopc-cm3588-nas systemd[1]: Timed out waiting for device dev-ttyS2.device - /dev/ttyS2.
Mar 13 16:30:29 nanopc-cm3588-nas systemd[1]: dev-ttyS2.device: Job dev-ttyS2.device/start failed with result 'timeout'.

In comparison, ttyS6 is working fine:

systemctl status dev-ttyS6.device
● dev-ttyS6.device - /dev/ttyS6
    Follows: unit currently follows state of sys-devices-platform-feb90000.serial-tty-ttyS6.device
     Loaded: loaded
     Active: active (plugged) since Wed 2024-03-13 16:28:59 UTC; 4min 24s ago
     Device: /sys/devices/platform/feb90000.serial/tty/ttyS6

When booting a fresh image with the old legacy 5.10 kernel, dev-ttyS2.device does not try to load and therefore causes no problems. It also does not show up when running systemctl --type=device --all.


uart2 is defined in rk3588s.dtsi like this:

    uart2: serial@feb50000 {
        compatible = "rockchip,rk3588-uart", "snps,dw-apb-uart";
        reg = <0x0 0xfeb50000 0x0 0x100>;
        interrupts = <GIC_SPI 333 IRQ_TYPE_LEVEL_HIGH>;
        clocks = <&cru SCLK_UART2>, <&cru PCLK_UART2>;
        clock-names = "baudclk", "apb_pclk";
        reg-shift = <2>;
        reg-io-width = <4>;
        dmas = <&dmac0 10>, <&dmac0 11>;
        pinctrl-names = "default";
        pinctrl-0 = <&uart2m1_xfer>;
        status = "disabled";
    };

However, NanoPi devices do not have uart2. Instead, they use 0xfeb50000 for debug UART pins.
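
One way to confirm at runtime whether a node such as uart2 is enabled is to read its status property from the live device tree, which Linux exposes under /proc/device-tree. The helper below is a minimal sketch, not something from the Armbian tooling: the function name dt_node_status is hypothetical, and the node path in the usage note is an assumption about where the UART nodes sit in the tree.

```shell
# Print the effective "status" of a device tree node directory.
# $1: path to a node directory (the exact path on a given board is an assumption).
dt_node_status() {
    node_dir=$1
    if [ ! -d "$node_dir" ]; then
        # The node is not present in the tree at all.
        echo "absent"
    elif [ -f "$node_dir/status" ]; then
        # Property values are NUL-terminated strings; strip the terminator.
        tr -d '\000' < "$node_dir/status"
        echo
    else
        # A node without a status property counts as enabled.
        echo "okay"
    fi
}
```

On the board this could be called as dt_node_status /proc/device-tree/serial@feb50000 for uart2 and dt_node_status /proc/device-tree/serial@feb90000 for uart6, assuming the nodes live at the root of the tree.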

From rk3588s-nanopi-r6-common.dtsi

chosen: chosen {
    bootargs = "earlycon=uart8250,mmio32,0xfeb50000 console=ttyFIQ0 coherent_pool=1m irqchip.gicv3_pseudo_nmi=0";
};

From rk3588-nanopc-cm3588-nas.dts (note that &uart2 is missing):

&uart0 {
    pinctrl-0 = <&uart0m0_xfer>;
    status = "disabled";
};

&uart3 {
    pinctrl-0 = <&uart3m1_xfer>;
    status = "disabled";
};

amazingfate commented 4 months ago

You can try to declare SERIALCON in board config: https://github.com/armbian/build/blob/main/lib/functions/rootfs/distro-agnostic.sh#L427-L430.

ColorfulRhino commented 4 months ago

> You can try to declare SERIALCON in board config: https://github.com/armbian/build/blob/main/lib/functions/rootfs/distro-agnostic.sh#L427-L430.

Thank you! Adding the line SERIALCON="ttyS6:1500000" to the board config helped work around this issue. But I think this is still just a workaround, not actually solving the underlying issue. I'd like to understand what the actual issue is. Does anyone have an idea?
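
For reference, a SERIALCON value such as "ttyS6:1500000" reads as port:baud. The sketch below shows one way such a value could be split. It is not the code from distro-agnostic.sh: the function name parse_serialcon is hypothetical, and both the comma-separated list form and the 115200 fallback baud rate are assumptions about how the build script treats the variable.

```shell
# Split a SERIALCON-style value into "port baud" lines.
# $1: value such as "ttyS6:1500000" or "ttyS2,ttyS6:1500000" (list form is an assumption).
# Runs in a subshell so the IFS and noglob changes do not leak to the caller.
parse_serialcon() (
    IFS=,
    set -f
    for entry in $1; do
        port=${entry%%:*}
        case $entry in
            *:*) baud=${entry#*:} ;;
            *)   baud=115200 ;;   # assumed default when no baud rate is given
        esac
        printf '%s %s\n' "$port" "$baud"
    done
)
```

Calling parse_serialcon "ttyS6:1500000" prints the port and baud rate on one line, which makes it easy to see which serial-getty instance a given board config would enable.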

I'd like to help solve this problem for future devices, without a workaround.

Has anyone tried systemctl status dev-ttyS2.device on another FriendlyElec rk3588 device with the vendor 6.1 kernel? Internally, the CM3588 NAS is not that different from the NanoPc-T6 or NanoPi-R6C, for example, so this issue might affect those devices as well.

rpardini commented 4 months ago

> Adding the line SERIALCON="ttyS6:1500000" to the board config helped work around this issue.

That's strange. I mean, is the debug console on ttyS2 or on ttyS6? Check which actually works with a UART dongle.

> Instead, they use 0xfeb50000 for debug UART pins. From rk3588s-nanopi-r6-common.dtsi

You missed that there's a console=ttyFIQ0 in there. FIQ0 is related to debugging and (I guess?) is an alias for ttyS2 in the vendor kernel... but searching for it reveals things.....

Spoiler alert, I guess...

https://github.com/armbian/build/blob/main/config/sources/families/include/rockchip64_common.inc#L17 is missing the check for `vendor` branch

Thanks for looking into this, see you in a PR in armbian/build ;-)

ColorfulRhino commented 4 months ago

Okay so I did a lot of digging around, debugging, trying out stuff...

> Adding the line SERIALCON="ttyS6:1500000" to the board config helped work around this issue.

> That's strange. I mean, is the debug console on ttyS2 or on ttyS6? Check which actually works with a UART dongle.

Actually, it's neither. Sort of. The UART debug console on the 3 actual debug UART pins on the board is ttyFIQ0. Looking at the board schematic, ttyFIQ0 is bound to the uart2 pins. SERIALCON="ttyS6:1500000" solves the issue I was having, since it overrides the default. When I tried SERIALCON="ttyS2:1500000" instead, the boot process hung again. Apparently, if this build option is missing, the build automatically enables ttyS2 (which does not exist on the device). See this excerpt from an earlier build log without the SERIALCON option:

--> (190) INFO: Enabling serial console [ ttyS2 ]
--> (190) COMMAND: systemctl daemon-reload
   Running in chroot, ignoring command 'daemon-reload'
--> (190) COMMAND: systemctl --no-reload enable serial-getty@ttyS2.service
   Created symlink /etc/systemd/system/getty.target.wants/serial-getty@ttyS2.service → /lib/systemd/system/serial-getty@ttyS2.service.
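
The symlink in the log above is what makes systemd wait for /dev/ttyS2 at boot. A quick way to spot this kind of mismatch on an image is to compare the enabled serial-getty instances against the device nodes that actually exist. This is a minimal sketch under that assumption; the function name find_orphan_gettys is hypothetical, and the directories are passed in as arguments rather than hardcoded.

```shell
# List serial-getty instances whose tty device node does not exist.
# $1: directory holding the enabled unit symlinks
#     (e.g. /etc/systemd/system/getty.target.wants, as seen in the build log)
# $2: device directory (e.g. /dev)
find_orphan_gettys() {
    wants_dir=$1
    dev_dir=$2
    for unit in "$wants_dir"/serial-getty@*.service; do
        # Skip the literal pattern when nothing matches; accept dangling symlinks.
        [ -e "$unit" ] || [ -L "$unit" ] || continue
        name=${unit##*/serial-getty@}
        name=${name%.service}
        [ -e "$dev_dir/$name" ] || printf '%s\n' "$name"
    done
}
```

Run on the booted board as find_orphan_gettys /etc/systemd/system/getty.target.wants /dev, it would list every tty that has a getty enabled but no device node, which is the condition that produces the timeout described in this issue.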

But I wanted to figure out what was going on with this ttyS6. I plugged my UART cables into uart6, following the GPIO layout in the board schematic, but I didn't get any output on my UART console. So I tried, and tried, and tried... The following output gave me some hints:

root@nanopc-cm3588-nas:~# dmesg | grep tty
[    2.263803] Kernel command line: root=UUID=dbb32531-2501-48de-874c-8f492a413e04 rootwait rootfstype=f2fs splash=verbose console=ttyFIQ0 console=tty1 consoleblank=0 loglevel=1 ubootpart=e0d98cb2-62e9-904e-aa68-da5b08eb216d usb-storage.quirks=0x2537:0x1066:u,0x2537:0x1068:u pcie_aspm.policy=powersave  cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1 earlycon=uart8250,mmio32,0xfeb50000 coherent_pool=1m irqchip.gicv3_pseudo_nmi=0
[    2.675293] printk: console [tty1] enabled
[    2.713083] Registered FIQ tty driver
[    3.285048] printk: console [ttyFIQ0] enabled
[    3.285185] Registered fiq debugger ttyFIQ0
[    3.701419] feb90000.serial: ttyS6 at MMIO 0xfeb90000 (irq = 46, base_baud = 1500000) is a 16550A
[    5.432087] systemd[1]: Created slice system-getty.slice - Slice /system/getty.
[    5.433783] systemd[1]: Created slice system-serial\x2dgetty.slice - Slice /system/serial-getty.
[    5.435704] systemd[1]: Expecting device dev-ttyFIQ0.device - /dev/ttyFIQ0...
[    5.435745] systemd[1]: Expecting device dev-ttyS6.device - /dev/ttyS6...
[    5.716293] systemd[1]: Found device dev-ttyFIQ0.device - /dev/ttyFIQ0.
[    5.718992] systemd[1]: Found device dev-ttyS6.device - /dev/ttyS6.
[    9.411611] ttyFIQ ttyFIQ0: tty_port_close_start: tty->count = 1 port count = 2
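
The kernel command line in the output above is long, so it can help to pull out just the console= arguments. The helper below is a small sketch for that; the name list_consoles is hypothetical. It matches only arguments that start with console=, so earlycon= is left out.

```shell
# Print the value of every console= argument in a kernel command line.
# $1: the command line as a single string, e.g. "$(cat /proc/cmdline)"
# Runs in a subshell so that disabling globbing does not affect the caller.
list_consoles() (
    set -f
    for arg in $1; do
        case $arg in
            console=*) printf '%s\n' "${arg#console=}" ;;
        esac
    done
)
```

On the running system, list_consoles "$(cat /proc/cmdline)" prints one console per line, in the order the kernel received them.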

I wanted to figure out how to get the equivalent of printk: console [ttyFIQ0] enabled for ttyS6. Fiddling around with /boot/armbianEnv.txt led me to this file: https://github.com/armbian/build/blob/7d38b4273ac6a3cb571cb63563f329e224a0caca/config/bootscripts/boot-rk3588-legacy.cmd#L6-L43

I had to add some hacks to finally get what I wanted (just console=ttyS6,1500000 doesn't work since it gets overwritten, see the file above). I added the following lines to armbianEnv.txt:

console=x
consoleargs=console=ttyFIQ0 console=ttyS6,1500000 console=tty1

YES!

root@nanopc-cm3588-nas:~# dmesg | grep tty
[    2.237123] Kernel command line: root=UUID=98bca3b3-50ed-4cb7-8630-cbe5d82d6e47 rootwait rootfstype=f2fs splash=verbose console=ttyFIQ0 console=ttyS6,1500000 console=tty1 consoleblank=0 loglevel=1 ubootpart=d0ba8d8b-a345-9947-90e7-2ba5fd72c720 usb-storage.quirks=0x2537:0x1066:u,0x2537:0x1068:u pcie_aspm.policy=powersave  cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1 earlycon=uart8250,mmio32,0xfeb50000 coherent_pool=1m irqchip.gicv3_pseudo_nmi=0
[    2.649040] printk: console [tty1] enabled
[    2.686999] Registered FIQ tty driver
[    3.262127] printk: console [ttyFIQ0] enabled
[    3.262265] Registered fiq debugger ttyFIQ0
[    3.665284] feb90000.serial: ttyS6 at MMIO 0xfeb90000 (irq = 45, base_baud = 1500000) is a 16550A
[    3.665623] printk: console [ttyS6] enabled

Now I have a serial console on the GPIO pins via ttyS6 and also the normal debug console on ttyFIQ0. And the system does not hang on boot. Nice!
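
The two armbianEnv.txt lines used above can also be applied with a small helper instead of editing the file by hand. This is only a sketch: the function name set_env_key is hypothetical, it assumes the file consists of simple key=value lines, and it assumes the key contains no regular expression metacharacters.

```shell
# Set key=value in a key=value style file, replacing any existing line for that key.
# $1: file path (e.g. /boot/armbianEnv.txt), $2: key, $3: value
set_env_key() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Keep every line except an existing assignment of this key.
    grep -v "^${key}=" "$file" > "$tmp" || true
    # Append the new assignment.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original path so ownership and mode are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For the workaround in this thread, that would be set_env_key /boot/armbianEnv.txt console x followed by set_env_key /boot/armbianEnv.txt consoleargs "console=ttyFIQ0 console=ttyS6,1500000 console=tty1", then a reboot.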


> SERIALCON is a bit confusing, If I remember, it sets up getty systemd service (why? some legacy?) but it doesn't change the bootscript at all, so the console= line passed to kernel is hardcoded.

Yeah. Now that I've done all this digging around, I don't think this getty console stuff is doing anything at all, since ttyS6 did not work before I applied the armbianEnv.txt hacks mentioned above.

> If it is indeed in ttyS2 (and not in ttyS6), try changing SERIALCON to ttySnever or such, which would prove that there's some crazy with the systemd service/userspace and this is not actually (or directly) kernel related.

Will test that now.

> https://github.com/armbian/build/blob/main/config/sources/families/include/rockchip64_common.inc#L17 is missing the check for vendor branch

Thanks for finding this! This also confirms that the Rockchip kernel actually uses ttyFIQ0 instead of ttyS2. See also https://github.com/mfkiwl/rk-open-docs/blob/master/UART/Rockchip_Developer_Guide_UART_EN.md#driver-path-1

ColorfulRhino commented 4 months ago

With SERIALCON="ttySneveeeeeerrrr" in the board config, the boot is slow because of Dependency failed for serial-getty@ttySneveeeeeerrrr.service (obviously, since that device does not exist). But I can still access the UART console as normal. After doing this again:

console=x
consoleargs=console=ttyFIQ0 console=ttyS6,1500000 console=tty1

I can also access the console ttyS6 on GPIO.

I don't even have to set the baud rate, as in console=ttyFIQ0,1500000. I assume that's already handled by earlycon=uart8250,mmio32,0xfeb50000?

amazingfate commented 4 months ago

The NanoPi R6 dts includes https://github.com/armbian/linux-rockchip/blob/rk-6.1-rkr1/arch%2Farm64%2Fboot%2Fdts%2Frockchip%2Frk3588-linux.dtsi, which defines bootargs. You could try deleting the chosen node.

ColorfulRhino commented 4 months ago

> The NanoPi R6 dts includes https://github.com/armbian/linux-rockchip/blob/rk-6.1-rkr1/arch%2Farm64%2Fboot%2Fdts%2Frockchip%2Frk3588-linux.dtsi, which defines bootargs. You could try deleting the chosen node.

Great find! That seems to be the root source of ttyFIQ0. The fiq_debugger driver seems to expect some ttyFIQ though, so I'm not sure whether removing ttyFIQ0 from bootargs would cause other problems.

The obvious solution would be to simply add vendor to this: https://github.com/armbian/build/blob/main/config/sources/families/include/rockchip64_common.inc#L17. Maybe there are other ways as well. I will try to think of the best non-hacky solution for the long run.