batocera-linux / batocera.linux

batocera.linux
https://batocera.org

[V39] SATA devices on Alder-Lake S no longer recognized (B660) #11182

Open taleteller opened 8 months ago

taleteller commented 8 months ago

Batocera build version

39

Your architecture

X86

Your Graphic Processor Unit(s) (GPU)

Nvidia, does not matter

Issue description

Today I attempted an upgrade to check whether the completely broken state of DBus/Steam had been addressed (spoiler: nope) and ended up with a broken system. After the upgrade from V38 to V39, the system would no longer boot and stayed stuck on a black screen.

Detailed reproduction steps

I use a B660 mainboard with a 12th-gen CPU and a SATA SSD. Just upgrading to V39 was enough to give me a lot of "fun" for the whole day.

Details of any attempts to fix this yourself

My first assumption was that something in the upgrade process had gone wrong, or that it was some Secure Boot issue. Neither is true. After switching to the verbose boot, I got these lines repeating on the console:

[timestamp] LABEL=BATOCERA: Can't lookup bootdev
[timestamp] LABEL=BATOCERA: Can't lookup bootdev
[timestamp] LABEL=BATOCERA: Can't lookup bootdev
[timestamp] LABEL=BATOCERA: Can't lookup bootdev
[timestamp] LABEL=BATOCERA: Can't lookup bootdev
[timestamp] LABEL=BATOCERA: Can't lookup bootdev
[timestamp] LABEL=BATOCERA: Can't lookup bootdev
mount: mounting LABEL=BATOCERA on /boot_root failed: No such file or directory
Waiting for the root device

So early boot worked, but the root device then went missing. I attempted a reinstallation from a USB stick, but oddly the SSD did not show up, and wiping it did not help. I then wrote the image file directly to the SSD through an adapter and got exactly the same problem as above. Further cross-checking revealed that the system no longer recognizes any internal SATA devices (I have not checked NVMe). On V38 it works fine; on V39 it no longer does. Booting V39 from USB works fine, but that's no solution.
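For anyone reproducing this from a USB-booted system, the following is a minimal sketch of how one might check whether any detected partition carries the BATOCERA label that the boot messages fail to find. It assumes a running system where udev populates /dev/disk/by-label; the function name `find_by_label` is hypothetical and is not part of Batocera.

```python
import os

def find_by_label(label, by_label_dir="/dev/disk/by-label"):
    """Return the device node a filesystem label resolves to, or None.

    Assumption: udev maintains one symlink per labelled filesystem in
    by_label_dir. A missing entry means no detected partition has the label.
    """
    link = os.path.join(by_label_dir, label)
    if not os.path.lexists(link):
        return None
    # Follow the symlink (e.g. ../../sdb1) to the actual device node path.
    return os.path.realpath(link)

if __name__ == "__main__":
    # Prints a device path if the label is visible, otherwise None.
    print(find_by_label("BATOCERA"))
```

If this prints None while the SSD is physically connected, the disk itself was most likely never enumerated, which matches the symptom described above.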

I ended up downgrading to V38

Details of any modifications you have made to Batocera.

N/A

Logs and data

Oddly enough, lspci and dmesg came up IDENTICAL on V38 and V39. However, on V39 no internal SATA device appears in /dev or anywhere else.

00:17.0 SATA controller: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] (rev 11) (prog-if 01 [AHCI 1.0])
        DeviceName: Onboard - SATA
        Subsystem: Biostar Microtech Int'l Corp Device 5225
        Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 123
        Memory at 81200000 (32-bit, non-prefetchable) [size=8K]
        Memory at 81203000 (32-bit, non-prefetchable) [size=256]
        I/O ports at 5050 [size=8]
        I/O ports at 5040 [size=4]
        I/O ports at 5020 [size=32]
        Memory at 81202000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [70] Power Management version 3
        Capabilities: [a8] SATA HBA v1.0
        Kernel driver in use: ahci
[    0.357479] ahci 0000:00:17.0: version 3.0
[    0.367794] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 4 ports 6 Gbps 0xf0 impl SATA mode
[    0.367804] ahci 0000:00:17.0: flags: 64bit ncq sntf led clo only pio slum part ems deso sadm sds 
[    0.401060] scsi host0: ahci
[    0.401635] scsi host1: ahci
[    0.402063] scsi host2: ahci
[    0.402436] scsi host3: ahci
[    0.402776] scsi host4: ahci
[    0.403015] scsi host5: ahci
[    0.403220] scsi host6: ahci
[    0.403441] scsi host7: ahci
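As a reading aid for the log above: the `0xf0 impl` field on the AHCI line is the controller's ports-implemented bitmask. The short sketch below decodes it; the helper name `implemented_ports` is made up for illustration. The result is the same on V38 and V39, so it does not explain the failure by itself, but it shows where a connected disk would be expected to appear.

```python
def implemented_ports(port_map):
    """Return the port indices whose bit is set in an AHCI ports-implemented mask."""
    # The mask is a 32-bit register, so only bits 0..31 are meaningful.
    return [bit for bit in range(32) if port_map & (1 << bit)]

ports = implemented_ports(0xF0)
print(ports)  # [4, 5, 6, 7]
```

Four bits are set, which agrees with the "4 ports" in the same line, and the set bits sit at indices 4 to 7. That would be consistent with eight scsi hosts (host0 to host7) being registered, with a disk expected only behind the upper four.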

dmanlfc commented 8 months ago

The issue template outlines how to provide the support file - https://wiki.batocera.org/troubleshooting#create_a_batocera_support_file

taleteller commented 8 months ago

Sure, if it helps: batocera-support-20240312144831.tar.gz

This is from a USB-booted V39. It should contain an internal SATA device such as /dev/sdb, but nope. Smells like missing kernel flags to me. Do you want one from V38 as well?

dmanlfc commented 8 months ago

the kernel module configuration for sata hasn't changed. you may have stumbled upon a kernel regression for your controller. we have moved to 6.8 now, did you want to try an early v40 image?

dmanlfc commented 8 months ago

@taleteller v40 early tests which has the updated kernel - https://drive.google.com/drive/folders/1_bqmR7CoZ78i7DolYt5b-RRqB5c-LPyN?usp=drive_link

taleteller commented 8 months ago

Sadly, still no internal devices. Support file from this image attached: batocera-support-20240315110619.tar.gz

dmanlfc commented 8 months ago

@taleteller create a manjaro USB & boot off that. set the kernel to 6.7 or 6.8 then reboot. ensure you're on that kernel version (uname -r via the terminal will confirm) & then verify you can access your sata device. that's a good way to confirm the suspected kernel regression.
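The check described above can be scripted. The following is a rough sketch, not a Batocera or Manjaro tool: it prints the running kernel release (the same string `uname -r` reports) and lists block devices whose sysfs path passes through a libata port directory such as `ata5`. Matching on an `ataN` path component is a heuristic assumed here, and `sata_block_devices` is a hypothetical name.

```python
import os
import platform

def sata_block_devices(sys_block="/sys/block"):
    """List block devices whose resolved sysfs path contains an ataN component.

    Heuristic (assumption): disks attached through libata resolve to a path
    like .../ata5/host4/target4:0:0/..., while USB and NVMe disks do not.
    """
    if not os.path.isdir(sys_block):
        return []
    found = []
    for name in sorted(os.listdir(sys_block)):
        real = os.path.realpath(os.path.join(sys_block, name))
        parts = real.split(os.sep)
        if any(p.startswith("ata") and p[3:].isdigit() for p in parts):
            found.append(name)
    return found

if __name__ == "__main__":
    print("kernel:", platform.release())
    print("SATA/ATA disks:", sata_block_devices())
```

Running it once per kernel on the same machine gives a direct comparison: an empty list on 6.7 or 6.8 with a non-empty list on 6.6 would support the suspected regression.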

taleteller commented 8 months ago

This might take a little while, because the latest Manjaro ISO comes with 6.6 (internal drive working), and to get a reliable result I need to install it. The only spare drives I have around are HDDs.

dmanlfc commented 8 months ago

only you can verify it unfortunately. other systems are fine. the alternative is to move away from sata for that board, but if it's a regression, the kernel devs need to know pronto.

taleteller commented 8 months ago

Yep, this is a kernel thing. Manjaro with 6.6.19-1 boots fine; booting with 6.7.7-1 or 6.8.0rc6-1 fails and drops me into an emergency shell without sda present in the dev tree.
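These three results already bound where the regression was introduced. As a small illustration, the sketch below reduces the tested release strings to their major.minor series and reports the last working and first failing series; the function names are hypothetical, and the data is only what is reported in this comment.

```python
import re

def major_minor(release):
    """Extract (major, minor) from a release string such as '6.8.0rc6-1'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))

def regression_window(results):
    """Return (last_good, first_bad) series from a {release: worked} mapping."""
    good = [major_minor(r) for r, ok in results.items() if ok]
    bad = [major_minor(r) for r, ok in results.items() if not ok]
    return max(good), min(bad)

# Results as reported above: True means the internal SATA disk was visible.
results = {"6.6.19-1": True, "6.7.7-1": False, "6.8.0rc6-1": False}
print(regression_window(results))  # ((6, 6), (6, 7))
```

In other words, the change would have landed somewhere between the 6.6 and 6.7 series, which is the range a kernel bug report or a bisect would start from.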

By coincidence I found a similar unsolved case in the Manjaro Forums with also similar hardware: https://forum.manjaro.org/t/stuck-on-emergency-shell-with-kernel-6-7/157796

dmanlfc commented 8 months ago

ok good, thought as much. you will need to raise an issue with the linux kernel - https://www.kernel.org/doc/html/v4.19/admin-guide/reporting-bugs.html#:~:text=lists%20like%20LKML.-,Identify%20who%20to%20notify,via%20the%20subsystem%20mailing%20list.

taleteller commented 8 months ago

Well, the kernel dev list is not really my waters. I can only verify the problem for one mainboard from one vendor, I would need to test mainline-built kernels, and I am not exactly the Gentoo guy. I doubt such an issue will get attention anytime soon. Meanwhile, I verified that NVMe devices work, so I will migrate to one.

HanM23 commented 5 months ago

FYI, there is a kernel bug report for this issue:

https://bugzilla.kernel.org/show_bug.cgi?id=218896

dmanlfc commented 5 months ago

try the build here: https://drive.google.com/drive/folders/1_bqmR7CoZ78i7DolYt5b-RRqB5c-LPyN?usp=drive_link