Vanilla-OS / core-image

Containerfile for the Vanilla OS Core image.
https://images.vanillaos.org/#/recipe/core
GNU General Public License v3.0

November 12, 2024 update does not boot. #91

Open canatella opened 1 week ago

canatella commented 1 week ago

Issue Description

I tried updating to the November 12, 2024 update. The update itself goes well, but on reboot the system gets stuck at the splash screen. It's running on a Framework Laptop 16, AMD version. Rolling back brings the system back up. I have no idea how to disable the splash screen from the GRUB menu to get more information on what's going wrong.

Steps to Reproduce

On what version of Vanilla OS this happens?

Vanilla OS 2 Orchid

Additional Information

No response

taukakao commented 1 week ago

Please run "abroot status" in the working partition and post the output here.

Also, you can press ESC before it locks up to see what happens.

canatella commented 1 week ago

Thanks! Here's the abroot status output:

ABRoot Partitions:
 • Present: vos-b
 • Future: vos-a ✓

Loaded Configuration: /etc/abroot/abroot.json

Device Specifications:
 • CPU: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
 • GPU: [Advanced Micro Devices, Inc. [AMD/ATI] Phoenix1 (rev c2)]
 • Memory: 31402 MB

ABImage:
 • Digest: sha256:62bfae4df97857f0da8597a54717aa486ce9eddd1756cf73fc9d8d1a12757874
 • Timestamp: 2024-11-08 18:23:31
 • Image: ghcr.io/vanilla-os/desktop:main

Kernel Arguments: quiet splash bgrt_disable $vt_handoff lsm=integrity resume=UUID=db1b9b53-fca1-4d6e-8efc-4a9298c76253

Packages:
 • Added: gnome-tweaks, pcscd, lm-sensors, iio-sensor-proxy, adb, libfuse2t64, direnv, power-profiles-daemon
 • Removed: tlp, tlp-rdw
 • Unstaged: 

Package agreement: true

Now I see the quiet in the kernel parameters, but I'm not sure how to update them on the updated partition... I'll try the ESC trick right away.

canatella commented 1 week ago

The ESC trick does not work. I checked that the ESC key works fine in the GRUB menu. When I hit Enter in GRUB to boot and then keep pressing ESC quickly, the screen switches to black with an underscore in the top left. Then it still switches to the splash screen and immediately freezes. I think I'll roll back, remove the quiet and splash parameters, and update again. We'll see.

canatella commented 1 week ago

Alright, it seems the splash parameter by itself is enough to trigger the crash. The problem is that it crashes almost immediately, so the boot does not appear in journalctl. I don't have access to any debug information.

Edit: to be clear, booting with

bgrt_disable $vt_handoff lsm=integrity resume=UUID=db1b9b53-fca1-4d6e-8efc-4a9298c76253

works okay. Booting with

splash bgrt_disable $vt_handoff lsm=integrity resume=UUID=db1b9b53-fca1-4d6e-8efc-4a9298c76253

freezes as soon as the splash screen shows up. I guess this is more hardware/kernel related than anything to do with Vanilla OS.
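The comparison above comes down to removing individual tokens from the kernel command line and checking which one makes the difference. A minimal Python sketch of that bookkeeping follows; the helper names strip_kernel_args and diff_kernel_args are hypothetical and are not part of ABRoot or any Vanilla OS tool. It only manipulates strings and does not change any boot configuration.

```python
def strip_kernel_args(cmdline, drop):
    """Return cmdline with every token listed in `drop` removed (exact match)."""
    kept = [tok for tok in cmdline.split() if tok not in drop]
    return " ".join(kept)


def diff_kernel_args(a, b):
    """Return the tokens that appear in `a` but not in `b`, in original order."""
    b_tokens = set(b.split())
    return [tok for tok in a.split() if tok not in b_tokens]


# Kernel arguments as reported by `abroot status` earlier in this thread.
original = (
    "quiet splash bgrt_disable $vt_handoff lsm=integrity "
    "resume=UUID=db1b9b53-fca1-4d6e-8efc-4a9298c76253"
)

# Candidate command line without the two suspect parameters.
working = strip_kernel_args(original, {"quiet", "splash"})

print(working)
print(diff_kernel_args(original, working))
```

Dropping one token at a time (first quiet, then splash) is the same bisection done by hand in this comment.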

taukakao commented 1 week ago

I'm guessing it has something to do with one of the layered packages. It would be best if you removed them (or added them back, in the case of tlp) and tried upgrading again.

If that works, you can try adding them back one by one to see which package caused the problem.

canatella commented 5 days ago

So, I tried removing everything with abroot pkg remove, then abroot pkg apply. When rebooting, I got an error saying that it could not check the integrity of my system. Somehow I could not continue: I tried pressing C multiple times, but it would always power off. So I booted on the previous partition. There, abroot status showed a blank package status:

Packages:
 • Added: 
 • Removed: 
 • Unstaged: 

Even though the packages are still installed. I tried enabling the splash screen again. It still crashed, but this time, without quiet and with splash, I could see that the last kernel messages are

[    2.724993] [drm] add ip block number 7 <sdma_v6_0>
[    2.725561] [drm] add ip block number 8 <vcn_v4_0>
[    2.726119] [drm] add ip block number 9 <jpeg_v4_0>
[    2.726664] [drm] add ip block number 10 <mes_v11_0>
[    2.727215] amdgpu 0000:c1:00.0: amdgpu: Fetched VBIOS from VFCT
[    2.727766] amdgpu: ATOM BIOS: 113-PHXGENERIC-001

I've taken these messages from a boot without splash, because otherwise they go by too fast, but I'm pretty sure that's the last message I can see. When booting without splash, the next kernel messages are:

[    2.730569] Console: switching to colour dummy device 80x25
[    2.730606] amdgpu 0000:c1:00.0: vgaarb: deactivate vga console
[    2.730612] amdgpu 0000:c1:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[    2.730671] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[    2.730696] amdgpu 0000:c1:00.0: amdgpu: VRAM: 512M 0x0000008000000000 - 0x000000801FFFFFFF (512M used)
[    2.730701] amdgpu 0000:c1:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[    2.730716] [drm] Detected VRAM RAM=512M, BAR=512M
[    2.730719] [drm] RAM width 64bits DDR5

This really makes me think that it's crashing when the kernel tries to display the splash screen. I think the kernel has been updated in the November release, and this looks like a driver issue to me. Is there a way to try the new OS release with the previous kernel version?
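The reasoning above (take the last message visible in the failing boot, then look up what follows it in a working boot) can be sketched in Python. This is only an illustration under the assumption that the log lines use the "[ seconds.micros] message" layout shown in this comment; parse_dmesg and messages_after are hypothetical helper names, and the sample text is copied from the excerpts above.

```python
import re

# Matches lines such as "[    2.727766] amdgpu: ATOM BIOS: 113-PHXGENERIC-001".
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(text):
    """Turn kernel log text into a list of (timestamp, message) tuples."""
    entries = []
    for line in text.splitlines():
        match = DMESG_LINE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def messages_after(entries, last_seen, count=3):
    """Return up to `count` entries that follow the first exact match of `last_seen`."""
    for index, (_, message) in enumerate(entries):
        if message == last_seen:
            return entries[index + 1:index + 1 + count]
    return []


# Excerpt from the boot without splash, as posted in this comment.
good_boot = """\
[    2.727215] amdgpu 0000:c1:00.0: amdgpu: Fetched VBIOS from VFCT
[    2.727766] amdgpu: ATOM BIOS: 113-PHXGENERIC-001
[    2.730569] Console: switching to colour dummy device 80x25
[    2.730606] amdgpu 0000:c1:00.0: vgaarb: deactivate vga console
"""

next_steps = messages_after(
    parse_dmesg(good_boot), "amdgpu: ATOM BIOS: 113-PHXGENERIC-001", count=2
)
for timestamp, message in next_steps:
    print(f"{timestamp:.6f} {message}")
```

The entries returned are the steps the failing boot never reached, which is what points at the console handover here.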

On the other front, it seems now the package status is messed up in abroot. Is there a way to do a reset and boot to a pristine image?

Thanks!


taukakao commented 5 days ago

Instead of abroot pkg apply, please try abroot upgrade. It will do the same thing and is more reliable.

And the blank package status is to be expected, since all packages are cleared.

canatella commented 5 days ago

Unfortunately, the packages are not removed:

dam@laptop:~/dm/config/vanillaos$ abroot status
ABRoot Partitions:
 • Present: vos-a ✓
 • Future: vos-b

Loaded Configuration: /etc/abroot/abroot.json

Device Specifications:
 • CPU: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
 • GPU: [Advanced Micro Devices, Inc. [AMD/ATI] Phoenix1 (rev c2)]
 • Memory: 31401 MB

ABImage:
 • Digest: sha256:b1e238371d1164952656cec398fe047a98a6302793004953c8769f544de1ca37
 • Timestamp: 2024-11-20 11:04:41
 • Image: ghcr.io/vanilla-os/desktop:main

Kernel Arguments: bgrt_disable $vt_handoff lsm=integrity resume=UUID=db1b9b53-fca1-4d6e-8efc-4a9298c76253

Packages:
 • Added: 
 • Removed: 
 • Unstaged: 

Package agreement: true
dam@laptop:~/dm/config/vanillaos$ host-shell
dam@laptop:~/dm/config/vanillaos$ direnv --version
2.32.1

I'll try with an upgrade
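The mismatch shown above (an empty package list while direnv still runs) can be checked mechanically. The following Python sketch parses the Packages section of pasted abroot status text and tests whether some commands are still on PATH. It assumes the bullet layout shown in this thread; parse_packages and still_on_path are hypothetical helpers, not ABRoot features, and a package name does not always match the name of the command it installs.

```python
import re
import shutil

# Matches bullet lines such as " • Added: gnome-tweaks, pcscd".
PKG_LINE = re.compile(r"^\s*•\s*(Added|Removed|Unstaged):\s*(.*)$")


def parse_packages(status_text):
    """Extract the Added/Removed/Unstaged package lists from status output."""
    result = {"Added": [], "Removed": [], "Unstaged": []}
    for line in status_text.splitlines():
        match = PKG_LINE.match(line)
        if match:
            names = [p.strip() for p in match.group(2).split(",")]
            result[match.group(1)] = [p for p in names if p]
    return result


def still_on_path(commands):
    """Return the commands that can still be found on the current PATH."""
    return [c for c in commands if shutil.which(c) is not None]


# Packages sections copied from the two status outputs in this thread.
before = """Packages:
 • Added: gnome-tweaks, pcscd, lm-sensors, iio-sensor-proxy, adb, libfuse2t64, direnv, power-profiles-daemon
 • Removed: tlp, tlp-rdw
 • Unstaged: 
"""
after = """Packages:
 • Added: 
 • Removed: 
 • Unstaged: 
"""

previously_added = parse_packages(before)["Added"]
now_added = parse_packages(after)["Added"]

# Packages no longer listed by abroot but possibly still installed.
dropped = [p for p in previously_added if p not in now_added]
print(dropped)
# Assumption: the command name equals the package name (true for direnv and adb).
print(still_on_path(["direnv", "adb"]))
```

Any command reported by still_on_path after the list went blank suggests the removal has not yet been applied to the running partition.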

canatella commented 5 days ago

abroot upgrade did actually remove the packages, but the problem remains. Any idea how to test with the previous kernel? Otherwise I'll just wait for the next update, as everything works besides the boot splash screen.

taukakao commented 5 days ago

Using some other kernel is sadly not possible.

Thank you for trying to debug this.