RPi-Distro / pi-gen

Tool used to create the official Raspberry Pi OS images
BSD 3-Clause "New" or "Revised" License

[BUG] Latest build with pi-gen hangs on boot sequence log #695

Closed Taha-Firoz closed 1 year ago

Taha-Firoz commented 1 year ago

I've tried several builds from pi-gen that use the most recent kernel, 6.1.21 (or whichever is in the repo), and they hang indefinitely on the boot sequence screen. It was previously getting stuck on UTMP System Runlevel Change, but after I removed all modifications and built just the pi-gen image it now gets stuck on Reached target Bluetooth. Hitting enter makes the blinking cursor blink faster. alt + f2 does switch to tty2, but nothing is responsive and no keyboard input is taken, other than the cursor's blinking speed changing.

I mean, pi-gen really should lock down the kernel version to ensure that builds don't break. I've tried downloading and vendoring a kernel and bootloader deb package, but the build script ignores the run.sh file of that extra step. If I could be guided to some workaround, I would appreciate that.

XECDesign commented 1 year ago

Last nightly image ran without issues for me, so I'm not seeing an issue on this end. If you have further information, I could try reproducing it again.

Taha-Firoz commented 1 year ago

I do want to provide more information, but I have no idea how to, since the device never finishes booting and tty2 never becomes accessible.

Taha-Firoz commented 1 year ago

Okay, so an update: I made a fresh build up to stage 4. For the lite image I get this on the first boot: [screenshot: rpi-boot-failure]

I choose OK, then the device reboots and gets stuck on this log: [screenshot: boot-log-stuck]

My setup is a Raspberry Pi CM4 Lite with a CM4 IO board.

For the full image, the one from stage 4, I get the same boot failure error, but when I press OK the device reboots, I see the rpi early boot logs, then the systemd boot log for a fraction of a second, then the Raspberry Pi OS splash screen, and then the device just turns off. My display loses its signal and the green ACT LED on the CM4 IO board stops blinking; the power LED stays on though.

Taha-Firoz commented 1 year ago

Went insane debugging this problem. I mounted the SD card and extracted the boot logs for the build that would hang. According to the logs, fsck was failing on the build I generated with pi-gen. It turns out I was building the image on Ubuntu 23 with LVM, which has newer packages (1.47.0), and the rootfs in the final image was not fsck-able by the version of fsck on Ubuntu 22 and Bullseye (1.46 something), which reported [has unsupported feature(s): FEATURE_C12]. I produced a build from Ubuntu 22 and everything works now. I know Xenial is the only recommended version to build on, but this problem was way too unexpected.
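For anyone who wants to check an image for this before flashing it, here is a minimal Python sketch that reads the compat feature flags out of an ext4 superblock and prints unknown bits the same way the error above does (FEATURE_C followed by the bit number). The field offsets and the bit-to-name table are assumptions taken from the ext4 on-disk format documentation, not from pi-gen; in particular, treating bit 12 as `orphan_file` is worth confirming against your own e2fsprogs. The helper names `read_superblock` and `compat_feature_names` are hypothetical.

```python
import struct

EXT_MAGIC = 0xEF53
SUPERBLOCK_OFFSET = 1024  # the superblock starts 1024 bytes into the filesystem
MAGIC_OFFSET = 0x38       # s_magic, little-endian 16-bit
COMPAT_OFFSET = 0x5C      # s_feature_compat, little-endian 32-bit

# Partial table of compat bits (assumption: taken from the ext4 layout docs).
KNOWN_COMPAT = {
    2: "has_journal",
    3: "ext_attr",
    4: "resize_inode",
    5: "dir_index",
    12: "orphan_file",
}


def read_superblock(path, fs_offset=0):
    """Read the 1024-byte superblock of a filesystem starting at fs_offset."""
    with open(path, "rb") as f:
        f.seek(fs_offset + SUPERBLOCK_OFFSET)
        return f.read(1024)


def compat_feature_names(superblock):
    """Return the names of all compat feature bits set in the superblock."""
    (magic,) = struct.unpack_from("<H", superblock, MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    (compat,) = struct.unpack_from("<I", superblock, COMPAT_OFFSET)
    names = []
    for bit in range(32):
        if compat & (1 << bit):
            # Bits missing from the table are shown as FEATURE_C<bit>.
            names.append(KNOWN_COMPAT.get(bit, f"FEATURE_C{bit}"))
    return names
```

A full disk image starts with a partition table, so `fs_offset` would have to be the byte offset of the rootfs partition; for a bare filesystem image it stays at 0.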

XECDesign commented 1 year ago

Many thanks for digging into it. It sounds like this issue is going to be flagged up a lot more as 1.47 ends up on more installs, so it's worth fixing now.

We've had something similar with another feature before: https://github.com/RPi-Distro/pi-gen/commit/4d65b2b3579a6f808df4fce1ae3bc0ffea16c071

I'll add it to the todo list. Thanks again!

Taha-Firoz commented 1 year ago

Can't we just use something like toybox to tighten the version restrictions on the utilities pi-gen is using? It should make everything a lot more convenient, especially given that toybox should be a completely compatible drop-in replacement.

XECDesign commented 1 year ago

Toybox's mke2fs seems to be unfinished and I wouldn't trust something experimental over something that the rest of the world uses. What's an example of a problem we have had that toybox would have solved? The other tools offered by toybox aren't ones where it seems like the versions matter all that much.

If we wanted to lock the mke2fs step down to a specific set of filesystem features, then there are better ways to do it, but really, I'd rather keep moving forward with what everyone else is doing, even if it means things break sometimes.
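As an illustration of what locking the mke2fs step to a known feature set could look like, here is a small Python sketch that builds the value for mke2fs's `-O` option, using the `^feature` syntax to switch features off. It only disables features that the host's mke2fs.conf mentions, on the assumption that an older mke2fs would reject a feature name it has never heard of. The helper names, the heuristic, and the list of unwanted features are hypothetical; this is not pi-gen's actual code.

```python
import re


def known_features(mke2fs_conf_text):
    """Collect every identifier-like token that appears in mke2fs.conf."""
    return set(re.findall(r"[A-Za-z0-9_]+", mke2fs_conf_text))


def build_feature_arg(mke2fs_conf_text, unwanted):
    """Return a '^a,^b' string disabling each unwanted feature the host knows."""
    present = known_features(mke2fs_conf_text)
    return ",".join(f"^{name}" for name in unwanted if name in present)


def mkfs_command(device, mke2fs_conf_text, unwanted):
    """Build an argv list for mkfs.ext4; -O is added only when needed."""
    arg = build_feature_arg(mke2fs_conf_text, unwanted)
    cmd = ["mkfs.ext4"]
    if arg:
        cmd += ["-O", arg]
    return cmd + [device]
```

With a mke2fs.conf that lists `orphan_file` but not `metadata_csum_seed`, asking to disable both yields `mkfs.ext4 -O ^orphan_file <device>`, so the same script keeps working on hosts with an older mke2fs.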

Taha-Firoz commented 1 year ago

Maybe it was just me being heavily influenced by Landley's talk 😅 about toybox then. I do agree with you, it's good to keep the wheel turning.

XECDesign commented 1 year ago

FWIW, pi-gen's predecessor, Spindle, used Landley's Aboriginal Linux running in a VM along with qemu-nbd for disks. One of the reasons I went with pi-gen's approach was that I didn't like having so many layers if something goes wrong. Most of the time it was the qcow2 images getting corrupted or qemu-nbd hanging in strange ways. Instead of troubleshooting software I had no hope of beginning to understand, just removing those layers seemed simpler.

If I was starting again from scratch today, maybe I would have preserved some of those steps, but it seems like the world has moved on to Docker for this kind of thing.