DieterReuter / image-builder-rpi64

Build SD card image for Raspberry Pi 3 64bit
MIT License

Partition Resize After Initial Boot Is Failing #89

Closed Magnitus- closed 5 years ago

Magnitus- commented 6 years ago

The partition resize on my Pis failed on the initial boot, leaving the disk space at ~1GB.

After some troubleshooting, manually running the contents of this file, I was able to isolate the problem: https://github.com/DieterReuter/image-builder-rpi64/blob/master/builder/files/etc/firstboot.d/10-resize-rootdisk

When running the "fdisk" command manually, I inputted the following without issues:

p
d
$PART_NUM
n
p
$PART_NUM
$PART_START

Then, I got the following input request (which is not handled by the script):

Partition #2 contains a ext4 signature.

Do you want to remove the signature? [Y]es/[N]o: y

The signature will be removed by a write command.

So in my case, the input needed to be the following (answering 'n' might work as well; removing the signature has had no observable ill effect for me so far):

fdisk /dev/mmcblk0 <<EOF
p
d
$PART_NUM
n
p
$PART_NUM
$PART_START

y
p
w
EOF

I'm not sure why the difference was present. It was a local build of the Hypriot OS using the latest version of all the build repos for the Raspberry Pi 3 B+. It's not a blocker per se for me, as I can fix it manually, but I'm unsure whether others will encounter this issue as well.
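A small sketch of how the keystroke sequence above could be generated, so that the answer to the signature prompt is optional (the prompt does not seem to appear in every case). The function name `fdisk_resize_input` and its third argument are made up for illustration and are not part of the repository's script.

```shell
#!/bin/sh
# Hypothetical helper (not in the repo): print the fdisk keystrokes used above.
#   $1 = partition number
#   $2 = first sector of the partition
#   $3 = answer to the "remove the signature?" prompt ("y" or "n"),
#        or empty to leave the answer out
fdisk_resize_input() {
    part_num=$1
    part_start=$2
    sig_answer=$3

    # The trailing empty string prints a blank line, which accepts the
    # default last sector (the end of the disk).
    printf '%s\n' p d "$part_num" n p "$part_num" "$part_start" ""
    if [ -n "$sig_answer" ]; then
        printf '%s\n' "$sig_answer"
    fi
    printf '%s\n' p w
}

# Usage (needs root and rewrites the partition table, so it is commented out):
# fdisk_resize_input 2 133120 y | fdisk /dev/mmcblk0
```

Piping the output to `cat` first is a cheap way to check the sequence before pointing it at a real disk.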

Here is the whole input/output sequence:

root@icarus:/home/pirate# fdisk /dev/mmcblk0

Welcome to fdisk (util-linux 2.29.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p
Disk /dev/mmcblk0: 14.9 GiB, 15931539456 bytes, 31116288 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device         Boot  Start     End Sectors  Size Id Type
/dev/mmcblk0p1        2048  133119  131072   64M  c W95 FAT32 (LBA)
/dev/mmcblk0p2      133120 2047998 1914879  935M 83 Linux

Command (m for help): d
Partition number (1,2, default 2): 2

Partition 2 has been deleted.

Command (m for help): n
Partition type
   p   primary (1 primary, 0 extended, 3 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (2-4, default 2): 2
First sector (133120-31116287, default 133120): 133120
Last sector, +sectors or +size{K,M,G,T,P} (133120-31116287, default 31116287): 

Created a new partition 2 of type 'Linux' and of size 14.8 GiB.
Partition #2 contains a ext4 signature.

Do you want to remove the signature? [Y]es/[N]o: y

The signature will be removed by a write command.

Command (m for help): p
Disk /dev/mmcblk0: 14.9 GiB, 15931539456 bytes, 31116288 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device         Boot  Start      End  Sectors  Size Id Type
/dev/mmcblk0p1        2048   133119   131072   64M  c W95 FAT32 (LBA)
/dev/mmcblk0p2      133120 31116287 30983168 14.8G 83 Linux

Command (m for help): w
The partition table has been altered.

neta540 commented 6 years ago

On newer kernels there is a problem running this resize script. The resize has to be done differently, in 2 phases. The first phase should change the partition table and then reboot. The second phase should simply be a call to resize2fs /dev/mmcblk0p2. The way I personally solved it, and it appears to be done this way in other places as well, is to override the init so that it launches the first-phase script. That script then removes itself from the init, so the next boot skips it and starts our normal system, which runs the second-phase script. The second-phase script is launched from firstboot.d, just as 10-resize-rootdisk is today, but its contents should be changed.

Here is a solution that works for me. Inside image-builder-rpi64/builder/chroot-script.sh, append:

mv /sbin/init /sbin/init.org
ln -s /sbin/resizefs /sbin/init

Then create the file that changes the partition table at /sbin/resizefs in the rootfs, with the following content. (The matching location in the image-builder-rpi64 repo would be image-builder-rpi64/builder/files/sbin/resizefs.)

#!/bin/sh

mount -t proc proc /proc
mount -t sysfs sys /sys
mount -t tmpfs tmp /run
mount / -o remount,rw

PART_START=$(cat /sys/block/mmcblk0/mmcblk0p2/start)

echo "Resizing root filesystem"
fdisk -u /dev/mmcblk0 <<EOF
p
d
2
n
p
2
$PART_START

p
w
EOF
rm /sbin/init /sbin/resizefs
mv /sbin/init.org /sbin/init
echo "Done resizing, will now reboot"
echo 1 > /proc/sys/kernel/sysrq
mount / -o remount,ro
sync
echo b > /proc/sysrq-trigger

This is a very simple, shortened version of the resize script that does not depend on parted. It changes the partition table and triggers a reboot. Before rebooting, it restores /sbin/init to the original /sbin/init, which is our original Linux system init. Then, our second phase script (10-resize-rootdisk) should only contain:

#!/bin/sh

resize2fs /dev/mmcblk0p2

I will completely support a PR with this method of resizing.
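If a little more checking is wanted in the second phase, something along these lines might do. It is only a sketch: `ROOT_PART` and `grow_rootfs` are names invented here, and the version proposed above is just the bare resize2fs call.

```shell
#!/bin/sh
# Sketch of a more defensive second phase. ROOT_PART and grow_rootfs are
# illustrative names, not something the repository defines.
ROOT_PART=${ROOT_PART:-/dev/mmcblk0p2}

grow_rootfs() {
    part=$1
    if [ ! -b "$part" ]; then
        echo "skip: $part is not a block device" >&2
        return 1
    fi
    # With no size argument, resize2fs grows the filesystem to fill the partition.
    resize2fs "$part"
}

# Usage in the firstboot.d script (commented out here because it needs root):
# grow_rootfs "$ROOT_PART" || echo "root filesystem was not resized" >&2
```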

neta540 commented 6 years ago

For me, resizing fails because partprobe does not notify the kernel of the changes once the partition table is rewritten, so resize2fs does not run. Looking at the following file, from the other repository (not aarch64), we can see the 2-stage resize in action.

https://github.com/hypriot/image-builder-rpi/blob/246c1012f52c22358230c81b678a2ed451427037/builder/chroot-script.sh#L154

We can see an overridden init= written in cmdline.txt to launch the resize from raspi-config, which is not included in our repository, so the resizing mechanism is different. When init_resize.sh launches, it removes itself from the init in cmdline.txt.

https://github.com/urho3d/rpi-sysroot/blob/5b0cfd721b746e9ee885da3c79124486bb16347c/usr/lib/raspi-config/init_resize.sh#L174
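As a rough illustration of that "remove itself from cmdline.txt" step, here is a sketch that strips an init= token from a kernel command line file. The function name is made up, and this is not a copy of what init_resize.sh does; it takes the file path as an argument so it can be tried on a copy first.

```shell
#!/bin/sh
# Sketch: drop a one-shot init= override from a kernel command line file.
# remove_init_override is a hypothetical name used only for this example.
remove_init_override() {
    cmdline_file=$1
    # Delete the first " init=<value>" token, including the space in front
    # of it. A temporary file is used instead of relying on sed -i.
    sed 's| init=[^ ]*||' "$cmdline_file" > "$cmdline_file.tmp" &&
        mv "$cmdline_file.tmp" "$cmdline_file"
}

# Usage (try it on a copy, not on the real file):
# cp /boot/cmdline.txt /tmp/cmdline.txt
# remove_init_override /tmp/cmdline.txt
```

Note that the pattern only matches an init= token that has a space in front of it, so an init= at the very start of the line would be left alone.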

I am not completely sure this is the case in this issue, but I encountered a similar issue when running my private Alpine image with a resize script: a two-phase resize was required because the partprobe method did not work as expected.
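One way to tell whether the kernel has picked up the new partition table is to compare the partition size it reports in sysfs with the sector count from the new table. A minimal sketch, assuming the usual /sys/block layout; the function name and the optional sysfs-root argument (handy for trying it out) are my own invention.

```shell
#!/bin/sh
# Sketch: has the running kernel picked up a rewritten partition table?
# sysfs reports the partition size in 512-byte sectors.
# kernel_sees_size and its 4th argument are illustrative only.
kernel_sees_size() {
    disk=$1          # e.g. mmcblk0
    part=$2          # e.g. mmcblk0p2
    expected=$3      # sector count from the new partition table
    sysfs=${4:-/sys}

    size_file="$sysfs/block/$disk/$part/size"
    # Return 2 if the partition is not visible at all.
    [ -r "$size_file" ] || return 2
    actual=$(cat "$size_file")
    [ "$actual" -eq "$expected" ]
}

# Usage: only grow the filesystem once the kernel agrees with the new table.
# if kernel_sees_size mmcblk0 mmcblk0p2 30983168; then
#     resize2fs /dev/mmcblk0p2
# else
#     echo "kernel still has the old partition size; reboot needed" >&2
# fi
```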

Magnitus- commented 6 years ago

Looks like a much better solution to me. Many thanks for sharing your expertise. I see you opened up an untested MR for this. I won't be available until Monday to test it, but if it is not resolved by then, I can try building Hypriot from the ground up using your fork and let you know how it goes.

Magnitus- commented 6 years ago

I was finally able to take a moment to take a deeper look at your changes.

First of all, thank you, I learned a few things while researching some of the things you implemented for your solution.

For the cause, I'm not sure if the change in expected user input to fdisk when I ran it manually was a side-effect of resize2fs not running properly the first time. It may very well be.

Looking at the proposed solution, there are only 2 details that nag me a little:

I'm wondering if such logic could be implemented idempotently elsewhere, maybe in cloud-init (still new to that, a colleague recently introduced me to it, not sure if system-level functionality belongs there though) or otherwise in an additional init script (I know I dabbled with such things back in ~2010, this article looks about right, but I'm not sure if it is current: https://wiki.debian.org/BootProcess)

Anyways, my 2 cents.

I'll test out your changes, but I'll also see if I can come up with something that doesn't disrupt (too much) the original booting behavior of the system.

neta540 commented 6 years ago

Yes, I agree there should be more checks in the proposed resize scripts; they were quick-and-dirty versions of the tasks that should be done.

I didn't want to override init= in cmdline.txt to run the resize script, so that this option stays available for future use.

I'm wondering if such logic could be implemented idempotently elsewhere, maybe in cloud-init (still new to that, a colleague recently introduced me to it, not sure if system-level functionality belongs there though) or otherwise in an additional init script (I know I dabbled with such things back in ~2010, this article looks about right, but I'm not sure if it is current: https://wiki.debian.org/BootProcess)

I don't think it would be wise to use cloud-init or an additional init script for the first-phase task (re-partitioning), because by the time the init script runs, other services will already be running in the background, especially if it is done with cloud-init. Calling a reboot while other services are running doesn't sound like a good option to me: it is less reliable, and god knows what the user-created custom cloud-init configuration is meant to do. I wouldn't be comfortable rebooting a running system just like that. The re-partitioning is a system script that should definitely override the init system. I don't see much that could go wrong as long as the script is minimal and to the point.

Magnitus- commented 6 years ago

Re-thinking about it, I guess with a bash script that doesn't have the -e flag, it's not so bad.

Usually, using bash without that flag can be annoying because a failed statement does not halt the script (resulting in silent failures), but in this case it's probably better that the script reaches the point where it restores the original init logic, whether or not the partition change succeeded.
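To make that point concrete, here is a sketch of the same idea with the restore step run unconditionally. `repartition` is a stand-in for the fdisk call that fails on purpose, and the directory is passed as an argument so this can be tried in a scratch directory instead of /sbin; none of these names come from the proposed script.

```shell
#!/bin/sh
# Sketch: without "set -e", a failed step does not stop the script, so the
# restore step still runs. All names here are illustrative.

repartition() {
    # Placeholder for the fdisk call; always fails to show the script carries on.
    return 1
}

restore_init() {
    sbin_dir=$1
    # Remove the temporary init link and the resize script, then put the
    # original init back in place.
    rm -f "$sbin_dir/init" "$sbin_dir/resizefs"
    mv "$sbin_dir/init.org" "$sbin_dir/init"
}

run_first_phase() {
    sbin_dir=$1
    if ! repartition; then
        echo "repartition failed; restoring init anyway" >&2
    fi
    restore_init "$sbin_dir"
}

# Usage (scratch directory only, never the real /sbin while experimenting):
# run_first_phase /tmp/fake-sbin
```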

For the reboot, I think you are probably correct in that running it so close in the initialization phase of services would be less than ideal (not to mention the resize operation on the system's partition).

DieterReuter commented 5 years ago

Fixed with #90