armbian / build

Armbian Linux build framework generates custom Debian or Ubuntu image for x86, aarch64, riscv64 & armhf
https://www.armbian.com
GNU General Public License v2.0
3.97k stars 2.23k forks source link

Problem found in ...lib/debootstrap.sh #4081

Closed wwortel closed 1 year ago

wwortel commented 2 years ago

Describe the bug Partition of a gpt based SD card image used to function until a week ago. (fat boot partition and ext4 rootfs partition for Rock 3A board) but now all ends up in one partition. The image won't start the board.

To Reproduce normal compile

From the log: [ o.k. ] Creating blank image for rootfs [ 2592 MiB ] [ o.k. ] Creating partitions [ /boot: fat root: ext4 ] [ .... ] Creating rootfs [ ext4 on /dev/loop5p2 ] [ .... ] Creating /boot [ fat on /dev/loop5p1 ] mount: /home/vagrant/armbian/.tmp/mount-a20e1288-8135-47c8-8617-dd41183feca2/boot: wrong fs type, bad option, bad superblock on /dev/loop5p1, missing codepage or helper program, or other error.

So the attempt to mount the fat partition fails for some reason. Looking at the man page of parted noticed that the separator -- should follow the -s, and the command directly follow the filename. Changed that as experiment but it does not resolve the problem.

Added some lines to debootstrat.sh to see the exact commands issued when making the partitions and after me having moved the -- to follow -s : command: parted -s -- /home/vagrant/armbian/.tmp/rootfs-a20e1288-8135-47c8-8617-dd41183feca2.raw mkpart primary fat16 32768s 557055s command: parted -s -- /home/vagrant/armbian/.tmp/rootfs-a20e1288-8135-47c8-8617-dd41183feca2.raw mkpart primary ext4 557056s 100%

The second command seems also contrary to the parted man page. There is is stated that >>>part-type may be specified only with msdos and dvh partition tables<<< primary cannot be used with ext4. But in the full documentation in actual examples this is not followed. Tested this and leaving out primary in the ext4 creation causes an error thrown. So parted documentation is incorrect. Another issue relative to the parted documentation: for gpt partition tables names must be given when creating partitions. The current commands do not do that. Not clearly explained in the docs but found in some examples is that in case of a gpt partition table the given partition type is actually going to be the name of the partition. So both partitions are named primary in case of gpt.

iav commented 2 years ago

how can I reproduce this?

wwortel commented 2 years ago

how can I reproduce this?

compile the simplest image possible for a rock 3A board. Near completion, when the image file is being filled with boot and rootfs filesystem, the error occurs, but the compilation completes. But the image will not have any contents in boot. All of /boot is found as part of the ext4 rootfs. To experiment with this in a quick way have just made a little test script that mimicks the steps taken in the case of a gpt image. It is already clear that the mkpart commands need changing as both partitions currently get the name 'primary', which probably confuses the later steps.

igorpecovnik commented 2 years ago

What is your host type? Our CI infra is using Docker on top of Ubuntu 20 or 22.

wwortel commented 2 years ago

using Vagrant / Virtualbox with Ubuntu 22.04

wwortel commented 2 years ago

I think I found the origin of the problem. Have made a script on my host PC that does all the steps that lead to the error and can now quickly reproduce it. The problem is when mounting the boot partition because of mountopts[$bootfs] and mkopts[$bootfs] not having been defined because commented out ($bootfs being fat). This causes havoc.

hzyitc commented 2 years ago

Please upload the log file output/debug/install.log.

The script will output the log of mkfs.fat to this file:

lib/debootstrap.sh#L688:

if [[ -n $bootpart ]]; then
    display_alert "Creating /boot" "$bootfs on ${LOOP}p${bootpart}"
    check_loop_device "${LOOP}p${bootpart}"
    mkfs.${mkfs[$bootfs]} ${mkopts[$bootfs]} ${mkopts_label[$bootfs]:+${mkopts_label[$bootfs]}"$BOOT_FS_LABEL"} ${LOOP}p${bootpart} >> "${DEST}"/${LOG_SUBPATH}/install.log 2>&1
    mkdir -p $MOUNT/boot/
    mount ${LOOP}p${bootpart} $MOUNT/boot/
    echo "UUID=$(blkid -s UUID -o value ${LOOP}p${bootpart}) /boot ${mkfs[$bootfs]} defaults${mountopts[$bootfs]} 0 2" >> $SDCARD/etc/fstab
fi
hzyitc commented 2 years ago

Please check have you merge #4050 which show the same output.

amazingfate commented 2 years ago

https://github.com/amazingfate/armbian-rock3a-images/runs/7862135558?check_suite_focus=true#step:3:36038 Here is the log built 2 days ago. I didn't test your PR @hzyitc.

hzyitc commented 2 years ago

Well, it seem like that this bug will only appear with mkfs.fat 4.2 (2021-01-31) which the offical docker use.

When LABEL is longer than 11 characters, mkfs.fat 4.2 (2021-01-31) will fail while mkfs.fat 4.1 (2017-01-24) will cut it.

And the default value for boot partition is armbian_boot.

So we just need to set the argument BOOT_FS_LABEL to a shorter LABEL.

Here is the output of output/debug/install.log:

mkfs.vfat: Label can be no longer than 11 characters
mkfs.fat 4.2 (2021-01-31)
hzyitc commented 2 years ago

Confirmed in mkfs.fat source.

https://github.com/dosfstools/dosfstools/blob/v4.1/src/mkfs.fat.c#L1500:

    case 'n':       /* n : Volume name */
        sprintf(volume_name, "%-11.11s", optarg);
        for (i = 0; volume_name[i] && i < 11; i++)
        /* don't know if here should be more strict !uppercase(label[i]) */
        if (islower(volume_name[i])) {
            fprintf(stderr,
                    "mkfs.fat: warning - lowercase labels might not work properly with DOS or Windows\n");
            break;
        }

        break;

https://github.com/dosfstools/dosfstools/blob/v4.2/src/mkfs.fat.c#L706:

    len = mbstowcs(NULL, volume_name, 0);
    if (len != (size_t)-1 && len > 11)
    die("Label can be no longer than 11 characters");
The-going commented 2 years ago

GPT has the first short service partition which is starting from sector 2048 - 4095. It should not be mounted. Or am I wrong?

It is displayed as a fat file system, but it is not.

P.S.

leo@vm-jammy:~/armbian$ ls /dev/sd*
/dev/sda  /dev/sda1  /dev/sda2

leo@vm-jammy:~/armbian$ sudo fdisk /dev/sda
[sudo] password for leo:

Command (m for help): p

Disk /dev/sda: 35 GiB, 37580963840 bytes, 73400320 sectors
Disk model: QEMU HARDDISK   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 5C6E4615-36D7-4BA2-B79F-EA8487B36437

Device     Start      End  Sectors Size Type
/dev/sda1   2048     4095     2048   1M BIOS boot
/dev/sda2   4096 73398271 73394176  35G Linux filesystem

q

leo@vm-jammy:~/armbian$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           1.1G  840K  1.1G   1% /run
/dev/sda2        35G   20G   14G  60% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           5.8G  4.0K  5.8G   1% /tmp
/dev/sdb1        12G  8.8G  2.3G  80% /tools
tmpfs           788M  4.0K  788M   1% /run/user/1000
wwortel commented 2 years ago

Now get armbian to complete the build without an error message. But the resulting image is broken. When I analyze images, using gparted, earlier functioning images were shown with two partitions with correct fs type 1: fat16 and 2: ext4. Current images however show 'unknown' for the fs type of partition 1. Indeed earlier on both partitions got the label 'primary'. Wrong but not fatal. Remains the question what is causing this problem with what should be a fat6 partition but results in something that is not recognized as such. Using my test routine, but now inside the virtualbox, but building an empty test image in the linked area on the host, noticed that what is missing, in case of a gpt type partition, is also giving the filesystem type as third argument to mkfs. so what mkfs should get, in case of a gpt partition system, is options, label, fstype, and device. In any case a test is needed when making the fat16 boot partition whether we deal with gpt or not. The command sequence for mkfs needs to be different. Currently testing....

wwortel commented 2 years ago

Some progress. Now the scripts produce a proper image that opens with an image mounter and looks OK in gparted. (two partitions, fat16 and ext4, with names as given by me, boot and rootfs). However, what should have ended up in the boot partition's root, ends up in the /boot folder of the rootfs partition. The fat16 boot partition's file system is completely empty. So more is wrong in debootstrap.sh. Stuff gets written to $SDCARD/boot but it is my current understanding, but I may be wrong, that $SDCARD.raw is the total image, $SDCARD the folder that gets all generated code, and $MOUNT a mounted filesystem from a partition inside $SDCARD.raw. However do not understand that not two different mount points are created, one for the file system on partition1 (boot / fat16) and one for the filesystem on partition2 (rootfs /ext4). There is just the one mount point $MOUNT with a sub-folder boot. So it seems logical that all ends up in the rootfs partition. Can someone shine some light on how this is supposed to function?

The-going commented 2 years ago

Can someone shine some light on how this is supposed to function?

First you need to find the commit in which the regression occurred. Then we'll see what we can do. It may be easier to reverse this commit and then fix it again.

The-going commented 2 years ago

Can someone shine some light on how this is supposed to function?

Logic is broken somewhere here. https://github.com/armbian/build/blob/120a27a41b56ad4da7da3e27d35602d095edc008/lib/debootstrap.sh#L649-L703

When there is more than one partition, for example GPT + UEFI + BIOS + FAT to boot + root, we will have three partitions, we need to mount the third partition first. It is the largest and will be the root. Then create the boot folder in the root partition and mount the second partition there.

Check the variables by adding debugging code:

diff --git a/lib/debootstrap.sh b/lib/debootstrap.sh
index 4822c1158..996505b35 100644
--- a/lib/debootstrap.sh
+++ b/lib/debootstrap.sh
@@ -653,6 +653,8 @@ PREPARE_IMAGE_SIZE

        partprobe $LOOP

+       show_checklist_variables rootpart bootpart uefipart
+       display_alert "See debug:" "${SRC}/output/${LOG_SUBPATH}/trash.log" "info"
        # stage: create fs, mount partitions, create fstab
        rm -f $SDCARD/etc/fstab
        if [[ -n $rootpart ]]; then
wwortel commented 2 years ago

Have succeeded making the build work again. In the process made many more cosmetic changes than necessary, as part of all experiments, but essential was differentiating between y/n gpt when making file systems. Exactly the same command line arguments have a different effect, depending on the earlier choice of partition table type. debootstrap-correct-gpt-partition-and-image-creation.patch.zip

The-going commented 2 years ago

In the process made many more cosmetic changes than necessary, as part of all experiments,

Any cosmetic syntax changes that do not change the functionality must be done as a separate commit.

but essential was differentiating between y/n gpt when making file systems. Exactly the same command line arguments have a different effect, depending on the earlier choice of partition table type

It should also be a separate commit or several commits with detailed comments and a link to the documentation.

You can show your changes here in the discussion of the question as follows:

"```diff
@@ -52,7 +52,7 @@ debootstrap_ng()
    fi
    [[ -n $FORCE_TMPFS_SIZE ]] && phymem=$FORCE_TMPFS_SIZE

-   [[ $use_tmpfs == yes ]] && mount -t tmpfs -o size=${phymem}M tmpfs $SDCARD
+   [[ $use_tmpfs == yes ]] && mount -t tmpfs -o size=${phymem}M tmpfs ${SDCARD}

    # stage: prepare basic rootfs: unpack cache or create from scratch
    create_rootfs_cache
```"

Without double quotes, it will display like this:

@@ -52,7 +52,7 @@ debootstrap_ng()
    fi
    [[ -n $FORCE_TMPFS_SIZE ]] && phymem=$FORCE_TMPFS_SIZE

-   [[ $use_tmpfs == yes ]] && mount -t tmpfs -o size=${phymem}M tmpfs $SDCARD
+   [[ $use_tmpfs == yes ]] && mount -t tmpfs -o size=${phymem}M tmpfs ${SDCARD}

    # stage: prepare basic rootfs: unpack cache or create from scratch
    create_rootfs_cache

A quick look tells me that you have found something important, but I am not sure that I have seen all the important changes. And I need time to separate the non-essential changes.

And immediately a question. Are you ready to make a pull request? Are you ready to contribute to the development in the future?

wwortel commented 2 years ago

well, i dove into this because i got blocked by this bug. But I can make a pull request by forking the repo and modifying the fork and making the request. Some of the cosmetics I introduced because of the many composite lines, with partly variables for parameter expansion and partly plain text, that are in this code. What is what does not stand out. Curly brackets showing the start and end of variables just make it easier for humans and the interpreter. The other thing I noticed is the use of the variable SDCARD in the entire build system, where in actual fact it is just the root of a tree that holds all pieces of software that have to go some place in the partitions and file systems in those partitions. The only place where it is directly about the sdcard is where ${SDCARD}.raw gets introduced. Would suggest to call it WHOUSE for warehouse, or WSHOP for workshop, from and into which bits and pieces are picked and put. Also would suggest to rename MOUNT into MOUNTS, plural, as it is the root of a tree of mount points. All this does not change any functionality but makes the code easier to understand, at least to a newcomer like myself. It took me a while to figure out that SDCARD currently has dual use. Followed by /... it becomes a directory, followed by .raw, a file. But the directory has no 1:1 relation to the sdcard, where the file does. Will now try to make a pull request with just the minimum of adaptations needed to make debootsrap.sh do its thing again in case of a gpt partition table.

The-going commented 2 years ago

Please do not try to change the names of global variables. They can be used in the most mysterious places. For example, in some external user scripts or where you haven't looked yet.

If you import the repository directly from your account import-repo

and then clone your own repository already, github will automatically offer to make a pull request when you push a new branch with changes to it.

wwortel commented 2 years ago

See pull request: https://github.com/armbian/build/pull/4095/commits/42318ee59955d1abb2f8062d601774393fa99fa4

hzyitc commented 2 years ago

May I ask what the behavior of this bug now?

I try to build rock-3a. And it seek like everything is fine. The boot partition contain the boot file.

hzyitc commented 2 years ago

When there is more than one partition, for example GPT + UEFI + BIOS + FAT to boot + root, we will have three partitions

Well, our script don't support this case. We use if...elseif to determine partition configuration and partiton.

As you can see, in determine partition configuration, the BOOTFS_TYPE is the first case to check. But in partition, we will check UEFISIZE first. So if we set UEFISIZE, the script will fail because we don't have Partition 2 to mount (we only have Parition 1 and Parition 15). If we don't set UEFISIZE, the script can work well but the output image don't have UEFI partition.

I will try to refactor prepare_partitions to support more cases when I have time

https://github.com/armbian/build/blob/c0a9e7be348974e6892c21f646ba67b35d57fd8b/lib/debootstrap.sh#L495-L532

https://github.com/armbian/build/blob/c0a9e7be348974e6892c21f646ba67b35d57fd8b/lib/debootstrap.sh#L581-L633

cezar-carneiro commented 1 year ago

Sorry for kidnapping the issue, but I was just trying to generate an image for my Rock5 and I got a pretty similar error message. I run the build from the master branch inside a docker ubutu jammy container. This is the error msg:

[ error ] ERROR in function prepare_partitions [ functions/cli/cli-entrypoint.sh:108 -> functions/main/default-build.sh:125 -> functions/main/rootfs-image.sh:105 -> functions/image/partitioning.sh:211 -> functions/logging/traps.sh:0 ]
[ error ] Unable to find free loop device
[ o.k. ] Process terminated
[ error ] unmount_on_exit() called! [ functions/cli/cli-entrypoint.sh:108 -> functions/main/default-build.sh:125 -> functions/main/rootfs-image.sh:105 -> functions/image/partitioning.sh:211 -> functions/logging/traps.sh:1 -> functions/main/rootfs-image.sh:0 ]
[ o.k. ] Unmounting [ /home/cezar/build/.tmp/rootfs-90f61cd1-c4dd-4590-b061-533746fdc966/ ]
[ error ] ERROR in function unmount_on_exit [ functions/cli/cli-entrypoint.sh:108 -> functions/main/default-build.sh:125 -> functions/main/rootfs-image.sh:105 -> functions/image/partitioning.sh:211 -> functions/logging/traps.sh:1 -> functions/main/rootfs-image.sh:26 -> functions/logging/traps.sh:0 ]
[ error ] debootstrap-ng was interrupted
[ o.k. ] Process terminated
igorpecovnik commented 1 year ago

This is the error msg:

This tells nothing. We need logs. If you are making images with Docker, you need to enable privilege mode within docker-config.

cezar-carneiro commented 1 year ago

sorry about the logs, but in the end i managed to do what i wanted to do, which was generating armbian images on my apple M1 via docker... i had to copy several other flags from your config-docker.conf (not only --privileged) and apply to a basic jammy image. thanks anyways!