[BUG] on second boot (after resize) UUID not found

heitbaum commented 1 year ago

Describe the bug

Unmountable STORAGE partition on “first boot+1”

I will need to retest to see whether this is consistent, but it resembles some of the reports in the forum.

How to reproduce

Steps to reproduce the behavior, on tx6 with LE12 master (probably other devices too):

Boot from an SD card and run dd if=le.img of=/dev/mmcblk2;sync;umount /var/media/*;reboot
The LE resize completes and the device reboots
It drops into the recovery shell, complaining that it can't find the UUID
dmesg shows the ext4 partition p2 as dirty
Run fsck
Reboot

Information

LibreELEC Version: 12:master
Hardware Platform: H6-TX6

mglae commented 1 year ago

Something logged in /flash/fs-resize.log?

heitbaum commented 1 year ago

tx6-8822cs:~ # more /flash/fs-resize.log 
2023-08-09T11:51:00+00:00
/dev/mmcblk2p2 /storage ext4 rw,noatime 0 0
DISK: /dev/mmcblk2  PART: /dev/mmcblk2p2
*** parted -s -f -m /dev/mmcblk2 resizepart 2 100% >>/flash/fs-resize.log 2>&1
*** e2fsck -f -p /dev/mmcblk2p2 >>/flash/fs-resize.log 2>&1
STORAGE: 12/8192 files (0.0% non-contiguous), 6970/32768 blocks
*** resize2fs /dev/mmcblk2p2 >>/flash/fs-resize.log 2>&1
resize2fs 1.47.0 (5-Feb-2023)
Resizing the filesystem on /dev/mmcblk2p2 to 59928576 (1k) blocks.
The filesystem on /dev/mmcblk2p2 is now 59928576 (1k) blocks long.

tx6-8822cs:~ #

Log is there

mglae commented 1 year ago

No errors logged, normal successful output.

I'm wondering too if this can be reproduced.

heitbaum commented 1 year ago

Marking #7287 as duplicate.

chewitt commented 1 year ago

This has been reported on several Amlogic boards where users have installed a test image from my share. I also saw it myself when reinstalling an N2+ board the other day (but there was nothing in the resize log). I initially failed to spot the issue and just cycled power again, and on the third boot it started fine:

N2PLUS:~ # cat /flash/fs-resize.log 
2023-09-27T09:31:38+00:00
/dev/mmcblk1p2 /storage ext4 rw,noatime 0 0
DISK: /dev/mmcblk1  PART: /dev/mmcblk1p2
*** parted -s -f -m /dev/mmcblk1 resizepart 2 100% >>/flash/fs-resize.log 2>&1
*** e2fsck -f -p /dev/mmcblk1p2 >>/flash/fs-resize.log 2>&1
STORAGE: 12/8192 files (0.0% non-contiguous), 6970/32768 blocks
*** resize2fs /dev/mmcblk1p2 >>/flash/fs-resize.log 2>&1
resize2fs 1.47.0 (5-Feb-2023)
Resizing the filesystem on /dev/mmcblk1p2 to 14626816 (1k) blocks.
The filesystem on /dev/mmcblk1p2 is now 14626816 (1k) blocks long.

mglae commented 11 months ago

Has anyone seen this issue when not using a SD card?

It may be helpful to see the dmesg of the first boot, e.g. from the debug shell:

mount -o remount,rw /flash
dmesg >/flash/first_boot.dmesg.log
mount -o remount,ro /flash

drk1900 commented 10 months ago

Same issue on an RPi4, but only when a 64GB or larger microSD card is used.

chewitt commented 10 months ago

I've seen this on Amlogic boards with 16GB and 32GB cards, so I don't think size is a factor.

chewitt commented 10 months ago

NB: It has also shown up on first boot from eMMC, so it's not something dependent on SD media.

mglae commented 10 months ago

@chewitt The SD card is "an improvement of MMC 2.11". eMMC and SD cards likely share the interface driver.

chewitt commented 10 months ago

Correct, though most ARM SoC devices have their own mmc implementations (built on common kernel mmc code) so the issue is either in common kernel code or more likely (as the issue seems to be LE specific) in something higher level.

I'm wondering if the issue @HiassofT reported earlier https://github.com/LibreELEC/LibreELEC.tv/issues/8474 also happens with ext4 causing a dirty filesystem that can't mount; hence SYSTEM in that filesystem cannot be found?

mglae commented 10 months ago

> I'm wondering if the issue @HiassofT reported earlier #8474 also happens with ext4 causing a dirty filesystem that can't mount; hence SYSTEM in that filesystem cannot be found?

That is a mount helper issue, see my comment there. busybox's mount helper support is disabled in init.

Hetsh commented 9 months ago

For anyone in search of a workaround: flash the microSD with LibreELEC 11.0.3 and then let LibreELEC update to the latest version. I experienced this issue when flashing LibreELEC 11.0.4 to an 8GB microSD for my Pi4.

HiassofT commented 9 months ago

I could finally reproduce this on an RPi4 using a 64GB Sandisk Extreme Pro SD card. Here's the full serial console log: rpi4-boot.txt

Mount seems to fail because of an orphan inode checksum problem:

[    3.974582] EXT4-fs error (device mmcblk0p2): ext4_init_orphan_info:617: comm mount: orphan file block 2: bad checksum

and when running fsck -v -f from the initial debug shell it also flags that issue:

# fsck.ext4 -v -f /dev/mmcblk0p2
e2fsck 1.47.0 (5-Feb-2023)
STORAGE: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Orphan file (inode 12) block 2 is not clean.
Clear<y>? yes

STORAGE: ***** FILE SYSTEM WAS MODIFIED *****

          12 inodes used (0.00%, out of 15329280)
           1 non-contiguous file (8.3%)
           1 non-contiguous directory (8.3%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 4
     3860344 blocks used (6.30%, out of 61315072)
           0 bad blocks
           0 large files

           0 regular files
           2 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
           0 symbolic links (0 fast symbolic links)
           0 sockets
------------
           2 files
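
To check whether a given boot hit the same kernel error, the dmesg output can be filtered for the message quoted above. This is only a minimal sketch: `orphan_checksum_devices` is a hypothetical helper name, and the pattern is based solely on the one log line shown in this comment, so other kernel versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper, not part of LibreELEC.
# Reads kernel log text on stdin and prints each block device for which
# ext4 reported a bad orphan-file checksum (pattern taken from the single
# log line quoted in this thread).
orphan_checksum_devices() {
  sed -n 's/.*EXT4-fs error (device \([^)]*\)): ext4_init_orphan_info.*bad checksum.*/\1/p' | sort -u
}

# Usage sketch from the debug shell:
#   dmesg | orphan_checksum_devices
```

An empty result only means this particular message was not logged; it does not prove the filesystem is clean.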

HiassofT commented 9 months ago

It looks like resize2fs is doing something odd. I added an e2fsck -v -f call immediately after the resize and found this in fs-resize.log (and LE booted up fine):

2023-12-24T09:00:51+00:00
/dev/mmcblk0p2 /storage ext4 rw,noatime 0 0
DISK: /dev/mmcblk0  PART: /dev/mmcblk0p2
*** parted -s -f -m /dev/mmcblk0 resizepart 2 100% >>/flash/fs-resize.log 2>&1
*** e2fsck -f -p /dev/mmcblk0p2 >>/flash/fs-resize.log 2>&1
STORAGE: 12/8192 files (0.0% non-contiguous), 6970/32768 blocks
*** resize2fs /dev/mmcblk0p2 >>/flash/fs-resize.log 2>&1
resize2fs 1.47.0 (5-Feb-2023)
Resizing the filesystem on /dev/mmcblk0p2 to 61315072 (1k) blocks.
The filesystem on /dev/mmcblk0p2 is now 61315072 (1k) blocks long.

*** e2fsck -v -f -p /dev/mmcblk0p2 >>/flash/fs-resize.log 2>&1
STORAGE: Orphan file (inode 12) block 2 is not clean.
CLEARED.

          12 inodes used (0.00%, out of 15329280)
           1 non-contiguous file (8.3%)
           1 non-contiguous directory (8.3%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 4
     3860344 blocks used (6.30%, out of 61315072)
           0 bad blocks
           0 large files

           0 regular files
           2 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
           0 symbolic links (0 fast symbolic links)
           0 sockets
------------
           2 files
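
If the extra e2fsck call after the resize is kept, its exit status could be logged as well. The following is a sketch under assumptions: `fsck_status_summary` and the `PART` variable are hypothetical names, not taken from LibreELEC's resize script; the bit meanings are those documented in the e2fsck(8) man page.

```shell
#!/bin/sh
# Hypothetical helper, not part of LibreELEC's resize script.
# e2fsck's exit status is a bit mask (see e2fsck(8)):
#   1 = errors corrected, 2 = errors corrected, reboot advised,
#   4 = errors left uncorrected, 8 = operational error.
fsck_status_summary() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "clean"
  elif [ $((status & 12)) -ne 0 ]; then
    # Bit 4 or bit 8 set: the check did not leave a usable filesystem.
    echo "failed"
  elif [ $((status & 3)) -ne 0 ]; then
    # Only bit 1 and/or bit 2 set: problems were found and fixed.
    echo "repaired"
  else
    echo "unknown"
  fi
}

# Usage sketch (PART is an assumed variable naming the storage partition):
#   resize2fs "$PART"
#   e2fsck -f -p "$PART"
#   rc=$?
#   echo "post-resize fsck: $(fsck_status_summary "$rc")"
```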

HiassofT commented 9 months ago

I did some more testing and could also reproduce it locally on my PC with current e2fsprogs master:

$ truncate -s 32MiB fs
$ ./misc/mke2fs -t ext4 -O orphan_file -m 0 fs
$ truncate -s 33GiB fs
$ ./resize/resize2fs fs
$ ./e2fsck/e2fsck -f fs
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Orphan file (inode 12) block 2 is not clean.
Clear<y>?

It indeed seems to be a bug in resize2fs (or e2fsprogs in general) which gets triggered when resizing an ext4 fs with the orphan_file option from 32MiB to more than 32GiB.

I found out that mke2fs in Debian Bookworm explicitly disables the metadata_csum_seed and orphan_file options (for compatibility with older Debian systems), so I couldn't initially reproduce it there, but when explicitly specifying -O orphan_file it shows the same issue.

I'll send a bug report to the linux-ext4 folks. In the meantime, I guess it's easiest if we just run mkfs with -O ^orphan_file.
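
One way to confirm that an image was really created without the feature is to look at the "Filesystem features:" line printed by dumpe2fs -h. A minimal sketch, assuming that output format; `has_orphan_file` is a hypothetical helper name and the image file name `fs` is reused from the reproduction above.

```shell
#!/bin/sh
# Hypothetical helper, not part of LibreELEC or e2fsprogs.
# Reads `dumpe2fs -h` output on stdin and succeeds only if orphan_file
# appears as a whole word on the "Filesystem features:" line.
has_orphan_file() {
  grep '^Filesystem features:' | grep -qw 'orphan_file'
}

# Usage sketch, following the reproduction steps above:
#   truncate -s 32MiB fs
#   mke2fs -t ext4 -O ^orphan_file -m 0 fs
#   if dumpe2fs -h fs 2>/dev/null | has_orphan_file; then
#     echo "orphan_file is still enabled"
#   else
#     echo "orphan_file is disabled"
#   fi
```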

HiassofT commented 9 months ago

Upstream bug report is here: https://lore.kernel.org/linux-ext4/ZafXawnqlO7OvG1k@camel3.lan/T/#u
