coreos / rpm-ostree

⚛📦 Hybrid image/package system with atomic upgrades and package layering
https://coreos.github.io/rpm-ostree

rpm-ostree causes btrfs file system corruption due to auto updates handled by Gnome Software #4470

Closed: MateusRodCosta closed 1 year ago

MateusRodCosta commented 1 year ago

Host system details

mateusrc@centauro 
----------------- 
OS: Fedora Linux 38.20230615.0 (Silverblue) x86_64 
Host: G5 5590 
Kernel: 6.3.7-200.fc38.x86_64 
Uptime: 20 hours, 37 mins 
Packages: 1783 (rpm), 80 (flatpak) 
Shell: bash 5.2.15 
Resolution: 1920x1080 
DE: GNOME 44.2 
WM: Mutter 
WM Theme: Adwaita 
Theme: Adwaita [GTK2/3] 
Icons: Adwaita [GTK2/3] 
Terminal: gnome-terminal 
CPU: Intel i7-9750H (12) @ 4.500GHz 
GPU: NVIDIA GeForce GTX 1660 Ti Mobile 
GPU: Intel CoffeeLake-H GT2 [UHD Graphics 630] 
Memory: 3421MiB / 15788MiB 

Provide the output of rpm-ostree status.

$ rpm-ostree status
State: idle
Deployments:
● fedora:fedora/38/x86_64/silverblue
                  Version: 38.20230615.0 (2023-06-15T00:45:43Z)
               BaseCommit: fdaaef67a06df571b2aa7ef63a898f7c29d138ec774ee34d708f3b0b40b5764f
             GPGSignature: Valid signature by 6A51BBABBA3D5467B6171221809A8D7CEB10B464
      RemovedBasePackages: firefox firefox-langpacks 114.0-1.fc38
                           nano-default-editor 7.2-2.fc38
          LayeredPackages: abrt-desktop acpica-tools akmod-nvidia distrobox
                           epson-inkjet-printer-escpr f3 fira-code-fonts flatpak-builder
                           gamescope gnome-tweaks htop intel-media-driver langpacks-en
                           langpacks-pt_BR libratbag-ratbagd libva-utils libvirt
                           mozilla-fira-mono-fonts mozilla-fira-sans-fonts neofetch neovim
                           nvme-cli rpmfusion-free-release rpmfusion-nonfree-release
                           steam-devices syncthing vim-default-editor vim-enhanced
                           virt-manager virt-viewer waydroid xorg-x11-drv-nvidia
                           xorg-x11-drv-nvidia-cuda xorg-x11-drv-nvidia-power

  fedora:fedora/38/x86_64/silverblue
                  Version: 38.20230611.0 (2023-06-11T00:51:07Z)
               BaseCommit: e04b66d8af02faa9b3e9deff6004ac07ba2df84c10abc4982950a9777c60fcb4
             GPGSignature: Valid signature by 6A51BBABBA3D5467B6171221809A8D7CEB10B464
      RemovedBasePackages: firefox firefox-langpacks 114.0-1.fc38
                           nano-default-editor 7.2-2.fc38
          LayeredPackages: abrt-desktop acpica-tools akmod-nvidia distrobox
                           epson-inkjet-printer-escpr f3 fira-code-fonts flatpak-builder
                           gamescope gnome-tweaks htop intel-media-driver langpacks-en
                           langpacks-pt_BR libratbag-ratbagd libva-utils libvirt
                           mozilla-fira-mono-fonts mozilla-fira-sans-fonts neofetch neovim
                           nvme-cli rpmfusion-free-release rpmfusion-nonfree-release
                           steam-devices syncthing vim-default-editor vim-enhanced
                           virt-manager virt-viewer waydroid xorg-x11-drv-nvidia
                           xorg-x11-drv-nvidia-cuda

Expected vs actual behavior

rpm-ostree automatic updates apparently cause btrfs corruption from time to time. I usually notice it because the filesystem turns read-only at around the same time the laptop fan starts running very fast. It usually happens when GNOME Software triggers an automatic update after I resume the laptop from suspend.

Logs:

jun 15 14:24:24 centauro rpm-ostree[116031]: Forcibly closing transaction due to timeout
jun 15 14:24:24 centauro rpm-ostree[116031]: Loaded sysroot
jun 15 14:24:24 centauro rpm-ostree[116031]: Locked sysroot
jun 15 14:24:24 centauro rpm-ostree[116031]: Initiated txn Upgrade for client(id:gnome-software dbus:1.126 unit:app-gnome-org.gnome.Software-2800.scope uid:1000): /org/projectatomic/rpmostree1/fedora
jun 15 14:24:24 centauro rpm-ostree[116031]: Process [pid: 2800 uid: 1000 unit: user@1000.service] connected to transaction progress
jun 15 14:24:28 centauro rpm-ostree[116031]: libostree pull from 'fedora' for fedora/38/x86_64/silverblue complete
                                             security: GPG: commit 
                                             security: SIGN: disabled http: TLS
                                             non-delta: meta: 2 content: 0
                                             transfer: secs: 3 size: 788 bytes
jun 15 14:24:28 centauro rpm-ostree[116031]: 2 metadata, 0 content objects fetched; 788 B transferred in 3 seconds; 0 byte content written
jun 15 14:24:31 centauro systemd[1]: systemd-hostnamed.service: Deactivated successfully.
jun 15 14:24:31 centauro audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
jun 15 14:24:31 centauro audit: BPF prog-id=140 op=UNLOAD
jun 15 14:24:31 centauro audit: BPF prog-id=139 op=UNLOAD
jun 15 14:24:31 centauro audit: BPF prog-id=138 op=UNLOAD

jun 15 14:24:37 centauro gnome-software[2800]: /var/tmp/flatpak-cache-6EWJ61/org.freedesktop.Platform.GL32.nvidia-530-41-03-LO0P61/repo-1s6vu4: Pulled runtime/org.freedesktop.Platform.GL32.nvidia-530-41-03/x86_64/1.4 from flathub
jun 15 14:24:37 centauro systemd[1]: var-tmp-flatpak\x2dcache\x2d6EWJ61-org.freedesktop.Platform.GL32.nvidia\x2d530\x2d41\x2d03\x2dLO0P61.mount: Deactivated successfully.
jun 15 14:24:37 centauro rpm-ostree[116031]: Allowing active client :1.126 (uid 1000)
jun 15 14:24:37 centauro rpm-ostree[116031]: client(id:gnome-software dbus:1.126 unit:app-gnome-org.gnome.Software-2800.scope uid:1000) vanished; remaining=0
jun 15 14:24:39 centauro rpm-ostree[116031]: Preparing pkg txn; enabled repos: ['fedora-cisco-openh264', 'fedora-modular', 'updates-modular', 'updates', 'fedora', 'rpmfusion-free-updates', 'rpmfusion-free', 'rpmfusion-nonfree-updates', 'rpmfusion-nonfree', 'google-chrome', 'updates-archive'] solvables: 105955
jun 15 14:24:39 centauro rpm-ostree[116031]: Txn Upgrade on /org/projectatomic/rpmostree1/fedora successful
jun 15 14:24:39 centauro rpm-ostree[116031]: Unlocked sysroot
jun 15 14:24:39 centauro rpm-ostree[116031]: Process [pid: 2800 uid: 1000 unit: user@1000.service] disconnected from transaction progress
jun 15 14:24:39 centauro rpm-ostree[116031]: In idle state; will auto-exit in 61 seconds
jun 15 14:24:39 centauro rpm-ostree[116031]: Loaded sysroot
jun 15 14:24:39 centauro rpm-ostree[116031]: Locked sysroot
jun 15 14:24:39 centauro rpm-ostree[116031]: Initiated txn Upgrade for client(dbus:1.126 unit:app-gnome-org.gnome.Software-2800.scope uid:1000): /org/projectatomic/rpmostree1/fedora
jun 15 14:24:39 centauro rpm-ostree[116031]: Process [pid: 2800 uid: 1000 unit: user@1000.service] connected to transaction progress
jun 15 14:24:41 centauro rpm-ostree[116031]: Receiving metadata objects: 1/(estimating) 98 bytes/s 196 bytes
jun 15 14:24:43 centauro flatpak-system-helper[132535]: system: Pulled runtime/org.freedesktop.Platform.GL32.nvidia-530-41-03/x86_64/1.4 from /var/lib/flatpak/repo/tmp/flatpak-cache-PJVP61/repo-1s6vu4
jun 15 14:24:43 centauro rpm-ostree[116031]: libostree pull from 'fedora' for fedora/38/x86_64/silverblue complete
                                             security: GPG: commit 
                                             security: SIGN: disabled http: TLS
                                             non-delta: meta: 2 content: 0
                                             transfer: secs: 4 size: 788 bytes
jun 15 14:24:48 centauro flatpak-system-helper[132535]: system: Updated runtime/org.freedesktop.Platform.GL32.nvidia-530-41-03/x86_64/1.4 from flathub
jun 15 14:24:48 centauro gnome-software[2800]: libostree pull from 'flathub' for app/com.vscodium.codium/x86_64/stable complete
                                               security: GPG: summary+commit 
                                               security: SIGN: disabled http: TLS
                                               delta: parts: 1 loose: 3
                                               transfer: secs: 0 size: 5,8 kB
jun 15 14:24:48 centauro gnome-software[2800]: /var/tmp/flatpak-cache-6EWJ61/com.vscodium.codium-1OMR61/repo-RdtfMm: Pulled app/com.vscodium.codium/x86_64/stable from flathub
jun 15 14:24:48 centauro systemd[1]: var-tmp-flatpak\x2dcache\x2d6EWJ61-com.vscodium.codium\x2d1OMR61.mount: Deactivated successfully.
jun 15 14:24:49 centauro flatpak-system-helper[132535]: system: Pulled app/com.vscodium.codium/x86_64/stable from /var/lib/flatpak/repo/tmp/flatpak-cache-2TNR61/repo-RdtfMm
jun 15 14:24:49 centauro systemd[2189]: Starting tracker-extract-3.service - Tracker metadata extractor...
jun 15 14:24:49 centauro flatpak-system-helper[132535]: system: Updated app/com.vscodium.codium/x86_64/stable from flathub
jun 15 14:24:49 centauro systemd[2189]: Started tracker-extract-3.service - Tracker metadata extractor.
jun 15 14:24:51 centauro gnome-software[2800]: libostree pull from 'flathub' for app/org.yuzu_emu.yuzu/x86_64/stable complete
                                               security: GPG: summary+commit 
                                               security: SIGN: disabled http: TLS
                                               delta: parts: 2 loose: 8
                                               transfer: secs: 1 size: 21,0 MB
jun 15 14:24:51 centauro gnome-software[2800]: /var/tmp/flatpak-cache-6EWJ61/org.yuzu_emu.yuzu-A10P61/repo-9y2Imu: Pulled app/org.yuzu_emu.yuzu/x86_64/stable from flathub
jun 15 14:24:51 centauro systemd[1]: var-tmp-flatpak\x2dcache\x2d6EWJ61-org.yuzu_emu.yuzu\x2dA10P61.mount: Deactivated successfully.
jun 15 14:24:51 centauro kernel: BTRFS critical (device dm-0): corrupt leaf: root=258 block=605929472 slot=40 ino=36358, unknown incompat flags detected: 0x80000
jun 15 14:24:51 centauro kernel: BTRFS info (device dm-0): leaf 605929472 gen 13136 total ptrs 72 free space 6722 owner 258
jun 15 14:24:51 centauro kernel:         item 0 key (36353 1 0) itemoff 16123 itemsize 160
jun 15 14:24:51 centauro gnome-software[2800]: Failed to create a mountpoint for revokefs-fuse: Sistema de arquivos somente para leitura (Read-only file system)
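
For anyone triaging similar logs, the fields of the `corrupt leaf` kernel message can be pulled apart mechanically. The following is a minimal Python sketch and not part of the original report. The regular expression and the helper names `parse_corrupt_leaf` and `set_bits` are assumptions written only for the exact message format shown in the logs above. It extracts the numeric fields and lists which bits are set in the reported flags value.

```python
import re

# The kernel line copied verbatim from the logs above.
LINE = (
    "jun 15 14:24:51 centauro kernel: BTRFS critical (device dm-0): "
    "corrupt leaf: root=258 block=605929472 slot=40 ino=36358, "
    "unknown incompat flags detected: 0x80000"
)

# Hypothetical pattern: it only targets this one message format.
PATTERN = re.compile(
    r"BTRFS critical \(device (?P<device>[^)]+)\): corrupt leaf: "
    r"root=(?P<root>\d+) block=(?P<block>\d+) slot=(?P<slot>\d+) "
    r"ino=(?P<ino>\d+), unknown incompat flags detected: "
    r"(?P<flags>0x[0-9a-fA-F]+)"
)


def parse_corrupt_leaf(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    result = {"device": fields["device"]}
    for key in ("root", "block", "slot", "ino"):
        result[key] = int(fields[key])
    result["flags"] = int(fields["flags"], 16)
    return result


def set_bits(value):
    """Return the positions of the bits that are set in a non-negative integer."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


if __name__ == "__main__":
    info = parse_corrupt_leaf(LINE)
    print(info)
    print("set bit positions:", set_bits(info["flags"]))
```

For the line above, the flags value `0x80000` has exactly one bit set (bit 19). That is only an observation about the number and not a diagnosis of the cause.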

Expected:

No broken filesystem.

Steps to reproduce it

Let GNOME Software update your system automatically.

Would you like to work on the issue?

No, I don't have the necessary knowledge.

Extra notes

I use an NVMe SSD (previously the ADATA that came with the laptop, now a WD Black that I bought because I thought the original SSD was defective). IIRC, I have hit this issue 4 times, of which only the last 3 I could blame on rpm-ostree, and in two of those the corrupted files were ostree-related.

I don't know, but maybe rpm-ostree writes to the disk at full speed and, since the NVMe SSD has no throughput limit, that might be part of the cause. (It could be the SSD running hot, or btrfs not being able to handle the speed?)

cgwalters commented 1 year ago

> I could blame on rpm-ostree

Sorry, it's going to take significant evidence that somehow the I/O patterns created by (rpm-)ostree trigger this.

Please try other tools for testing filesystem I/O - for example fio.

If you do gather more information and it really points towards (rpm-)ostree, it's OK to reopen.
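
As a rough illustration of the kind of independent I/O test suggested above, here is a minimal Python sketch. It does not replace fio, and nothing in it comes from this thread: the function name `write_and_verify`, the file naming scheme, and the default sizes are all assumptions. It writes random files, forces them to disk with `fsync`, reads them back, and reports any file whose SHA-256 digest no longer matches.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, file_count=8, file_size=1 << 20):
    """Write random files, fsync them, read them back, return mismatching paths."""
    expected = {}
    for i in range(file_count):
        path = os.path.join(directory, f"chunk-{i:04d}.bin")
        data = os.urandom(file_size)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the device
        expected[path] = hashlib.sha256(data).hexdigest()

    mismatches = []
    for path, digest in expected.items():
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != digest:
            mismatches.append(path)
    return mismatches


if __name__ == "__main__":
    # A temporary directory is used here only so the sketch cleans up after
    # itself; it may live on tmpfs rather than on the filesystem under test.
    with tempfile.TemporaryDirectory() as tmp:
        bad = write_and_verify(tmp)
        print("mismatching files:", bad)
```

To exercise a particular filesystem, pass a directory that lives on it instead of the temporary one. Reads shortly after a write are often served from the page cache, so a clean result from this sketch is weak evidence at best, which is one reason a dedicated tool such as fio is the better choice.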

MateusRodCosta commented 1 year ago

> Sorry, it's going to take significant evidence that somehow the I/O patterns created by (rpm-)ostree trigger this.

Hi, I will investigate further in the future. For now I have just disabled automatic updates in GNOME Software.

I believe that in at least two of the instances of the issue I was also programming at the time the corruption occurred. So that would be rpm-ostree + VS Code + Flutter + Android AVD all at once.

Anyway, I will re-open if I find more info.

MateusRodCosta commented 1 year ago

@cgwalters Any chance https://github.com/ostreedev/ostree/pull/2874 was related to my issue?

The file corruption always seemed to happen during one of rpm-ostree's update attempts, so maybe the cause was ostree itself rather than rpm-ostree's I/O usage?

It seems ostree 2023.4 fixes it, so I guess if I update and re-enable automatic system updates I should be fine.

Of note, before I switched SSDs and reformatted the laptop, the files that got corrupted due to rpm-ostree were related to the ostree commits (from inside /ostree). The one file that got corrupted after the SSD replacement (the one from the logs in this issue) was some random game image stored in my home folder; luckily I could easily replace it.

cgwalters commented 1 year ago

On Fri, Jun 23, 2023, at 6:44 PM, Mateus Rodrigues Costa wrote:

> @cgwalters Any chance ostreedev/ostree#2874 (https://github.com/ostreedev/ostree/pull/2874) was related to my issue?

No; that could cause different files to appear, but not file system corruption.

Basically, file system corruption can only happen through file system (or other kernel) bugs or faulty hardware. Those bugs could be triggered by userspace bugs, but not caused by them.