btrfs / linux

Linux kernel source tree -- Btrfs commits
https://btrfs.readthedocs.io/en/latest/
Other
13 stars 5 forks source link

BUG 6.5.9: rw mount failed on ENOSPC during orphan cleanup #1037

Closed kdave closed 10 months ago

kdave commented 10 months ago

Report from IRC, kernel 6.5.9, mount fails with ENOSPC and transaction abort

[15491.375903] BTRFS info (device sdc3): using crc32c (crc32c-intel) checksum algorithm
[15491.375919] BTRFS info (device sdc3): using free space tree
[15491.396146] BTRFS info (device sdc3): bdev /dev/sdd3 errs: wr 1299065141, rd 10234262, flush 6762148, corrupt 109308019, gen 0
[15492.073822] BTRFS info (device sdc3): enabling ssd optimizations
[15492.073831] BTRFS info (device sdc3): auto enabling async discard
[15524.215307] BTRFS warning (device sdc3): Skipping commit of aborted transaction.
[15524.215314] ------------[ cut here ]------------
[15524.215315] BTRFS: Transaction aborted (error -28)
[15524.216360] WARNING: CPU: 1 PID: 1958 at fs/btrfs/transaction.c:1997 btrfs_commit_transaction+0xd73/0xef0 [btrfs]
[15524.216488] Modules linked in: ccm qrtr algif_aead crypto_null cbc des_generic libdes ecb algif_skcipher cmac md4 algif_hash af_alg snd_sof_pci_intel_apl snd_sof_intel_hda_common intel_pmc_bxt soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence intel_telemetry_pltdrv snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils intel_punit_ipc soundwire_generic_allocation intel_telemetry_core soundwire_bus x86_pkg_temp_thermal intel_powerclamp snd_soc_avs coretemp snd_soc_hda_codec kvm_intel snd_soc_skl kvm snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_hda_codec_hdmi snd_soc_acpi irqbypass snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine crct10dif_pclmul snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio polyval_generic snd_hda_intel gf128mul snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd processor_thermal_device_pci_legacy soundcore ghash_clmulni_intel intel_rapl_msr processor_thermal_device
[15524.216542]  processor_thermal_rfim ee1004 processor_thermal_mbox processor_thermal_rapl intel_rapl_common intel_soc_dts_iosf mei_pxp mei_hdcp rapl pcspkr i2c_i801 mei_me intel_cstate wdat_wdt int3406_thermal mei int3400_thermal int3403_thermal i2c_smbus acpi_thermal_rel int340x_thermal_zone cfg80211 mac_hid dptf_power rfkill pkcs8_key_parser fuse dm_mod ip_tables x_tables overlay squashfs loop isofs cdrom uas usb_storage usbhid i915 btrfs crc32_pclmul sha512_ssse3 drm_buddy aesni_intel i2c_algo_bit blake2b_generic r8169 crypto_simd xor ttm realtek raid6_pq xhci_pci libcrc32c crc32c_generic mdio_devres cryptd xhci_pci_renesas crc32c_intel drm_display_helper libphy cec video intel_gtt wmi
[15524.216595] CPU: 1 PID: 1958 Comm: btrfs-transacti Tainted: G        W          6.5.9-arch2-1 #1 f3e31240753f1687edd57c7c0d0f930d7e8c4a55
[15524.216599] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J5005-ITX, BIOS P1.40 08/06/2018
[15524.216601] RIP: 0010:btrfs_commit_transaction+0xd73/0xef0 [btrfs]
[15524.216707] Code: 85 a8 01 00 00 85 c0 75 e0 48 8b 34 24 4c 89 e7 e8 02 cb d2 c9 e9 96 f7 ff ff 8b 74 24 08 48 c7 c7 98 85 94 c0 e8 0d 74 cb c9 <0f> 0b e9 0e fd ff ff f0 83 44 24 fc 00 48 8b 90 50 03 00 00 65 ff
[15524.216709] RSP: 0018:ffffabec43c77e28 EFLAGS: 00010286
[15524.216712] RAX: 0000000000000000 RBX: ffff8fdd805f49c0 RCX: 0000000000000027
[15524.216714] RDX: ffff8fe0f00a16c8 RSI: 0000000000000001 RDI: ffff8fe0f00a16c0
[15524.216715] RBP: ffff8fdd98c38e00 R08: 0000000000000000 R09: ffffabec43c77cb8
[15524.216717] R10: 0000000000000003 R11: ffffffff8c2ca068 R12: ffff8fdd805f4900
[15524.216718] R13: ffff8fdd857a3000 R14: ffff8fdd805f4a18 R15: 00000000ffffffe4
[15524.216720] FS:  0000000000000000(0000) GS:ffff8fe0f0080000(0000) knlGS:0000000000000000
[15524.216722] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15524.216724] CR2: 000055c1ac9cd158 CR3: 0000000415c20000 CR4: 0000000000350ee0
[15524.216726] Call Trace:
[15524.216728]  <TASK>
[15524.216730]  ? btrfs_commit_transaction+0xd73/0xef0 [btrfs 44509059d1ba69fec323f7b72033cf848fd20d5e]
[15524.216836]  ? __warn+0x81/0x130
[15524.216844]  ? btrfs_commit_transaction+0xd73/0xef0 [btrfs 44509059d1ba69fec323f7b72033cf848fd20d5e]
[15524.216951]  ? report_bug+0x171/0x1a0
[15524.216956]  ? handle_bug+0x3c/0x80
[15524.216960]  ? exc_invalid_op+0x17/0x70
[15524.216963]  ? asm_exc_invalid_op+0x1a/0x20
[15524.216968]  ? btrfs_commit_transaction+0xd73/0xef0 [btrfs 44509059d1ba69fec323f7b72033cf848fd20d5e]
[15524.217069]  ? btrfs_commit_transaction+0xd73/0xef0 [btrfs 44509059d1ba69fec323f7b72033cf848fd20d5e]
[15524.217169]  ? __pfx_autoremove_wake_function+0x10/0x10
[15524.217176]  transaction_kthread+0x141/0x1b0 [btrfs 44509059d1ba69fec323f7b72033cf848fd20d5e]
[15524.217280]  ? __pfx_transaction_kthread+0x10/0x10 [btrfs 44509059d1ba69fec323f7b72033cf848fd20d5e]
[15524.217379]  kthread+0xe5/0x120
[15524.217387]  ? __pfx_kthread+0x10/0x10
[15524.217389]  ret_from_fork+0x31/0x50
[15524.217394]  ? __pfx_kthread+0x10/0x10
[15524.217396]  ret_from_fork_asm+0x1b/0x30
[15524.217404]  </TASK>
[15524.217405] ---[ end trace 0000000000000000 ]---
[15524.217407] BTRFS info (device sdc3: state A): dumping space info:
[15524.217409] BTRFS info (device sdc3: state A): space_info DATA has 1207286423552 free, is not full
[15524.217411] BTRFS info (device sdc3: state A): space_info total=3962095796224, used=2754809241600, pinned=0, reserved=0, may_use=0, readonly=131072 zone_unusable=0
[15524.217415] BTRFS info (device sdc3: state A): space_info METADATA has 7576092672 free, is full
[15524.217417] BTRFS info (device sdc3: state A): space_info total=28991029248, used=20877279232, pinned=0, reserved=0, may_use=537526272, readonly=131072 zone_unusable=0
[15524.217420] BTRFS info (device sdc3: state A): space_info SYSTEM has 32882688 free, is not full
[15524.217422] BTRFS info (device sdc3: state A): space_info total=33554432, used=671744, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
[15524.217425] BTRFS info (device sdc3: state A): global_block_rsv: size 536870912 reserved 536870912
[15524.217427] BTRFS info (device sdc3: state A): trans_block_rsv: size 0 reserved 0
[15524.217428] BTRFS info (device sdc3: state A): chunk_block_rsv: size 0 reserved 0
[15524.217430] BTRFS info (device sdc3: state A): delayed_block_rsv: size 0 reserved 0
[15524.217431] BTRFS info (device sdc3: state A): delayed_refs_rsv: size 0 reserved 0
[15524.217434] BTRFS: error (device sdc3: state A) in cleanup_transaction:1997: errno=-28 No space left
[15524.217490] BTRFS error (device sdc3: state A): Error removing orphan entry, stopping orphan cleanup
[15524.219483] BTRFS error (device sdc3: state EA): could not do orphan cleanup -22
[15524.220565] BTRFS error (device sdc3: state EA): commit super ret -30
[15524.239361] BTRFS error (device sdc3: state EA): open_ctree failed
kdave commented 10 months ago

Orphan cleanup failures can be skipped at mount/remount, with a warning. The process is started again when a root path is open.

The device stats bdev /dev/sdd3 errs: wr 1299065141, rd 10234262, flush 6762148, corrupt 109308019, gen 0 show that something really bad happened so it's most likely not due to normal operations.

kdave commented 10 months ago

Before RW mount there are several other things done that must be done (relocation recovery, orphan root lookup, free space tree, uuid tree, ...), orphan cleanup is just one of them. If the real reason is ENOSPC this would fail one of the others anyway. Given the device stats, it's not clear if the ENOSPC situation is due to space management bug or because the filesystem git hit by a hw error, or something else.