Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

Bees hangs up 6.1.7-1-MANJARO kernel #246

Open S-trace opened 1 year ago

S-trace commented 1 year ago

Hi!

I'm running bees 0.8 on Manjaro with the 6.1.7-1-MANJARO kernel on a Samsung T7 1GB USB SSD. It works fine at first, but after running for a long time (when dedupe is almost complete) it hangs: the main bees process consumes 100% CPU (one core), and I get the following kernel log: 0.8_hangup.txt

Rebooting does not help: if I start bees on this volume after a reboot, it hangs again with the same errors.

How can I debug this problem? The filesystem is mounted with -o compress=zstd, and bees is running with the following command line: /usr/lib/bees/bees --no-timestamps --strip-paths --no-timestamps --thread-factor 1.2 --verbose 6 /var/cache/bees/mnt/b72fd2c0-8948-408d-b5a6-fd51939dfbbe

I have run btrfs scrub on this volume, and it did not detect any errors:

UUID:             b72fd2c0-8948-408d-b5a6-fd51939dfbbe
Scrub started:    Sat Jan 28 03:04:43 2023
Status:           finished
Duration:         0:04:48
Total to scrub:   92.30GiB
Rate:             328.17MiB/s
Error summary:    no errors found

Thanks.

Zygo commented 1 year ago

There is a known kernel bug: dedupe and LOGICAL_INO ioctls running on the same extent at the same time can cause an infinite loop in the LOGICAL_INO ioctl.

The master branch has a workaround which ensures that dedupe and LOGICAL_INO never run at the same time.

There are also some scheduling improvements in master to arrange for threads to always work on different extents for better performance.

S-trace commented 1 year ago

Thank you, I'll try bees-git and report whether the problem persists.

S-trace commented 1 year ago

I have updated bees to version 0.9.r0.g849c071, but the problem is still there. The dmesg log for the new hang: 0.9.r0.g849c071_hangup.txt

Zygo commented 1 year ago

The good news is that you didn't hit the known kernel bug. The bad news is that you appear to have discovered a different, previously unknown kernel bug.

S-trace commented 1 year ago

Can I help you investigate this bug, create a workaround, or (maybe) file a proper bug report for the kernel bug?
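One generic way to gather data for a kernel bug report, not something proposed in this thread, is to record where each thread of the hung process is sitting inside the kernel. The sketch below reads the per-thread `comm` and `stack` files that Linux exposes under `/proc/<pid>/task/<tid>/`. The function name `collect_thread_stacks` and its `proc_root` parameter are hypothetical, and it assumes the kernel was built with stack tracing support and that the caller has root privileges, since the `stack` files are otherwise unreadable.

```python
from pathlib import Path


def collect_thread_stacks(pid, proc_root="/proc"):
    """Return {tid: (thread name, kernel stack text)} for each thread of `pid`."""
    stacks = {}
    task_dir = Path(proc_root) / str(pid) / "task"
    for thread_dir in sorted(task_dir.iterdir(), key=lambda p: int(p.name)):
        try:
            name = (thread_dir / "comm").read_text().strip()
            stack = (thread_dir / "stack").read_text()
        except OSError:
            # The thread exited mid-scan, or the file is unreadable without root.
            continue
        stacks[int(thread_dir.name)] = (name, stack)
    return stacks
```

Calling `collect_thread_stacks(pid)` with the PID of the spinning bees process a few times, several seconds apart, shows whether the busy thread stays inside the same kernel functions.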