Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

crawl_master raises invalid_argument: failed constraint check (offset + len <= v.size()) at fs.cc:754 on Ubuntu kernel 5.4.0-54-generic #156

Closed: Laski closed this issue 3 years ago

Laski commented 3 years ago

Once in a while, crawl_master logs:

2020-11-18 05:03:13 6641.6647<5> crawl_master: *** EXCEPTION ***
2020-11-18 05:03:13 6641.6647<5> crawl_master:  exception type std::invalid_argument: offset + len = 1634628089, v.size() = 87152 failed constraint check (offset + len <= v.size()) at fs.cc:754
2020-11-18 05:03:13 6641.6647<5> crawl_master: ***

The other workers don't seem to be doing anything:

TOTAL:
        crawl_create=8 crawl_empty=2 crawl_restart=1 crawl_search=2
        exception_caught=2
        hash_extent_in=1024
RATES:
        crawl_create=0.4 crawl_empty=0.1 crawl_restart=0.05 crawl_search=0.1
        exception_caught=0.1
        hash_extent_in=51.185
THREADS (work queue 0 tasks, 4 workers):
        tid 9227: bees: [20.006s] waiting for signals
        tid 9228: progress_report: [20.007s] idle 3600
        tid 9229: status_report: writing status to file '/run/bees//8e84749d-2f59-44a3-9084-b112e12e1300.status'
        tid 9234: crawl_transid: [10.003s] waiting 100.028s for next 10 transid RateEstimator { count = 1124, raw = 0 / 10.0006, ratio = 1 / 20.0057, rate = 0.0499857, duration(1) = 20.0057, seconds_for(1) = 10.0006 }
        tid 9235: crawl_writeback: [20.006s] idle, dirty
        tid 9236: hash_writeback: [20.004s] idle after writing 0 of 1024 extents
        tid 9237: hash_prefetch: [19.928s] idle 3600s

Perhaps this is normal and I'm misunderstanding something, but I also searched the closed issues and the documentation and couldn't find any mention of this error.

I'm on the latest master commit (1b9b437c11dd858213c963b14fe7771ab630c2b8)

$ uname -sr
Linux 5.4.0-54-generic

Elkropac commented 3 years ago

Hi, is that on Ubuntu 18.04?

I get these messages too. I can't tell whether bees is doing any work; it seems to just sit there.

I was experimenting with compression: I mounted with compress-force=zstd:3 instead of compress=zstd:3 and then ran btrfs filesystem defragment -czstd -v -r ./ on my data. Could this have had some effect?

This filesystem holds a Nakivo repository in "incremental with full backups" format; I want to see whether bees can deduplicate the full backups.

kakra commented 3 years ago

If the stats file isn't showing much more than what's quoted above, then it's not working. Usually, it should be full of numbers under both "TOTAL" and "RATES". One problem may be a broken beeshash.dat or a broken beescrawl.dat in $BEESHOME, probably the latter. Try deleting the latter file; if it still doesn't work, recreate the big hash file, too.
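A minimal sketch of that first step, assuming bees has already been stopped and that the directory passed in is your $BEESHOME. The function name reset_bees_crawl_state is hypothetical, and it renames the file instead of deleting it so the old state can still be inspected; that is a cautious variation, not something bees requires.

```shell
#!/bin/sh
# Hypothetical helper: move bees' crawl-state file out of the way so that
# bees starts a fresh crawl. Assumes bees is NOT running when this is called.
reset_bees_crawl_state() {
    beeshome=$1
    crawl="$beeshome/beescrawl.dat"
    if [ ! -f "$crawl" ]; then
        echo "no beescrawl.dat in $beeshome" >&2
        return 1
    fi
    # Rename rather than delete, so the old state can be examined later.
    mv -- "$crawl" "$crawl.bak"
}

# Usage (assumes BEESHOME is set the same way as in your bees setup):
# reset_bees_crawl_state "$BEESHOME"
```

If bees still does no work after restarting, the hash file would be the next thing to recreate, as described above.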

Zygo commented 3 years ago

Ubuntu doesn't use standard kernel version numbering.

Ubuntu version Linux 5.4.0-54-generic contains the 5.4 regression a48b73eca4ce "btrfs: fix potential deadlock in the search ioctl" but not the fix 1c78544eaa46 "btrfs: fix wrong address when faulting in pages in the search ioctl".

Upgrade or downgrade the kernel. This particular kernel version will not work.
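A quick way to check whether a machine is running the release named above is to compare the output of uname -r against a list. This is only a sketch: the function name is_known_bad_kernel is made up for illustration, and the list contains just the one release identified in this comment. A release that is not on the list is not thereby known to be good.

```shell
#!/bin/sh
# Hypothetical check: returns 0 if the given kernel release is on the
# known-bad list. The list is illustrative and holds only the Ubuntu
# release named in this thread.
is_known_bad_kernel() {
    case "$1" in
        5.4.0-54-generic) return 0 ;;
        *) return 1 ;;
    esac
}

release=$(uname -r)
if is_known_bad_kernel "$release"; then
    echo "kernel $release: has the search ioctl regression without the fix"
else
    echo "kernel $release: not on the known-bad list"
fi
```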

Elkropac commented 3 years ago

Yeah, I was trying to match the Ubuntu version to a real kernel version. Anyway, I installed 5.4.0-55-generic from kernel-ppa and it seems to have this fixed. I can't find a detailed changelog anywhere (I saw a changelog in git, but it seems to be for the Raspberry Pi kernel).

But bees seems to be humming again. Thanks!

Laski commented 3 years ago

Thanks for your help! Really appreciated.

For anyone interested, I downgraded to 5.4.0-53-generic and the problem seems to be there too. It will be fixed on upstream focal when https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896824 is released on the next SRU. Personally, I think I'll wait till then.

As this is not bees' fault and a fix is already "available", I think it makes sense to close this.