Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

bees hangs with 100% cpu #139

Open ardiehl opened 4 years ago

ardiehl commented 4 years ago

I have had bees running for about a year on my 4 TB backup volume. Yesterday the volume was nearly full and, as I found no way to check the deduplication state, I decided to rescan: I stopped bees, deleted the files in /.beeshome, updated bees to the latest version (4 January 2020), increased the hash size to 8 GB and restarted bees. After a few hours, beesd hung with 100% CPU usage and could no longer be stopped.

The volume is accessible; however, after changing the current directory to .beeshome, ls hangs as well. Kernel: 5.3.16-300.fc31.x86_64. I have saved a large part of the bees log output here: beesd.log.gz. Btw, there are some core dumps from earlier attempts where I passed wrong parameters so beesd did not start. My conf:

UUID=be5f70ab-e6ed-49ee-98ea-a6ffa506adbb
OPTIONS="--thread-count=16 --thread-factor=1"
DB_SIZE=$((8*64*$AL16M)) # 8G in bytes

kakra commented 4 years ago

How much free space is there now? You should really take care not to completely fill a btrfs volume. Maybe start over with a much smaller hash size; 1 GB should be enough (I'm using a 1 GB hash on a 5 TB volume). If you suspect the hash living on btrfs may be the problem, you can put .beeshome on another volume (though the beesd script doesn't like this; you then need to invoke bees manually).
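Roughly, manual invocation with the state on another volume could look like this (paths and sizes are just examples; bees picks up BEESHOME from the environment, which is how the beesd wrapper passes it):

# example only: keep the bees state (hash table etc.) on a different filesystem
export BEESHOME=/mnt/ssd/beeshome-backup
mkdir -p "$BEESHOME"
truncate -s 1G "$BEESHOME/beeshash.dat"   # hash table size must be a multiple of 16 MiB
bees --thread-count=16 /mnt/backup        # mounted root of the target btrfs filesystem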

You can use the compsize utility to check the dedup state of your files and directories.

ardiehl commented 4 years ago

I had 28 GB available on that volume. I restarted the system and bees 3 hours ago and now I have 55 GB free, so bees has done some deduplication that for whatever reason did not happen previously. I assume there was some deadlock in btrfs; the system shutdown took a long time when deactivating the btrfs volume.

Question regarding compsize: I tried it on a qcow2 file that exists on the disk several times (10 copies), and the disk usage is more than 10 times the size of the file. Is my interpretation correct that this means there was no deduplication for that file (which would explain why my backup volume has no free space)?

[ad@lnx lnx_backup]$ ls -lh 01/home/ad/vm/win10-tax.qcow2
-rw-r--r-- 1 qemu qemu 51G Dec 1 23:51 01/home/ad/vm/win10-tax.qcow2

[ad@lnx lnx_backup]$ sudo compsize 01/home/ad/vm/win10-tax.qcow2 02/home/ad/vm/win10-tax.qcow2 03/home/ad/vm/win10-tax.qcow2 04/home/ad/vm/win10-tax.qcow2 05/home/ad/vm/win10-tax.qcow2 06/home/ad/vm/win10-tax.qcow2 07/home/ad/vm/win10-tax.qcow2 08/home/ad/vm/win10-tax.qcow2 09/home/ad/vm/win10-tax.qcow2 10/home/ad/vm/win10-tax.qcow2
Processed 10 files, 32659 regular extents (213368 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       99%      670G         671G         499G
none       100%      670G         670G         495G
zlib        30%      418M         1.3G         3.8G

Zygo commented 4 years ago

It's fairly common for VM image files to occupy more space on btrfs than the sum of their block counts. This is especially likely if the files are aggressively defragmented (which makes the average extent size larger) and compression is not used (compression would reduce the average extent size).

Extents are immutable in btrfs, so if any reference to any block of the extent remains, the entire extent remains on disk. If a VM overwrites 127MB of a 128MB extent, the total disk space usage is 255MB: 127MB of new data, 1MB from the original 128MB are reachable through the file, and 127MB of the original 128MB are unreachable. The worst case is extents that occupy 32768 times their referenced block size (128MB / 4K = 32768, with 32767 of those 4K blocks unreachable). It's more common for there to be 20-50% overhead in unreachable blocks, as you have:

Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       99%      670G         671G         499G       
none       100%      670G         670G         495G       
zlib        30%      418M         1.3G         3.8G
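As a rough sanity check using the TOTAL row above (the small compressed portion barely matters here):

# unreachable overhead ≈ (disk usage - referenced) / referenced
echo "scale=2; (670 - 499) / 499" | bc    # ≈ .34, i.e. about 34% overhead in unreachable blocks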

bees cannot currently deduplicate the unreachable blocks. At the moment there is no good solution for this.

There are some bad solutions that work, though: run

btrfs fi defrag -t 1M */home/ad/vm/win10-tax.qcow2

then all the blocks should be reachable and bees can then dedupe them.

Zygo commented 4 years ago

I assume there was some deadlock in btrfs; the system shutdown took a long time when deactivating the btrfs volume.

The log contains:

Jan 05 03:12:01 lnx.armin.d beesd[142082]: 2020-01-05 03:12:01 142110.142111<6> progress_report: tid 142121: crawl_5: [2836.99s] resolving addr 0x107e7f00000 with LOGICAL_INO

so it looks like the last thing bees did was to start iterating backrefs for an extent. Were there any btrfs kernel messages from around Jan 5 02:24:44 (i.e. 2020-01-05 03:12:01 - 2837 seconds)? How long does it take to run

btrfs ins log $((0x107e7f00000)) .

on the filesystem?

This could be a toxic extent--an extent that takes an unreasonable amount of kernel time (i.e. minutes, possibly hours) to process. Or it could be one of a dozen recently found and fixed kernel bugs.
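If the journal goes back that far, something like this should turn up any btrfs kernel messages from that window (timestamps taken from the log line above):

journalctl -k --since "2020-01-05 02:20:00" --until "2020-01-05 02:30:00" | grep -i btrfs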

ardiehl commented 4 years ago

Thx for the answer. Since this volume is a backup-only volume that files are rsynced to, there should be no fragmentation of the VM image files (the files are copied once via rsync only).

[root@lnx backup]# time btrfs ins log $((0x107e7f00000)) .

real    0m0.292s
user    0m0.001s
sys     0m0.002s

So this was (or is now) not the problem. And no, there were no kernel messages regarding btrfs at all. After I restarted the system (which took ages), bees worked fine. After 2 days I now have 90 GB free on that volume. I tried compsize for the whole volume but gave up after 24 hours. Is there any other way to find out what the overall deduplication ratio is?

kakra commented 4 years ago

If you're using rsync and want to reduce dedup effort, please run rsync with --no-whole-file --inplace, otherwise rsync will rebuild the complete file even for minor changes, and all shared extents become unshared over time. This mode will leave you with incomplete files if rsync is interrupted, because it does not allow for an atomic replace. But you can protect against this by creating a snapshot before rsync runs.

I have a gist for how I used it some years back: https://gist.github.com/kakra/5520370

That service works by syncing to a scratch area, then on success it will create a snapshot of the scratch area. You may not want to do read-only snapshots when using bees.
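Roughly, the moving parts look like this (paths are just examples, not the exact unit from the gist; note the snapshot is left writable, per the caveat above):

SCRATCH=/mnt/backup/scratch                 # persistent rsync target (a btrfs subvolume)
rsync -aHAX --no-whole-file --inplace --delete /home/ "$SCRATCH/home/" \
  && btrfs subvolume snapshot "$SCRATCH" "/mnt/backup/$(date +%F)"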

But since then I moved to borgbackup.

Zygo commented 4 years ago

I tried compsize for the whole volume but gave up after 24 hours. Is there any other way to find out what the overall deduplication ratio is?

compsize (or some algorithm equivalent to it, like btrfs fi du) is the only way to know the dedupe ratio. You have to add up the size of all the references in the filesystem, and that means reading and sorting most of the filesystem's metadata.

btrfs qgroups can track the size of references, but when used with bees, btrfs qgroups will quickly reach 100% CPU in the kernel, and stay there.

bees knows how many references it deleted, but not how many of the referenced physical extents it deleted (to have this information, bees would have to enumerate all such references multiple times, turning an O(n) algorithm into an O(n^2) one). There are some beesstats counters that record these raw events, but when used as estimates of dedupe efficiency the raw stats are often extremely wrong.

If you have a suitable sample subset of your files (similar contents and update patterns), you can run compsize on the sample and extrapolate to the entire filesystem.
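e.g. run it against one of your daily directories and compare the columns: Referenced is roughly what the files would need without any sharing, Disk Usage is what they actually occupy (path is an example based on your layout):

sudo compsize /backup/01    # one day's backup tree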

ardiehl commented 4 years ago

If you're using rsync and want to reduce dedup efforts, please run rsync with --no-whole-file --inplace

I'm using rsync with --link-dest=$TARGET/$LASTBACKUP so that equal files will not be written again and therefore no dedup is needed. I have one directory per day, one per month and one per year. It is handy to have the backup on a separate disk that can be accessed without any special backup software.
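Roughly like this (variable names as in my script; $SRC and $TODAY are just placeholders):

rsync -a --delete --link-dest="$TARGET/$LASTBACKUP" "$SRC/" "$TARGET/$TODAY/"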

If you have a suitable sample subset of your files (similar contents and update patterns), you can run compsize on the sample and extrapolate to the entire filesystem.

I will have a look at the larger ones, thx.