Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

Understanding hash table page occupancy #174

Closed. a-schild closed this issue 3 years ago

a-schild commented 3 years ago

Hello,

I'm taking my first steps with bees dedupe on a backup system.

In the .beeshome folder I see a beesstats.txt file which contains a histogram. During the first hours of dedupe, the histogram looked roughly like a normal distribution: [picture]

Now, a few hours later, all the hash symbols are on the right side, like this:

Now:     2021-04-27-15-23-44
Uptime:  21642.4 seconds
Version: v0.6-136-g243480b

Hash table page occupancy histogram (67108858/67108864 cells occupied, 99%)
                                                                 524288 pages
                                                               # 262144
                                                               # 131072
                                                               # 65536
                                                               # 32768
                                                               # 16384
                                                               # 8192
                                                               # 4096
                                                               # 2048
                                                               # 1024
                                                               # 512
                                                               # 256
                                                               # 128
                                                               # 64
                                                               # 32
                                                               # 16
                                                               # 8
                                                               # 4
                                                               # 2
                                                               # 1
0%      |      25%      |      50%      |      75%      |   100% page fill
compressed 348630 (0%)
uncompressed 66760228 (99%) unaligned_eof 107413 (0%) toxic 7 (0%)

To me, this looks like my hash table is too small for the data in it. Or am I reading the histogram incorrectly?

The btrfs volume is 17TB in size and currently holds 0.5TB of data. I started bees with a 1GB hash table. Is that too small for this use case?
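As a rough sanity check on the numbers above, here is a small Python sketch that parses the histogram header line and works out the arithmetic. It is only an illustration: `parse_occupancy` is a hypothetical helper written for this post (not part of bees), and it assumes "1GB" means 2^30 bytes and "0.5TB" means 2^39 bytes.

```python
import re

# Header line copied verbatim from the beesstats.txt output above.
header = "Hash table page occupancy histogram (67108858/67108864 cells occupied, 99%)"


def parse_occupancy(line):
    """Hypothetical helper: extract (occupied, total) cell counts from the header."""
    match = re.search(r"\((\d+)/(\d+) cells occupied", line)
    if match is None:
        raise ValueError("not a histogram header line")
    return int(match.group(1)), int(match.group(2))


occupied, total = parse_occupancy(header)
free_cells = total - occupied
fill_ratio = occupied / total

# Assumptions: binary units for the sizes quoted in the post.
table_bytes = 1 * 2**30   # "1GB hash table"
data_bytes = 2**39        # "0.5TB of data"

# Hash table bytes available per cell, and stored data per cell.
table_bytes_per_cell = table_bytes // total
data_bytes_per_cell = data_bytes // total

print(f"free cells:           {free_cells}")
print(f"fill ratio:           {fill_ratio:.7%}")
print(f"table bytes per cell: {table_bytes_per_cell}")
print(f"data bytes per cell:  {data_bytes_per_cell}")
```

Under these assumptions only 6 of the 67108864 cells are free, each cell takes 16 bytes of the table, and there are about 8 KiB of stored data per cell. Whether that ratio is a problem is a separate question that the arithmetic alone does not answer.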

a-schild commented 3 years ago

Duplicate of #66