Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

Should BEESHOME beeshash.dat become a file with noCOW attribute #214

Closed: tkuschel closed this issue 2 years ago

tkuschel commented 2 years ago

I have noticed that the beeshash.dat file in the .beeshome directory is written extremely often. Does it make sense to disable copy-on-write (COW) for it in btrfs with the C attribute, using "chattr +C"? Isn't it written with small random writes, so that btrfs's built-in COW would cause fragmentation? Should the whole BEESHOME subdirectory be set that way too?

kakra commented 2 years ago

As far as I know, bees uses write patterns optimized to keep fragmentation low and to allow compression. It was designed to write this file in COW mode.

Zygo commented 2 years ago

The hash table has to handle tens of thousands of updates per second, so it couldn't use a typical page-oriented database as those would be orders of magnitude too slow. beeshash.dat is a memory dump throttled to a slow rate to avoid flooding the disk with writes. The hash data is written sequentially, and the write size is optimized for writeback and compression.
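The difference between small random writes and a sequential rewrite under copy-on-write can be shown with a toy model. This is a sketch under stated assumptions, not bees or btrfs code. The functions `cow_rewrite` and `count_extents` are hypothetical. The model assumes the file starts as one contiguous extent, and that every write goes to the next free physical address (an append-only allocator) instead of overwriting in place. Scattered one-block updates break the file into many extents. A front-to-back sweep in larger chunks leaves it as a single extent.

```python
import random

def cow_rewrite(n_blocks, writes):
    """Apply (offset, length) block writes copy-on-write style.

    Model assumptions (not bees or btrfs code): the file starts as one
    contiguous extent at physical blocks 0..n_blocks-1, and every write is
    placed at the next free physical address by an append-only allocator,
    never overwriting the old location.
    """
    mapping = list(range(n_blocks))  # logical block -> physical block
    next_free = n_blocks
    for offset, length in writes:
        for i in range(length):
            mapping[offset + i] = next_free + i
        next_free += length
    return mapping

def count_extents(mapping):
    """Count runs of logical blocks that are also physically contiguous."""
    extents = 1
    for i in range(1, len(mapping)):
        if mapping[i] != mapping[i - 1] + 1:
            extents += 1
    return extents

n = 1024
rng = random.Random(0)
# Small random writes: 200 single-block updates at random offsets.
random_writes = [(rng.randrange(n), 1) for _ in range(200)]
# Sequential sweep: rewrite the whole file front to back in 32-block chunks.
sequential_sweep = [(off, 32) for off in range(0, n, 32)]

print(count_extents(cow_rewrite(n, random_writes)))     # fragmented: many extents
print(count_extents(cow_rewrite(n, sequential_sweep)))  # 1
```

The model ignores other writers on the same filesystem and btrfs's own extent size limits. Both can interleave or split the sweep's chunks in practice, so a real sequential sweep will leave more than one extent.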

Every few hours the entire table will be rewritten, which is similar to the effect of btrfs fi defrag, so you don't need to run that either.
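How long one full rewrite takes follows from the table size divided by the throttled write rate. The sketch below does that arithmetic. The helpers `sweep_seconds` and `chunks_per_sweep` are hypothetical. The 1 GiB table size, 128 KiB/s rate, and 128 KiB chunk size are illustrative assumptions, not bees defaults.

```python
def sweep_seconds(table_bytes, flush_rate_bytes_per_s):
    """Time for a rate-limited writer to rewrite the whole table once."""
    return table_bytes / flush_rate_bytes_per_s

def chunks_per_sweep(table_bytes, chunk_bytes):
    """Number of sequential chunk writes in one full sweep (ceiling division)."""
    return -(-table_bytes // chunk_bytes)

GiB = 1 << 30
KiB = 1 << 10
# Hypothetical numbers, not bees defaults: a 1 GiB table flushed at 128 KiB/s
# in 128 KiB chunks, i.e. roughly one chunk write per second.
secs = sweep_seconds(1 * GiB, 128 * KiB)
print(secs, secs / 3600)                   # 8192.0 seconds, about 2.28 hours
print(chunks_per_sweep(1 * GiB, 128 * KiB))  # 8192
```

With these assumed numbers a full sweep takes a little over two hours, which is consistent with "every few hours". A larger table or a slower rate stretches the interval proportionally.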

If you enable nodatacow unnecessarily, you lose the ability to detect and recover from data corruption on the underlying disks. nodatacow should only be used as a last resort after considering all other alternatives, including using a different filesystem.
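Why data checksums matter for detection and recovery can be sketched with a simplified model of one block stored on two mirrors. This is not btrfs code. btrfs keeps a crc32c checksum per data block by default for COW files, and `zlib.crc32` stands in for crc32c here. The function `read_block` is hypothetical. With a stored checksum, the corrupted copy is identified and the intact one is used. Without a checksum, as with nodatacow, the reader gets whichever copy it picks and cannot tell that it is corrupt.

```python
import zlib

def read_block(copies, stored_csum=None):
    """Return (data, indices_of_bad_copies) for one block stored on mirrors.

    Simplified model, not btrfs code: with datacow a checksum is kept for
    each data block (zlib.crc32 stands in for btrfs's crc32c), so a corrupted
    mirror copy is detected and a good copy is used instead. With nodatacow
    there is no data checksum (stored_csum=None), so an arbitrary copy (here
    the first) is returned with no way to tell whether it is intact.
    """
    if stored_csum is None:
        return copies[0], []
    bad = []
    for i, data in enumerate(copies):
        if zlib.crc32(data) == stored_csum:
            return data, bad
        bad.append(i)
    raise OSError("no copy matches the stored checksum")

good = b"hash table page"
corrupt = b"hash tab1e page"   # one byte changed on the first mirror
csum = zlib.crc32(good)

print(read_block([corrupt, good], csum))  # (b'hash table page', [0])
print(read_block([corrupt, good]))        # (b'hash tab1e page', [])
```
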

tkuschel commented 2 years ago

Thank you for the clear and thorough answer. That clears up all my doubts. Great tool, keep it up!