Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

What actually is a "toxic match"? #243

Closed cosmicdan closed 1 year ago

cosmicdan commented 1 year ago

Hi,

Very comfortable with the documentation and my understanding of how it works so far. I have it running on a 4TB volume, slowly making progress on the ~3TB of heavily compressed audio/video data...

However, I see a lot of mentions of a "toxic match" and "WORKAROUNDS" in the service status. My initial guess is that these are checksum collisions, but I have no idea, as I cannot find any information on this term at all, apart from the occasional mention in logs shared on this issue tracker.

So, apologies if the question is silly, but what are these "toxic matches and workarounds" and should I be worried?

Thank you very much for the software, regardless!

Zygo commented 1 year ago

Toxic extents are mostly fixed in kernels after 5.4, so it's not something we have to worry about much today. The workaround in bees still exists because it's mostly harmless (false activation of the workaround usually causes about 0.001% failure to find matching duplicate blocks), and some users still run bees on 4.19 and earlier LTS kernels which still have the kernel bug.

A "toxic extent" is an extent that triggers excessive looping in the kernel code when btrfs has to find the set of references to the extent. They were caused by a combination of several factors, including the number of references to the extent, the file size, whether the extent references overlapped, and some details of metadata structure related to snapshots. Toxic extents could result in long-running loops in the kernel, locking out access to the filesystem and one CPU core for minutes to hours for each extent. The result was that the filesystem, while not damaged in any way, could become unusable on an older kernel: it would take hours to remove a file containing many toxic extents, and all users would be locked out of the filesystem until the removal was done.

A "toxic hash" is an entry in the bees hash table which has a bit set to indicate that the data block is part of a toxic extent. This is how bees remembers where toxic extents are to avoid them later on.

A "toxic match" event occurs when a hash lookup finds a duplicate of a block from a toxic extent. It's an ordinary match of incoming scanned data from the crawler with data stored in the hash table, but the hash table entry has the toxic bit set.

The toxic extent workaround is to immediately stop all processing of an extent when bees notices that the extent is itself toxic, or that it contains data matching the hash of a known toxic extent. This avoids both the immediate CPU cost of looking up the existing references to the toxic extent, and the future cost of creating any more references to it, which would make the toxic extent more toxic.
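As a control-flow sketch (hypothetical types and names, not bees' real API), the workaround amounts to bailing out of the scan loop as soon as a toxic entry matches:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins, just enough to show the shape of the loop.
struct Block { uint64_t hash; };
struct Extent { std::vector<Block> blocks; };
struct HashTable {
    // Returns the matched entry's toxic flag, if any entry matches.
    std::optional<bool> lookup_toxic(uint64_t hash) const;
};

// The moment any block of the incoming extent matches a toxic entry,
// the whole extent is abandoned: no reference lookup, no dedupe, and
// no new references added to the toxic extent.
bool scan_extent(const Extent &extent, const HashTable &table) {
    for (const Block &block : extent.blocks) {
        std::optional<bool> toxic = table.lookup_toxic(block.hash);
        if (toxic && *toxic) {
            return false;  // toxic match: stop processing this extent
        }
        // ...normal path: list references, verify data, dedupe...
    }
    return true;
}
```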

bees detects a toxic extent by measuring how much CPU time the kernel uses while making a list of references to the extent's data. Most extents require trivial amounts of CPU for reference listing, so when the CPU usage starts increasing rapidly, it indicates the extent may be triggering bad kernel behavior. Sometimes the kernel does other things during that time and these will falsely trigger the toxic extent workaround. This will result in a very small decrease in dedupe hit rate, but it should be less than 1%. Usually it's about 0.001%, or one out of every 100,000 extents, but if your filesystem has highly repetitive contents in large files, the toxic extent rate might be higher.
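A hedged sketch of that timing idea, assuming Linux's per-thread CPU accounting; the threshold, the function names, and the use of getrusage here are illustrative guesses, not bees' actual implementation:

```cpp
#include <sys/resource.h>

// System (kernel) CPU seconds consumed by the calling thread so far.
// RUSAGE_THREAD is Linux-specific.
static double thread_sys_cpu_seconds() {
    struct rusage ru {};
    getrusage(RUSAGE_THREAD, &ru);
    return ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

// Time the kernel's reference listing for one extent. Most extents
// need only microseconds of kernel CPU here, so a lookup that burns
// serious CPU time is treated as toxic.
bool reference_listing_looks_toxic() {
    const double before = thread_sys_cpu_seconds();
    // ... issue BTRFS_IOC_LOGICAL_INO to list the extent's references ...
    const double spent = thread_sys_cpu_seconds() - before;
    return spent > 0.1;  // hypothetical cutoff, in CPU-seconds
}
```

This kind of measurement is also exactly why false positives happen: anything else that inflates the thread's kernel CPU time during the lookup can push it over the cutoff.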

Checksum collisions (where two blocks are found to have the same hash value but different data in the block) are not reported via log messages. There's only an event counter for them.
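For context, the collision check itself is just a byte comparison after a hash match. A minimal sketch (the counter name is hypothetical):

```cpp
#include <cstddef>
#include <cstring>

static unsigned long hash_collision_count = 0;  // hypothetical counter

// After a hash match, the candidate block is read back and compared
// byte-for-byte with the scanned block. Equal hashes over different
// bytes is a checksum collision: counted, never logged.
bool same_data(const void *scanned, const void *candidate, size_t block_size) {
    if (std::memcmp(scanned, candidate, block_size) == 0)
        return true;        // true duplicate, safe to dedupe
    ++hash_collision_count; // hashes matched but data didn't
    return false;
}
```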

A toxic match event can never result in a checksum collision event: toxic match detection stops processing before the second block can be read, so comparison of the block contents for collision detection is not possible.

cosmicdan commented 1 year ago

Thank you so much for the detailed explanation! So, considering I'm running bees on an archive disk (it rarely changes, mostly reads), and I'm on a modern kernel (6.1.1 atm), I don't really have anything to worry about :)

Worth keeping this issue around, as it's a fantastic explanation for those new to Btrfs; search engines will surely index it, heh.

Again, thank you!