Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

failed constraint check (!m_is_toxic) at bees-resolve.cc:260 #149

Open maicardi opened 4 years ago

maicardi commented 4 years ago

I am receiving a lot of exceptions running Bees, like this:

2020-08-23 23:33:46 2851.2857<5> crawl_1455: EXCEPTION
2020-08-23 23:33:46 2851.2857<5> crawl_1455: exception type std::runtime_error: failed constraint check (!m_is_toxic) at bees-resolve.cc:260
2020-08-23 23:33:46 2851.2857<5> crawl_1455: ***
2020-08-23 23:33:47 2851.2856<5> crawl_1455: WORKAROUND: toxic address: addr = 0x1b74a08c000, sys_usage_delta = 1.349, user_usage_delta = 0, rt_age = 1.36843, refs 1

Does anyone have a clue what this means?

Other strange messages I receive are:

2020-08-23 23:36:10 2851.2857<7> crawl_1455: exception (ignored): exception type std::runtime_error: FIXME: bailing out here, need to fix this further up the call stack
2020-08-23 23:36:10 2851.2857<4> crawl_1455: WORKAROUND: abandoned toxic match for hash 0xe9990999995a5a99 addr 0x7d42ccbf000tz2000 matching bbd BeesBlockData { 4K 0x31b694000 fd = 12 '/srv/dev-disk-by-id-dm-name-sda6-crypt/@varie/Steam/steam.img', address = 0x94156e2000ze000, hash = 0xe9990999995a5a99, data[4096] }

I must confess these messages don't exactly put me at ease while bees is working on my data...

kakra commented 4 years ago

I reported these previously. Most important: your data is safe, nothing to worry about. What bees is seeing here is a huge pile of small identical blocks. Reflinking all those small blocks has little to no benefit, and can even have negative consequences, because the overhead in btrfs would outweigh the little storage space saved. So bees marks such block hashes as toxic and ignores them from then on.

I reported this previously because it also showed up for me in the Steam library. The theory is that games share a lot of small identical blocks, e.g. by using standard assets from development frameworks and/or from the developers themselves. Also, many file format containers used by games share a lot of duplicate structures.

So, this is harmless.

The "bailing out" message is also harmless and more like a note that bees needs internal optimization. That's something that may be tackled when the next LTS kernel lands (probably 5.9 or 5.10, as it seems to be always the last kernel version released in a year), as btrfs has some improved API calls then.

When you see exceptions in the log, that's always a safeguard catching problems before anything remotely harmful could happen. bees only uses safe kernel functions when handling data; if anything got messed up, it would happen inside the kernel and you would see a warning in dmesg at best. Short of kernel bugs, bees is not able to damage or lose your data.

maicardi commented 4 years ago

Thanks a lot for your explanations.

May I ask if you are able to explain another mystery of Bees's output?

I have found no explanation in the docs of the meaning of the letters printed when files are deduped: what do "D", "d" and "+" mean?

2020-08-23 23:57:02 2851.2856<6> crawl_1455: dedup: src 9.5M [0x5f800000..0x60180000] {0x856a704e000} /srv/dev-disk-by-id-dm-name-sda6-crypt/@varie/SteamUnico DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD -by-id-dm-name-sda6-crypt/@varie/Steam/steam.img
2020-08-23 23:59:20 2851.2856<6> crawl_1455: copy: 60K [0x9677a4000..0x9677b3000] fd = 12 '/srv/dev-disk-by-id-dm-name-sda6-crypt/@varie/Steam/steam.img' ++++++++++++++++++++++++++++++++++++++++++dddd++ddddddddddddddddddddddddddddddd+++++++++++++++++++++++++dd+++++++++++d++++++++++++++++d+ddddddd+++++++++++++++
2020-08-24 13:26:16 2851.2859<6> crawl_8139: scan: 133.483K 0x0 [DDDDDDDDDDDDdddddddddddddddddddddd] 0x215ef /srv/dev-disk-by-id-dm-name-sda6-crypt/@share/

Thanks again.

Marco

kakra commented 4 years ago

I suggest lowering the log level; most of this is uninteresting and not useful for unattended operation of the daemon. I'm using --verbose=5 on all systems.

The bars come from src/bees-context.cc and represent a range of blocks:

The bar is initially filled with #. Then:

- R: block scheduled for insertion (you would normally not see this, I think)
- 0: block is all zero
- M: block matched a hash (you should also never see this)
- d: block has been deduped
- +: block has been rewritten
- D: clobbered block (I'm not sure what this means in the context of bees; it is in the code path of blocks scheduled for rewrite that haven't been rewritten, so maybe the opposite of d, i.e. blocks dropped because they were deduped with the other block instance, or because the on-disk data changed during the process and the kernel refused to dedupe the data)
- .: all other blocks, i.e. fresh blocks that have been inserted into the hash table

If all blocks are ., the bar won't be displayed.
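A hypothetical, heavily simplified sketch of how such a bar could be rendered (the letters mirror the list above; the real rendering lives in src/bees-context.cc and differs):

```cpp
#include <string>
#include <vector>

// One state per block; purely illustrative, not bees' actual types.
enum class BlockState { Insert, Zero, Match, Deduped, Rewritten, Clobbered, Fresh };

char state_char(BlockState s) {
    switch (s) {
        case BlockState::Insert:    return 'R'; // scheduled for hash table insertion
        case BlockState::Zero:      return '0'; // all-zero block
        case BlockState::Match:     return 'M'; // matched a hash (internal state)
        case BlockState::Deduped:   return 'd'; // deduped against an existing block
        case BlockState::Rewritten: return '+'; // rewritten via temporary file
        case BlockState::Clobbered: return 'D'; // scheduled for rewrite, not rewritten
        case BlockState::Fresh:     return '.'; // fresh block, inserted into hash table
    }
    return '#'; // initial fill before a disposition is known
}

std::string render_bar(const std::vector<BlockState> &blocks) {
    std::string bar;
    bool all_fresh = true;
    for (BlockState s : blocks) {
        bar += state_char(s);
        if (s != BlockState::Fresh) all_fresh = false;
    }
    return all_fresh ? std::string() : bar; // an all-'.' bar is not displayed
}
```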

But I didn't program this, @Zygo may have a more detailed clue.

Zygo commented 4 years ago

In btrfs, extents are immutable. To do dedupe, a solution has to be found that removes every block within an extent, even if they are not duplicate; otherwise, the extent would continue to exist and no space is freed. The [DDDDDDDDDDDDdddddddddddddddddddddd] output is a block-by-block report of the disposition of every block in the extent.

The most common letters are:

d or D - block in the current extent replaced by an existing block elsewhere in the filesystem
+ - block replaced by a block from a temporary file

"+" are non-duplicate blocks are copied to a temporary file, then the copy is deduped over the original data. A later bees pass will scan these new blocks and insert them into the hash table or eliminate them if duplicates are found later. There is a "0" internal state, but zero-filled blocks are handled by creating a hole in a temporary file and deduping the extent with the hole, which ends up as "+" in the final output. The other letters ("R", "M", ".") are internal states that normally never appear in the output (they are used for debugging).

The "d" blocks are blocks that were deduped without using a matching entry in the hash table. The last "D" block before a series of "d" blocks is the block that matched the hash table. So in [DDDDDDDDDDDDdddddddddddddddddddddd] we can see that the first 11 blocks of the extent were not found in the hash table (i.e. they were evicted and only the 12th block remained in the hash table). Once a matching block is found, bees checks all the other blocks in the two matching extents to see if they match too, so it is only necessary to find one matching block per extent to have effective dedupe.

The hash table prioritization algorithm will prefer to keep blocks that result in matches and discard blocks that don't, so it's fairly normal to end up with extents where a matching block is found half way through an extent. This is how bees gets away with high dedupe hit rates even when the hash table is undersized by an order of magnitude.

This meaning of "d" and "D" isn't obvious from the code. It's all in one loop that does lookup, insertion, and dedupe in a single pass. After each block is read, if we didn't find the block in the hash table and it's not zero, we create a new hash table entry to be inserted into the hash table. We don't insert anything until we get to the end of the extent, because any change to the extent requires eliminating the entire extent, and any blocks we inserted would be invalid ("clobbered" in the comments).

If the block does match a hash table entry, adjacent blocks are checked to see if they are also duplicate. When we find duplicate blocks after the matching block, we don't need to create hash table entries for them, so they are duplicate blocks without new hash table entries ('d'). When we find duplicate blocks before the matching block, they are duplicate blocks that also have new hash table entries ('D'). The block that matched is the last "D" because we created a new hash table entry for it (needed to find the match in the hash table) and then invalidated the new entry because we did a dedupe in the extent.
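Under simplifying assumptions (every block of the scanned extent turns out to be a duplicate of the matched extent, so no '+' rewrites occur), the letter assignment described above can be modeled like this; blocks are stood in for by their hashes:

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Toy model, not bees code: blocks scanned before and including the
// first hash-table hit got new (later clobbered) entries -> 'D';
// duplicates found after the hit need no new entries -> 'd'.
std::string classify_extent(const std::vector<uint64_t> &block_hashes,
                            const std::unordered_set<uint64_t> &hash_table) {
    std::string bar;
    bool matched = false;
    for (uint64_t h : block_hashes) {
        if (!matched) {
            bar += 'D';                              // new entry created for this block
            if (hash_table.count(h)) matched = true; // first hit: this is the last 'D'
        } else {
            bar += 'd';                              // duplicate, no new entry needed
        }
    }
    return bar;
}
```

For example, an extent whose 12th block is the only one still in the hash table produces a run of twelve D's followed by d's, the same shape as the bar in the log above.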

As the number of references to an extent increases, btrfs performance can decrease exponentially. The performance loss can be severe, resulting in kernel CPU loops running for minutes or hours from just a few thousand references. Notice the 35-minute and 75-minute gaps in the duperemove -d -r line:

[graph: 00-all-df-summary]

bees tracks btrfs performance and marks an extent "toxic" as performance starts to decrease, storing the toxic status in the hash table. To avoid triggering the btrfs bug, these extents are not used in future dedupe operations. Content matches are disallowed because toxic extents contain content that usually appears many times in large filesystems, and each copy could result in a new toxic extent if deduped. Normally toxic blocks are about 1 in 100,000 blocks, so there is little or no loss of efficiency if they are left duplicate.
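The decision could be sketched like this, using invented thresholds (the real limits and measurement plumbing are in the bees source); the field names mirror the WORKAROUND log line quoted at the top of this issue:

```cpp
// Illustrative only: the limits below are made up for the example.
struct ResolveStats {
    double sys_usage_delta; // CPU seconds spent in the kernel during the resolve
    double rt_age;          // wall-clock seconds the resolve took
};

// Mark an address toxic when resolving it burned a suspicious amount of
// kernel CPU time or wall-clock time (assumed limits, not bees' real ones).
bool is_toxic(const ResolveStats &s, double cpu_limit = 0.1, double rt_limit = 1.0) {
    return s.sys_usage_delta > cpu_limit || s.rt_age > rt_limit;
}
```

Under these assumed limits, the sample from the log (sys_usage_delta = 1.349, rt_age = 1.36843) would be flagged toxic.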

There are a number of checks in bees to ensure a toxic extent is not touched accidentally. This can happen during searches for adjacent duplicate blocks. When such a block is encountered, an exception is thrown to force bees to abandon all processing on the extent and move to the next one. It's a crude but effective short-term solution (the long-term solution is fixing the kernel bug).

Kernel 5.7 and later have improved backref handling performance, making the toxic extent workaround less necessary.

kakra commented 4 years ago

Oh yeah, I once again forgot that multiple blocks in sequence can be, and are, matched by the hash of only the first block in the sequence. Now d and D make a lot more sense. :-)

BTW: Does bees automatically adapt to the improved backref performance in 5.7 and mark fewer hashes as toxic?

Zygo commented 4 years ago

It's not just the first block in the sequence--it's the first matching block. An extent boundary can appear anywhere, so a matching block can be found at the beginning of extent A but somewhere in the middle of extent B. So we then need to find some other extent C that contains a duplicate copy of the beginning of B, and the first place to look for that on the filesystem is immediately before A. Then we get dedupe(end of C + beginning of A, B) and wipe out all of B with dedupe blocks:

                  v- matching block
     ...CCCCCCCC][AAAAAAAAA...   <- A from hash table, C from file next to A
       [BBBBBBBB  BBBBBBBBB]     <- B from scanned file
                  ^- matching block

       [DDDDDDDD  Ddddddddd]      <- in bees log
                           (spaces added for clarity)

When there are multiple "Dddd" strings it's because there are many separate matches:

    [AAAA][BBBB][CC][XXXX][YYYY][ZZZZZ]   <- ABCXYZ from hash table and neighbor search
    [EEEE  EEEE  EE  EEEE  EEEE  EEEEE]   <- E from scanned file
    [Dddd  Dddd  Dd  Dddd  Dddd  Ddddd]   <- in bees log

Sadly, bees is not yet smart enough to flip that example around and replace all the short extents with the long one. Also, a long series of D's doesn't tell you about all the boundaries. Maybe I should put the M's back in...

There are fewer toxic extents in 5.7 and later due to performance fixes. bees won't forget the previously remembered toxic extents, but they might not be toxic any more. I'm testing a "forget all toxic extents" option.

Not really a good idea to run 5.8 yet (there's yet another crashing bug which seems to be triggered by balances).