dim-geo opened this issue 5 years ago
I think you can ignore this. An uneducated guess is that the extent position has changed on disk between bees submitting it to the dedup queue and actually deduping it. It just gives up here. The changed extent will be re-submitted automatically later when bees walks the new subvolume generations. Your data is not harmed by this.
@Zygo probably has a better (and much more educated) explanation. :-)
@kakra You're not wrong. ;)
The ExtentWalker code is very paranoid about ensuring that it reads a complete and correct list of extents while scanning forward and backward in the filesystem, so it checks whether the end of an extent it read earlier matches the beginning of an adjacent extent it read later, and vice versa. If any discrepancy is detected (e.g. the combined data contains two extent items that overlap, which is not allowed in btrfs), ExtentWalker throws an exception and all dedupe processing for the extent is skipped.
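Very roughly, the consistency check amounts to something like the following sketch. This is not bees' actual code; the `Extent` type and field names here are hypothetical, and only the "adjacent ranges must line up exactly" idea is taken from the description above:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical simplified extent record: a [begin, end) byte range.
struct Extent {
    uint64_t begin;
    uint64_t end;
};

// Verify that a list of extents collected across multiple reads is
// contiguous and non-overlapping. If adjacent extents leave a gap or
// overlap (overlapping extent items are not allowed in btrfs), the data
// most likely changed between reads, so give up on this extent and let a
// later scan of new subvolume generations pick it up again.
void check_extent_consistency(const std::vector<Extent> &extents) {
    for (size_t i = 1; i < extents.size(); ++i) {
        if (extents[i - 1].end != extents[i].begin) {
            throw std::runtime_error(
                "extent list inconsistent: adjacent extents overlap or leave a gap");
        }
    }
}
```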
Usually this is triggered when the filesystem is modified at exactly the point bees is scanning, while bees is scanning it, so two reads of the extent data produce conflicting results because the data changed between them. This invalidates any data bees previously collected about extents in that area of the filesystem, so the correct thing to do is drop all previously read data and restart processing for this extent from the beginning.
The exception can also be triggered in cases where the src and dst extents in dedupe partially overlap, and bees is inadvertently modifying its src extent while trying to eliminate its dst extent because they are both references to the same physical extent. This causes the extent to reappear in future scans as "new" data. If this was not prevented, it would result in an infinite feedback loop where bees keeps trying to "fix" the offending extents, but because they are already all references to the same extent no further space can be saved. There are several points where bees detects this case and prevents it by exiting early from a loop or function, so the exception usually isn't triggered by this case in practice.
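To illustrate the kind of guard involved, here is a hypothetical sketch (not bees' actual code; the `PhysicalRange` type and field names are assumptions): before issuing a dedupe, skip the pair if the source and destination ranges are references into the same physical extent and overlap, since deduping one reference of an extent onto another reference of the same extent frees no space and only creates new extent items for later scans to revisit.

```cpp
#include <cstdint>

// Hypothetical reference into a physical extent on disk.
struct PhysicalRange {
    uint64_t bytenr;  // physical start of the underlying extent
    uint64_t offset;  // offset of this reference within the extent
    uint64_t length;  // length of this reference
};

// Return true if src and dst refer to the same physical extent and their
// ranges overlap. Deduping such a pair cannot save space and would only
// churn the extent tree, so the caller should skip it.
bool same_extent_overlap(const PhysicalRange &src, const PhysicalRange &dst) {
    if (src.bytenr != dst.bytenr) return false;
    const uint64_t src_end = src.offset + src.length;
    const uint64_t dst_end = dst.offset + dst.length;
    return src.offset < dst_end && dst.offset < src_end;
}
```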
There is some discussion in #89 about reducing the visibility of these events in the short term. The long-term plan is to replace ExtentWalker with a better design that handles this case without using an exception. At the moment, because so many levels of stack are involved, there's no simple way to refresh or ignore the inconsistent data, so an exception is used to force the top-level crawl code to move on to the next extent. Later scans will detect any new extents created during previous scans, so the extent processing restart does eventually happen.
Thanks for the clarification! I have around 2 TB of data (I am a photographer), and I need to be sure that no data is lost due to bees :) Maybe a first step towards #89 would be to explain the exceptions bees expects, or elaborate the messages for the expected exceptions (make them more human-readable).
Something like "this exception prevents bees from grinding to a halt at this point; you can ignore it if you are not a bees developer" could be more words than the code fix for lowering the log levels or silencing the specific exceptions in #89. ;)
Still, the point is well taken from the evidence: a new user comes along and hits all three known-harmless exceptions at once. Some of those are over a year old now; it's time to clean them up.
Please check #98 and #97.