Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

exception type std::runtime_error: failed constraint check (ipos <= new_extent.m_begin) at extentwalker.cc:302 #99

Open dim-geo opened 5 years ago

dim-geo commented 5 years ago

Please check #98 and #97

 19:53:43 gentoo beesd[29713]: crawl_258[29748]: BeesAddress(fd 546 /run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted) offset 0x1000)
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: new_first = 3.191M [0x1000..0x332000] fd = 546 '/run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted)'
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: grow_forward BeesRangePair: 3.188M src[0x1000..0x331000] dst[0x19f64000..0x1a294000]
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: src = 546 /run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted)
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: dst = 547 /run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/subvol_snapshots/1775/snapshot/datum
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: grow_backward BeesRangePair: 3.188M src[0x1000..0x331000] dst[0x19f64000..0x1a294000]
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: src = 546 /run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted)
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: dst = 547 /run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/subvol_snapshots/1775/snapshot/datum
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: e_second Extent { begin = 0x19f64000, end = 0x1fc0d5c9, physical = 0x5e7af3000, flags = FIEMAP_EXTENT_LAST, physical_len = 0x5caa000, logical_len = 0x5caa000 }
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: grow constrained = 1 *this = BeesRangePair: 3.188M src[0x1000..0x331000] dst[0x19f64000..0x1a294000]
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: src = 546 /run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted)
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: dst = 547 /run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/subvol_snapshots/1775/snapshot/datum
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: Found matching range: BeesRangePair: 3.188M src[0x1000..0x331000] dst[0x19f64000..0x1a294000]
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: src = 546 /run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted)
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: dst = 547 /run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/subvol_snapshots/1775/snapshot/datum
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: creating brp (4K [0x1000..0x2000] fd = 546 '/run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted)', 4K [0x19f64000..0x19f65000] fid = 4480:717 fd = 547 '/run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/subvol_snapshots/1775/snapshot/datum')
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: Opening dst bfr 4K [0x19f64000..0x19f65000] fid = 4480:717 fd = 547 '/run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/subvol_snapshots/1775/snapshot/datum'
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: chase_extent_ref ino BtrfsInodeOffsetRoot { .m_inum = 717, .m_offset = 0x19f64000, .m_root = 4480 } bbd BeesBlockData { 4K 0x1000 fd = 546 '/run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted)', data[4096] }
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: ino_off_root BtrfsInodeOffsetRoot { .m_inum = 717, .m_offset = 0x19f64000, .m_root = 4480 }
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: for_each_extent_ref BeesBlockData { 4K 0x1000 fd = 546 '/run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted)', data[4096] } at 0x5e7af3000: 41 found
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: Opening src bfr 4K [0x1000..0x2000] fd = 546 '/run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted)'
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: replace_src src_bfr 4K [0x1000..0x2000] fd = 546 '/run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/#302 (deleted)'
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: BeesContext::rewrite_file_range calling BeesResolver 92.664M [0x19f64000..0x1fc0e000) fd = 537 '/run/bees/mnt/6a5e0f31-4857-4352-811a-c55dcdd44433/pics/datum'
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: last
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: next_p 0x1fc0e000 p 0x1fc0d000 last_p 0x19f64000
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: Rewriting extent Extent { begin = 0x19f64000, end = 0x1fc0d5c9, physical = 0x5e7af3000, flags = FIEMAP_EXTENT_LAST, physical_len = 0x5caa000, logical_len = 0x5caa000 }
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: Extent { begin = 0x19f64000, end = 0x1fc0d5c9, physical = 0x5e7af3000, flags = FIEMAP_EXTENT_LAST, physical_len = 0x5caa000, logical_len = 0x5caa000 } block_count 23722
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: scan extent Extent { begin = 0x19f64000, end = 0x1fc0d5c9, physical = 0x5e7af3000, flags = FIEMAP_EXTENT_LAST, physical_len = 0x5caa000, logical_len = 0x5caa000 }
 19:53:43 gentoo beesd[29713]: crawl_258[29748]:
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: *** EXCEPTION ***
 19:53:43 gentoo beesd[29713]: crawl_258[29748]:         exception type std::runtime_error: ipos = 5509120, new_extent.m_begin = 5378048 failed constraint check (ipos <= new_extent.m_begin) at extentwalker.cc:302
 19:53:43 gentoo beesd[29713]: crawl_258[29748]: ***
kakra commented 5 years ago

I think you can ignore this. An uneducated guess is that the extent position has changed on disk between bees submitting it to the dedup queue and actually deduping it. It just gives up here. The changed extent will be re-submitted automatically later when bees walks the new subvolume generations. Your data is not harmed by this.

@Zygo probably has a better (and much more educated) explanation. :-)

Zygo commented 5 years ago

@kakra You're not wrong. ;)

The ExtentWalker code is very paranoid about reading a complete and correct list of extents while scanning forward and backward in the filesystem: it checks that the end of an extent it read earlier matches the beginning of the adjacent extent it read later, and vice versa. If any discrepancy is detected (e.g. the combined data contains two extent items that overlap, which is not allowed in btrfs), ExtentWalker throws an exception and all dedupe processing for that extent is skipped.
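
For illustration only: the constraint named in the exception message boils down to an adjacency check like the one below. The names (Extent, check_adjacent) are made up for this sketch and the real extentwalker.cc code is structured differently, but the idea is the same: a newly read extent must not start before the position already covered by an earlier read.

    // Illustrative sketch only, not the real extentwalker.cc code.
    #include <cstdint>
    #include <sstream>
    #include <stdexcept>

    struct Extent {
        uint64_t m_begin = 0;   // logical start of the extent
        uint64_t m_end = 0;     // logical end of the extent
    };

    void check_adjacent(uint64_t ipos, const Extent &new_extent)
    {
        // ipos is where the previously read extent ended.  A consistent pair
        // of reads must satisfy ipos <= new_extent.m_begin; anything else
        // means the two reads describe overlapping extent items.
        if (!(ipos <= new_extent.m_begin)) {
            std::ostringstream oss;
            oss << "ipos = " << ipos
                << ", new_extent.m_begin = " << new_extent.m_begin
                << " failed constraint check (ipos <= new_extent.m_begin)";
            throw std::runtime_error(oss.str());
        }
    }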

Usually when this is triggered, the filesystem is being modified at exactly the point bees is scanning, while bees is scanning it, so two reads of the extent data produce conflicting results because the data changed between reads. This invalidates any data bees previously collected about extents in that area of the filesystem, so the correct thing to do is drop all previously read data and restart processing for this extent from the beginning.

The exception can also be triggered when the src and dst extents in a dedupe partially overlap: because both are references to the same physical extent, bees inadvertently modifies its src extent while trying to eliminate its dst extent, which makes the extent reappear in future scans as "new" data. If this were not prevented, it would result in an infinite feedback loop where bees keeps trying to "fix" the offending extents, even though they are already all references to the same extent and no further space can be saved. There are several points where bees detects this case and prevents it by exiting early from a loop or function, so in practice the exception usually isn't triggered this way.
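
A minimal sketch of that kind of guard, using invented names rather than bees' actual types: if the src and dst byte ranges map to overlapping physical bytes, the candidate dedupe is dropped before it can start the feedback loop.

    // Simplified sketch of the feedback-loop guard described above.
    #include <cstdint>

    struct PhysicalRange {
        uint64_t physical_begin = 0;  // physical address of the first byte
        uint64_t physical_end = 0;    // physical address one past the last byte
    };

    bool should_skip_dedupe(const PhysicalRange &src, const PhysicalRange &dst)
    {
        // Two half-open ranges overlap when each begins before the other ends.
        bool overlap = src.physical_begin < dst.physical_end
                    && dst.physical_begin < src.physical_end;
        // Overlapping src/dst means both are references to the same physical
        // extent: deduping them saves no space and only re-dirties the extent,
        // so the caller should exit early instead of queuing the dedupe.
        return overlap;
    }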

There is some discussion in #89 about reducing the visibility of these events in the short term. The long-term plan is to replace ExtentWalker with a better design that handles this case without using an exception. At the moment, because so many levels of stack are involved, there's no simple way to refresh or ignore the inconsistent data, so an exception is used to force the top-level crawl code to move on to the next extent. Later scans will detect any new extents created during previous scans, so the extent processing restart does eventually happen.
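
Roughly, the control flow being described looks like the sketch below (invented names, not the real crawl code): the exception thrown deep inside extent processing is caught at the top of the loop, the offending extent is skipped, and the crawl continues with the next one.

    // Rough sketch of exception-driven skipping in a crawl loop.
    #include <cstdint>
    #include <exception>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    struct Extent { uint64_t begin = 0; uint64_t end = 0; };

    // Stand-in for the real per-extent scan; it may throw when the extent
    // data read from the filesystem turns out to be inconsistent.
    void scan_one_extent(const Extent &e)
    {
        if (e.begin > e.end)
            throw std::runtime_error("inconsistent extent data");
    }

    void crawl(const std::vector<Extent> &extents)
    {
        for (const auto &e : extents) {
            try {
                scan_one_extent(e);
            } catch (const std::exception &ex) {
                // Inconsistent data was detected somewhere below: drop this
                // extent and move on.  A later scan picks up whatever new
                // extents appear in this area of the filesystem.
                std::cerr << "*** EXCEPTION ***\n" << ex.what() << "\n";
            }
        }
    }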

dim-geo commented 5 years ago

Thanks for the clarification! I have around 2 TB of data, since I am a photographer, and I need to make sure that no data will be lost due to bees :) Maybe a first step towards #89 would be to just explain which exceptions are expected from bees, or to elaborate the messages for the expected exceptions (make them more human-readable).

Zygo commented 5 years ago

Something like "this exception prevents bees from grinding to a halt at this point. You can ignore this if you are not a bees developer" could be more words than the code fix for lowering the log levels or silencing the specific exceptions in #89. ;)

Still, the point is well taken from the evidence: a new user comes along and hits all three known-harmless exceptions at once. Some of those are over a year old now; it's time to clean them up.