cberner / redb

An embedded key-value database in pure Rust
https://www.redb.org
Apache License 2.0
3.38k stars 157 forks source link

Add set_quick_repair() #893

Closed mconst closed 1 week ago

mconst commented 2 weeks ago

Here's the PR for #878!

This adds a new durability level above Paranoid that doesn't require repair. For now, it's just called "ParanoidPlus" -- not the most descriptive name, but it at least makes it very clear where it falls on the scale of commit speed vs. recoverability. If you have a better name, let me know 😄

To avoid repair, Durability::ParanoidPlus commits need to save the allocator state somewhere. We can't use the region headers, because we'd be overwriting them in place; we might crash partway through the overwrite, and then we'd need repair. So we instead save the allocator state to a new table in the system tree. Writing to the table is slightly tricky, because it needs to be done without allocating (see below), but other than that it's a perfectly ordinary transactional write with all the usual guarantees.

The other requirement to avoid repair is knowing whether the last transaction used 2-phase commit. For this, we add a new two_phase_commit bit to the god byte, which is always updated atomically along with swapping the primary bit. Old redb versions will ignore the new flag when reading and clear it when writing, which is exactly what we want.

This turns out to also fix a longstanding bug where Durability::Paranoid hasn't been providing any security benefit at all. The checksum forgery attack described in the Durability::Immediate documentation actually works equally well against Durability::Paranoid! The problem is that even though 2-phase commit guarantees the primary is valid, redb ignores the primary flag when repairing. It always picks whichever commit slot is newer, as long as the checksum is valid. So if you crash partway through a commit, it'll try to recover using the partially-written secondary rather than the fully-written primary, regardless of the durability mode.

The fix for this is exactly the two_phase_commit bit described above. After a crash, we check whether the last transaction used 2-phase commit; if so, we only look at the primary (which is guaranteed to be valid) and ignore the secondary. Durability::ParanoidPlus needs this check anyway for safety, so we get the Durability::Paranoid bug fix for free.

To write to the allocator state table without allocating, I've introduced a new insert_inplace() function. It's similar to insert_reserve(), but more general and maybe simpler. To use it, you have to first do an ordinary insert() with your desired key and a value of the appropriate length; then later in the same transaction you can call insert_inplace() to replace the value with a new one. Unlike insert_reserve(), this works with values that don't implement MutInPlaceValue, and it lets you hold multiple reservations simultaneously.

insert_inplace() could be safely exposed to users, but I don't think there's any reason to. Since it doesn't give you a mutable reference, there's no benefit over insert() unless you're storing data that cares about its own position in the database. So for now it's private, and I haven't bothered making a new error type for it; it just panics if you don't satisfy the preconditions.

The fuzzer is perfect for testing Durability::ParanoidPlus, because it can simulate a crash, reopen the database (skipping repair if possible), and then verify that the resulting allocator state exactly matches what would happen if it ran a full repair. I've updated the fuzzer to generate Durability::ParanoidPlus commits along with the existing Durability::None and Durability::Immediate.

cberner commented 2 weeks ago

Oh wow, this ended up being a lot more work than I'd initially thought. Thanks for the PR! I had been thinking it could work by setting the recovery bit to false, but this is much better.

I'm traveling without my computer for the next few days, so it'll be a bit before I can review this in detail. Sent from my phone

On Sat, Nov 9, 2024, 10:57 PM Adam Reichold @.***> wrote:

@.**** commented on this pull request.

In src/transactions.rs https://github.com/cberner/redb/pull/893#discussion_r1835605075:

 /// Security considerations: Many hard disk drives and SSDs do not actually guarantee that data

/// has been persisted to disk after calling fsync. Even with this commit level, an attacker /// with a high degree of control over the database's workload, including the ability to cause /// the database process to crash, can cause the database to crash with the god byte primary bit /// pointing to an invalid commit slot, leaving the database in an invalid, potentially attacker-controlled state. Paranoid,

  • /// Like [Durability::Paranoid], but also saves the allocator state as part of each commit.
  • /// This means repair is no longer required when opening the database after a crash.
  • ParanoidPlus,

I would suggest Enclosed here to convey that this writes more data inline to make the resulting persistent structure self-contained.

— Reply to this email directly, view it on GitHub https://github.com/cberner/redb/pull/893#pullrequestreview-2425663411, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGNXQDPRVRTNRZAUU2CCMLZ737UXAVCNFSM6AAAAABRPVKIBSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDIMRVGY3DGNBRGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

mconst commented 2 weeks ago

Sounds good! There's no hurry; enjoy your trip!

cberner commented 2 weeks ago

Ok, read through most of it now. One question. Do you think this quick repair approach can be used for 1-phase commit transactions too?

My thinking is that you need to save the allocator state and use 2-phase commit to get really fast repairs, because you need 2PC to avoid validating all the checksums. However, if you have the allocator state it seems like we can still accelerate a repair under 1-phase commit. It would still validate all the checksums, but then if it found the allocator state it could skip rebuilding the allocator state which requires a second scan of the file.

Does that seem like it works to you?

If so, then I think we should add a new method like WriteTransaction::set_quick_repair(bool) instead of adding this new Durability level, and it would work with both Immediate and Paranoid. That solves the naming issue too :)

mconst commented 2 weeks ago

Yeah, I like the idea of having this be a setting separate from the durability level.

I'm not sure we want to support using it with 1-phase commit, though. It seems like a footgun -- if you call set_quick_repair(true) but forget to also do set_durability(Durability::Paranoid), your code will look like it's working, but when you actually try to recover from a crash, it'll be slow.

If we want to speed up repair with 1-phase commit, I feel like the better solution would be to verify the checksums and reconstruct the allocator state in a single pass. That would give the same speed benefit -- and it would work for all 1-phase commit users, without requiring them to opt into a new mode or pay the cost of writing out the allocator state with each commit.

cberner commented 2 weeks ago

If we want to speed up repair with 1-phase commit, I feel like the better solution would be to verify the checksums and reconstruct the allocator state in a single pass. That would give the same speed benefit -- and it would work for all 1-phase commit users, without requiring them to opt into a new mode or pay the cost of writing out the allocator state with each commit.

Ah yes, good point, let's not support it for 1-phase commits.

Ok, so the naming options that seem appealing to me are: 1) Durability::ParanoidPlus 2) Durability::ParanoidWithQuickRepair or something like that 3) set_quick_repair(bool) where passing true also sets the durability level to Paranoid, and false leaves the durability level alone but disables quick repair.

(2) or (3) seem more descriptive than (1) to me. What do you think is most clear?

mconst commented 2 weeks ago

I think my preference at the moment is (3), but (2) also sounds fine!

If we go with (3), do you think it might be worth splitting 2-phase commit out into its own flag too? This would let people use Durability::Eventual with 2-phase commit and with quick-repair, both of which seem like reasonable combinations. We'd still support Durability::Paranoid for compatibility, but it would become an alias for Durability::Immediate with set_two_phase_commit(true). I think this makes the durability levels a bit more intuitive -- we'd be left with just None, Eventual, and Immediate, which are all very clear names that describe their semantics well (whereas the semantics of Paranoid have never really been clear from the name).

cberner commented 2 weeks ago

That sounds good to me!

mconst commented 1 week ago

Okay, I've updated the code to use set_two_phase_commit() and set_quick_repair(). The old Durability::Paranoid is now deprecated in favor of the former, but still supported for compatibility. Let me know if this looks like what you had in mind!

cberner commented 1 week ago

Merged. Thanks! This is awesome

mconst commented 1 week ago

Yay, thanks for taking the time to review all of this!

There's no hurry, but if you decide you're interested in doing the transition we were talking about in #894 to eventually get rid of the old allocator state, let me know. I've got code to do the extra quick-repair commit on shutdown, which I can submit as another PR if you want it.

cberner commented 1 week ago

Oh ya, it'd be great to get rid of that old code path. Yes, please send that PR