apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0
14.19k stars 1.29k forks source link

Fix the key range affected by setting version stamped key #11424

Open keijokapp opened 1 month ago

keijokapp commented 1 month ago

As explained in this forum post, the range that gets marked as unreadable should start from the next possible read version, not the current one. That's because the final version stamp can't possibly start with the current read version. The +1 should be added to the read version if it's known.

The test script (JS):

import { beforeEach, test } from 'node:test';
import * as fdb from 'foundationdb';
import assert from 'node:assert';

fdb.setAPIVersion(720);

const db = fdb.open();

beforeEach(() => db.clearRange(Buffer.from([]), Buffer.from([0xff])));

// currently works
test('read from the range of the previous read version', async () => {
    await db.doTn(async tn => {
        const readVersion = await tn.getReadVersion();
        const previousReadVersion = Buffer.from((BigInt(`0x${readVersion.toString('hex')}`) - 1n).toString(16).padStart(16, '0'), 'hex');

        tn.setVersionstampedKeyRaw(Buffer.from([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /**/ 0, 0, 0, 0]), '');

        await tn.get(Buffer.concat([previousReadVersion, Buffer.from([0, 0])]));
        await tn.get(Buffer.concat([previousReadVersion, Buffer.from([0xff, 0xff])]));
    });
});

// will work when 1 is added to the start of the range that's marked unreadable
test('read from the range of the current read version', async () => {
    await db.doTn(async tn => {
        const readVersion = await tn.getReadVersion();

        tn.setVersionstampedKeyRaw(Buffer.from([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /**/ 0, 0, 0, 0]), '');

        await tn.get(Buffer.concat([readVersion, Buffer.from([0, 0])]));
        await tn.get(Buffer.concat([readVersion, Buffer.from([0xff, 0xff])]));
    });
});

// currently works
test('read at the next read version', async () => {
    const result = db.doTn(async tn => {
        const readVersion = await tn.getReadVersion();
        const nextReadVersion = Buffer.from((BigInt(`0x${readVersion.toString('hex')}`) + 1n).toString(16).padStart(16, '0'), 'hex');

        tn.setVersionstampedKeyRaw(Buffer.from([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /**/ 0, 0, 0, 0]), '');

        await tn.get(Buffer.concat([nextReadVersion, Buffer.from([0, 0])]));
    });

    await assert.rejects(result, 'Error: Read or wrote an unreadable key');
});

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

atn34 commented 1 month ago

Thanks! It looks like I can't merge it because required statuses haven't finished for some reason. @jzhou77 can you advise please?

jzhou77 commented 1 month ago

Thanks! It looks like I can't merge it because required statuses haven't finished for some reason. @jzhou77 can you advise please?

I closed and reopened this PR to kick the CI.

foundationdb-ci commented 1 month ago

Result of foundationdb-pr on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang-ide on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

jzhou77 commented 1 month ago

Looks like code format issues. Please use clang-format version 15 to fix the issue.

keijokapp commented 1 month ago

Would it be okay to squash the commits together? Would be less noise.

atn34 commented 1 month ago

Yup, go ahead

On Thu, May 30, 2024 at 1:11 PM Keijo Kapp @.***> wrote:

Would it be okay to squash the commits together? Would be less noise.

— Reply to this email directly, view it on GitHub https://github.com/apple/foundationdb/pull/11424#issuecomment-2140789726, or unsubscribe https://github.com/notifications/unsubscribe-auth/AARO7B3CD5E3AFSRH657GO3ZE6BWXAVCNFSM6AAAAABIETZHNOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBQG44DSNZSGY . You are receiving this because you commented.Message ID: @.***>

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang-ide on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-macos on macOS Ventura 13.x

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang-ide on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-macos on macOS Ventura 13.x

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

foundationdb-ci commented 1 month ago

Result of foundationdb-pr on Linux CentOS 7

keijokapp commented 3 weeks ago

Amended with the following patch. Otherwise that key would be outside the overall range, making this specific key unreadable.

diff --git a/fdbclient/ReadYourWrites.actor.cpp b/fdbclient/ReadYourWrites.actor.cpp
index 070c3456f..25ffe8cf6 100644
--- a/fdbclient/ReadYourWrites.actor.cpp
+++ b/fdbclient/ReadYourWrites.actor.cpp
@@ -2273,7 +2273,7 @@ void ReadYourWritesTransaction::atomicOp(const KeyRef& key, const ValueRef& oper
        // k is the unversionstamped key provided by the user.  If we've filled in a minimum bound
        // for the versionstamp, we need to make sure that's reflected when we insert it into the
        // WriteMap below.
-       transformVersionstampKey(k, tr.getCachedReadVersion().orDefault(0), 0);
+       transformVersionstampKey(k, tr.getCachedReadVersion().map([](Version v) { return v + 1; }).orDefault(0), 0);
    }

    if (operationType == MutationRef::SetVersionstampedValue) {
atn34 commented 3 weeks ago

Nice catch. Did a simulation test find it? The simulation tests that failed in the CI run didn't seem directly related, and my not-very-confident understanding is that this should only show up as a key unexpectedly unreadable within the same transaction, so I'm kind of confused.

keijokapp commented 3 weeks ago

Found it by manual white-box testing while trying to solve an another (slightly related) issue.

Since I failed to find a solution to that other problem, it might be good to outline it here.

Applying an arbitrary default value to some state (ie tr.getCachedReadVersion().orDefault(0)) makes me naturally cautious. If the special case of applying the default value isn't balanced out by some special case handling later, then it's likely a bug (or an abstraction leak). In this case, there isn't any special handling later to fix those 0 values. The abstraction leak is that the behavior is different depending on whether or not the read version is known the time of setting the version stamped key. (Example below.)

I'm not too familiar with the code base. When the read version becomes known (TransactionState::readVersionFuture gets resolved), the implementation should go over SetVersionstampedKey mutations and adjust the unreadable ranges accordingly in the WriteMap. But retrieving a read version is abstracted away from ReadYourWritesTransaction. I guess callling atomicOp(SetVersionstampedKey) could set a flag on ReadYourWritesTransaction when the ranges need to be adjusted so the next read call can do that.

Or alternatively, maybe entries without a known key range should not be inserted to WriteMap in the first place. It honestly goes over my head. Like what happens if the same version stamped key is set before knowing the read version and after knowning a read version? Will there be two entries in the WriteMap - one with 0 and another with current read version?


Example:

// ok
test('setting a version stamped key before knowing the read version', async () => {
    await db.doTn(async tn => {
        await tn.getReadVersion();
        tn.setVersionstampedKeyRaw(Buffer.alloc(14), '');

        await tn.get(Buffer.alloc(14));
    });
});

// fails with "Read or wrote an unreadable key"
test('setting a version stamped key without knowing the read version', async () => {
    await db.doTn(async tn => {
        tn.setVersionstampedKeyRaw(Buffer.alloc(14), '');

        await tn.get(Buffer.alloc(14));
    });
});
atn34 commented 2 weeks ago

The abstraction leak is that the behavior is different depending on whether or not the read version is known the time of setting the version stamped key.

You're totally right. The semantics of this feature are quite weak - looks like the property tested in FuzzApiCorrectnessWorkload is just "it's always legal for a read to throw error_code_accessed_unreadable".

I think what we need is a (more) precise reference model for RYWTransaction behavior, and lighter-weight fuzz testing using that model. Currently the tests bring up a simulated fdb cluster, which limits fuzzing throughput. This would be a significant amount of work, so I don't want to block this PR on that.

The de facto workaround for this I believe is to insert entities with a versionstamped key in their own small transaction, and then interact with it in a followup transaction. Not very pretty.

atn34 commented 2 weeks ago

Actually, I think this might be close to what I was picturing: https://github.com/apple/foundationdb/blob/3c4a585fb0ee62d17eba2284242c470167c544d6/fdbserver/workloads/Unreadable.actor.cpp#L354.

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-clang-ide on Linux CentOS 7

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-macos on macOS Ventura 13.x

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-clang on Linux CentOS 7

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-clang-arm on Linux CentOS 7

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr on Linux CentOS 7