keijokapp commented 1 month ago

As explained in this forum post, the range that gets marked as unreadable should start at the version after the current read version, not at the read version itself, because the final version stamp can never begin with the current read version. This change adds 1 to the read version when it is known.
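To make the off-by-one concrete, here is a minimal sketch of the intended range in plain Node.js. It is only a model, not FoundationDB's implementation: the helper names (`versionToBytes`, `unreadableRange`, `isUnreadable`) are hypothetical, and it assumes a 10-byte version stamp made of an 8-byte big-endian version followed by 2 bytes of batch order, with both range bounds treated as inclusive.

```javascript
// Hypothetical helper: encode a BigInt version as 8 big-endian bytes.
function versionToBytes(version) {
  const buf = Buffer.alloc(8);
  buf.writeBigUInt64BE(version);
  return buf;
}

// Model of the range that should be unreadable after setting a
// versionstamped key of the form prefix + <10-byte stamp>.
// readVersion is a BigInt, or undefined when it is not known yet.
function unreadableRange(prefix, readVersion) {
  // The stamp must be strictly greater than the read version, so the
  // lower bound is readVersion + 1 when known and 0 otherwise.
  const minVersion = readVersion === undefined ? 0n : readVersion + 1n;
  const begin = Buffer.concat([prefix, versionToBytes(minVersion), Buffer.from([0, 0])]);
  const end = Buffer.concat([prefix, Buffer.alloc(10, 0xff)]);
  return { begin, end };
}

// True when key falls inside the modelled range (bounds inclusive).
function isUnreadable(key, range) {
  return Buffer.compare(key, range.begin) >= 0 && Buffer.compare(key, range.end) <= 0;
}
```

Under this model, with a read version of 5, a key starting with version 5 stays readable and a key starting with version 6 does not, which is the behaviour the three tests below check against a real cluster.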

The test script (JS):

import { beforeEach, test } from 'node:test';
import * as fdb from 'foundationdb';
import assert from 'node:assert';

fdb.setAPIVersion(720);

const db = fdb.open();

beforeEach(() => db.clearRange(Buffer.from([]), Buffer.from([0xff])));

// currently works
test('read from the range of the previous read version', async () => {
    await db.doTn(async tn => {
        const readVersion = await tn.getReadVersion();
        const previousReadVersion = Buffer.from((BigInt(`0x${readVersion.toString('hex')}`) - 1n).toString(16).padStart(16, '0'), 'hex');

        tn.setVersionstampedKeyRaw(Buffer.from([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /**/ 0, 0, 0, 0]), '');

        await tn.get(Buffer.concat([previousReadVersion, Buffer.from([0, 0])]));
        await tn.get(Buffer.concat([previousReadVersion, Buffer.from([0xff, 0xff])]));
    });
});

// will work when 1 is added to the start of the range that's marked unreadable
test('read from the range of the current read version', async () => {
    await db.doTn(async tn => {
        const readVersion = await tn.getReadVersion();

        tn.setVersionstampedKeyRaw(Buffer.from([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /**/ 0, 0, 0, 0]), '');

        await tn.get(Buffer.concat([readVersion, Buffer.from([0, 0])]));
        await tn.get(Buffer.concat([readVersion, Buffer.from([0xff, 0xff])]));
    });
});

// currently works
test('read at the next read version', async () => {
    const result = db.doTn(async tn => {
        const readVersion = await tn.getReadVersion();
        const nextReadVersion = Buffer.from((BigInt(`0x${readVersion.toString('hex')}`) + 1n).toString(16).padStart(16, '0'), 'hex');

        tn.setVersionstampedKeyRaw(Buffer.from([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /**/ 0, 0, 0, 0]), '');

        await tn.get(Buffer.concat([nextReadVersion, Buffer.from([0, 0])]));
    });

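    // A string second argument would be treated as the assertion message, not as a matcher,
    // so check the error text with a RegExp instead.
    await assert.rejects(result, /Read or wrote an unreadable key/);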
});

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

[ ] The PR has a description, explaining both the problem and the solution.
[ ] The description mentions which forms of testing were done and the testing seems reasonable.
[ ] Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

[ ] This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
[ ] There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

atn34 commented 1 month ago

Thanks! It looks like I can't merge it because required statuses haven't finished for some reason. @jzhou77 can you advise please?

jzhou77 commented 1 month ago

> Thanks! It looks like I can't merge it because required statuses haven't finished for some reason. @jzhou77 can you advise please?

I closed and reopened this PR to kick the CI.

foundationdb-ci commented 1 month ago

Result of foundationdb-pr on Linux CentOS 7

Commit ID: aa32cc56a52ed58f40726bac30b0f8512da2d18a
Duration 0:04:03
Result: :x: FAILED
Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang on Linux CentOS 7

Commit ID: aa32cc56a52ed58f40726bac30b0f8512da2d18a
Duration 0:04:05
Result: :x: FAILED
Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang-ide on Linux CentOS 7

Commit ID: aa32cc56a52ed58f40726bac30b0f8512da2d18a
Duration 0:04:04
Result: :x: FAILED
Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

Commit ID: aa32cc56a52ed58f40726bac30b0f8512da2d18a
Duration 0:04:08
Result: :x: FAILED
Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)
Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

Commit ID: aa32cc56a52ed58f40726bac30b0f8512da2d18a
Duration 0:05:13
Result: :x: FAILED
Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

jzhou77 commented 1 month ago

Looks like code format issues. Please use clang-format version 15 to fix the issue.

keijokapp commented 1 month ago

Would it be okay to squash the commits together? Would be less noise.

atn34 commented 1 month ago

Yup, go ahead

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang-ide on Linux CentOS 7

Commit ID: 38f5cd8cc84e3961fffdbd16a6f614a060cd35ed
Duration 0:21:13
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

Commit ID: 38f5cd8cc84e3961fffdbd16a6f614a060cd35ed
Duration 0:35:09
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang on Linux CentOS 7

Commit ID: 38f5cd8cc84e3961fffdbd16a6f614a060cd35ed
Duration 0:37:32
Result: :x: FAILED
Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr on Linux CentOS 7

Commit ID: 38f5cd8cc84e3961fffdbd16a6f614a060cd35ed
Duration 0:46:04
Result: :x: FAILED
Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-macos on macOS Ventura 13.x

Commit ID: 38f5cd8cc84e3961fffdbd16a6f614a060cd35ed
Duration 0:47:35
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

Commit ID: 38f5cd8cc84e3961fffdbd16a6f614a060cd35ed
Duration 0:51:30
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)
Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang-ide on Linux CentOS 7

Commit ID: 3c4c3b7582aaa296bbbc4395b44058b9ecb9d311
Duration 0:21:25
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

Commit ID: 3c4c3b7582aaa296bbbc4395b44058b9ecb9d311
Duration 0:35:43
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-macos on macOS Ventura 13.x

Commit ID: 3c4c3b7582aaa296bbbc4395b44058b9ecb9d311
Duration 0:47:19
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-clang on Linux CentOS 7

Commit ID: 3c4c3b7582aaa296bbbc4395b44058b9ecb9d311
Duration 0:47:48
Result: :x: FAILED
Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

Commit ID: 3c4c3b7582aaa296bbbc4395b44058b9ecb9d311
Duration 0:51:26
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)
Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci commented 1 month ago

Result of foundationdb-pr on Linux CentOS 7

Commit ID: 3c4c3b7582aaa296bbbc4395b44058b9ecb9d311
Duration 0:51:39
Result: :x: FAILED
Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

keijokapp commented 3 weeks ago

Amended with the following patch. Otherwise that key would be outside the overall range, making this specific key unreadable.

diff --git a/fdbclient/ReadYourWrites.actor.cpp b/fdbclient/ReadYourWrites.actor.cpp
index 070c3456f..25ffe8cf6 100644
--- a/fdbclient/ReadYourWrites.actor.cpp
+++ b/fdbclient/ReadYourWrites.actor.cpp
@@ -2273,7 +2273,7 @@ void ReadYourWritesTransaction::atomicOp(const KeyRef& key, const ValueRef& oper
        // k is the unversionstamped key provided by the user.  If we've filled in a minimum bound
        // for the versionstamp, we need to make sure that's reflected when we insert it into the
        // WriteMap below.
-       transformVersionstampKey(k, tr.getCachedReadVersion().orDefault(0), 0);
+       transformVersionstampKey(k, tr.getCachedReadVersion().map([](Version v) { return v + 1; }).orDefault(0), 0);
    }

    if (operationType == MutationRef::SetVersionstampedValue) {

atn34 commented 3 weeks ago

Nice catch. Did a simulation test find it? The simulation tests that failed in the CI run didn't seem directly related, and my not-very-confident understanding is that this should only show up as a key unexpectedly unreadable within the same transaction, so I'm kind of confused.

keijokapp commented 3 weeks ago

Found it by manual white-box testing while trying to solve another (slightly related) issue.

Since I failed to find a solution to that other problem, it might be good to outline it here.

Applying an arbitrary default value to some state (i.e. tr.getCachedReadVersion().orDefault(0)) makes me naturally cautious. If the special case of applying the default value isn't balanced out by some special case handling later, then it's likely a bug (or an abstraction leak). In this case, there isn't any special handling later to fix those 0 values. The abstraction leak is that the behavior is different depending on whether or not the read version is known at the time of setting the version stamped key. (Example below.)

I'm not too familiar with the code base. When the read version becomes known (TransactionState::readVersionFuture gets resolved), the implementation should go over SetVersionstampedKey mutations and adjust the unreadable ranges accordingly in the WriteMap. But retrieving a read version is abstracted away from ReadYourWritesTransaction. I guess calling atomicOp(SetVersionstampedKey) could set a flag on ReadYourWritesTransaction when the ranges need to be adjusted so the next read call can do that.

Or alternatively, maybe entries without a known key range should not be inserted into the WriteMap in the first place. It honestly goes over my head. Like what happens if the same version stamped key is set once before the read version is known and again after it is known? Will there be two entries in the WriteMap - one with 0 and another with the current read version?

Example:

// ok
test('setting a version stamped key after knowing the read version', async () => {
    await db.doTn(async tn => {
        await tn.getReadVersion();
        tn.setVersionstampedKeyRaw(Buffer.alloc(14), '');

        await tn.get(Buffer.alloc(14));
    });
});

// fails with "Read or wrote an unreadable key"
test('setting a version stamped key without knowing the read version', async () => {
    await db.doTn(async tn => {
        tn.setVersionstampedKeyRaw(Buffer.alloc(14), '');

        await tn.get(Buffer.alloc(14));
    });
});
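One way to picture the order-independent behaviour described above is a toy model that stores only the versionstamped keys and derives the unreadable ranges at read time, so the latest known read version is always used. This is a sketch under assumptions, not FoundationDB code: `ToyWriteMap` and its methods are hypothetical names, and the range bounds follow the same simplified 10-byte stamp layout (8-byte big-endian version plus 2 zero bytes).

```javascript
// Toy model only; every name here is hypothetical.
class ToyWriteMap {
  constructor() {
    this.readVersion = undefined; // BigInt once known
    this.stampedPrefixes = [];    // prefixes of versionstamped keys set so far
  }

  setVersionstampedKey(prefix) {
    this.stampedPrefixes.push(prefix);
  }

  learnReadVersion(version) {
    this.readVersion = version;
  }

  // Ranges are computed on demand, so a key set before the read version
  // was known is treated the same as one set afterwards.
  isUnreadable(key) {
    const minVersion = this.readVersion === undefined ? 0n : this.readVersion + 1n;
    const stamp = Buffer.alloc(10); // last 2 bytes (batch order) stay zero
    stamp.writeBigUInt64BE(minVersion);
    return this.stampedPrefixes.some(prefix => {
      const begin = Buffer.concat([prefix, stamp]);
      const end = Buffer.concat([prefix, Buffer.alloc(10, 0xff)]);
      return Buffer.compare(key, begin) >= 0 && Buffer.compare(key, end) <= 0;
    });
  }
}
```

In this model the 14-zero-byte key from the example is unreadable while the read version is unknown and becomes readable as soon as any read version is learned, regardless of whether the versionstamped key was set before or after that point.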

atn34 commented 2 weeks ago

> The abstraction leak is that the behavior is different depending on whether or not the read version is known at the time of setting the version stamped key.

You're totally right. The semantics of this feature are quite weak - looks like the property tested in FuzzApiCorrectnessWorkload is just "it's always legal for a read to throw error_code_accessed_unreadable".

I think what we need is a (more) precise reference model for RYWTransaction behavior, and lighter-weight fuzz testing using that model. Currently the tests bring up a simulated fdb cluster, which limits fuzzing throughput. This would be a significant amount of work, so I don't want to block this PR on that.

The de facto workaround for this, I believe, is to insert entities with a versionstamped key in their own small transaction, and then interact with them in a follow-up transaction. Not very pretty.
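The two-transaction workaround might look roughly like the sketch below. It assumes `db` is a node-foundationdb database handle like the one in the test script at the top of this thread, that `getRangeAllStartsWith` is available on the transaction object, and that the raw key ends with a 4-byte little-endian offset of the 10-byte placeholder (the layout the test script uses with an offset of 0). `appendThenList` is a hypothetical name.

```javascript
// Sketch of the workaround; appendThenList is a hypothetical helper.
async function appendThenList(db, prefix, value) {
  // Transaction 1: only writes the versionstamped key and commits.
  // Raw key layout: prefix + 10 placeholder bytes + 4-byte LE placeholder offset.
  await db.doTn(async tn => {
    const offset = Buffer.alloc(4);
    offset.writeUInt32LE(prefix.length);
    tn.setVersionstampedKeyRaw(Buffer.concat([prefix, Buffer.alloc(10), offset]), value);
  });

  // Transaction 2: the key is already committed, so this is an ordinary
  // read that never touches an unreadable range.
  return db.doTn(tn => tn.getRangeAllStartsWith(prefix));
}
```

The cost is that the write and the follow-up read are no longer atomic, which is why the comment above calls the workaround not very pretty.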

atn34 commented 2 weeks ago

Actually, I think this might be close to what I was picturing: https://github.com/apple/foundationdb/blob/3c4a585fb0ee62d17eba2284242c470167c544d6/fdbserver/workloads/Unreadable.actor.cpp#L354.

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-clang-ide on Linux CentOS 7

Commit ID: 3c4a585fb0ee62d17eba2284242c470167c544d6
Duration 0:21:10
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

Commit ID: 3c4a585fb0ee62d17eba2284242c470167c544d6
Duration 0:37:04
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-macos on macOS Ventura 13.x

Commit ID: 3c4a585fb0ee62d17eba2284242c470167c544d6
Duration 0:46:18
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-clang on Linux CentOS 7

Commit ID: 3c4a585fb0ee62d17eba2284242c470167c544d6
Duration 0:47:34
Result: :x: FAILED
Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-clang-arm on Linux CentOS 7

Commit ID: 3c4a585fb0ee62d17eba2284242c470167c544d6
Duration 0:49:15
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

Commit ID: 3c4a585fb0ee62d17eba2284242c470167c544d6
Duration 0:51:17
Result: :white_check_mark: SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)
Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci commented 2 weeks ago

Result of foundationdb-pr on Linux CentOS 7

Commit ID: 3c4a585fb0ee62d17eba2284242c470167c544d6
Duration 0:52:41
Result: :x: FAILED
Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

apple / foundationdb

Fix the key range affected by setting version stamped key #11424
