Doxense / foundationdb-dotnet-client

C#/.NET Binding for FoundationDB Client API

Deferred Value Checks as an alternative for caching metadata #103

Open KrzysFR opened 4 years ago

KrzysFR commented 4 years ago

The current way to implement caching in some layers is to use the \xff/metadataVersion key as a quick way to invalidate any previously constructed in-memory cache. This method has some drawbacks, though, which may or may not be easy to work around.

The goal of caching is usually to avoid paying the cost of reading one or more "metadata" keys that change very infrequently but are required to perform other reads or writes, and thus add incompressible latency to every transaction. The idea is to predict whether previously observed values are still valid, without having to wait for a read to complete before executing the rest of the transaction.

The \xff/metadataVersion key attempts to move the check to the start of the transaction, by merging the cost of reading this key with the cost of obtaining the read version from the cluster (which cannot be omitted anyway). The issue is that this key is global to all layers, and there are also cases where data can be mutated without changing the value of this key.
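For reference, here is a minimal sketch of this conventional approach, assuming that the cache remembers the last value of \xff/metadataVersion that it observed (the helper name and the cachedVersionStamp parameter are illustrative, not part of this binding's API):

static readonly Slice MetadataVersionKey = Slice.FromByteString("\xff/metadataVersion");

// returns true if a cache built while observing 'cachedVersionStamp' can still be trusted
async Task<bool> IsCacheStillValid(IFdbReadOnlyTransaction tr, Slice cachedVersionStamp)
{
    // reading this key is piggy-backed onto obtaining the read version,
    // so it adds little to no extra latency to the transaction
    Slice current = await tr.GetAsync(MetadataVersionKey);
    return current == cachedVersionStamp;
}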

Deferred "value checks" is another way to reduce latency, by leveraging the "optimistic" nature of foundationdb's transaction: any transaction can start an asynchronous read of a key, without waiting for its result, before going ahead and running the transaction using the hypothesis that the value will be equal to some expected value. Before commit, all these reads will be awaited, and if any one of these read return a value that is different than the expected value, then a not_commit exception is simulated and thrown, preventing the transaction to commit.

In the next retry attempt, the layer implementation can use an API to check which value-check failed during the previous attempt.

Use of this API can be a little tricky, because it must be coordinated across the multiple attempts of a transaction retry loop, and if handled incorrectly it can make the transaction retry over and over (until the timeout or retry limit is reached).
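For example (hypothetical code, using the AddValueCheck method shown below): if a handler unconditionally re-adds a check against stale cached data, the check will fail again on every attempt, and the transaction can never commit:

async Task BrokenLayerMethod(IFdbTransaction tr)
{
    var cached = SomeStaleCache; // never refreshed between attempts!
    // if CheckValue no longer matches the database, this check fails on every
    // attempt, and the retry loop spins until the timeout or retry limit is reached
    tr.Context.AddValueCheck("acmeLayerId", cached.CheckKey, cached.CheckValue);
    // ...
}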

The API could look like:
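A possible sketch of members added to the transaction's context, inferred from the usage example below (the interface name and exact signatures are assumptions, not final):

public interface IFdbTransactionContext // illustrative name only
{
    // Starts an asynchronous read of 'key' and defers, until commit, the check that
    // its value equals 'expectedValue'. The 'tag' identifies this check across retries.
    void AddValueCheck(string tag, Slice key, Slice expectedValue);

    // Returns true if the check with this tag failed during the previous attempt,
    // false if it passed, or null if no such check was performed.
    bool? ValueCheckFailedInPreviousAttempt(string tag);

    // Registers a callback that will only be invoked if the transaction commits successfully.
    void OnSuccess(Action<IFdbTransactionContext, object?> handler);
}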

The typical use would look like:

async Task SomeLayerMethod(IFdbTransaction tr, ....)
{
     CacheContainer? cachedMetadata = .....;
     if (cachedMetadata != null)
     { // we have to re-validate the cache!
        if (tr.Context.ValueCheckFailedInPreviousAttempt("acmeLayerId") == true)
        { // we know from the previous attempt that this has changed!
            cachedMetadata = null; // drop the cache
        }
        else
        { // optimistically use the cached data, but add a value-check for this transaction.
            tr.Context.AddValueCheck("acmeLayerId", cachedMetadata.CheckKey, cachedMetadata.CheckValue);
        }
     }

     if (cachedMetadata == null)
     {
        cachedMetadata = await GetMetadata(tr, ...); // reload metadata from the db
        // keep this around _only_ if the transaction commits with success
        tr.Context.OnSuccess((ctx, _) => { /* store the cached metadata somewhere */ });
     }

     // use "cachedMetadata" instance to perform the transaction
     tr.Set(.....);
     await tr.GetAsync(....);
}
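One possible way (illustrative only; the GlobalCache container is an assumption, not part of the binding) to implement the "store the cached metadata somewhere" step of the OnSuccess handler is to publish the freshly read metadata into a process-wide cache, keyed by layer id, once the commit has actually succeeded:

using System.Collections.Concurrent;

static readonly ConcurrentDictionary<string, CacheContainer> GlobalCache = new();

// inside SomeLayerMethod, after reloading the metadata:
tr.Context.OnSuccess((ctx, _) => GlobalCache["acmeLayerId"] = cachedMetadata);
// the handler only runs after the commit succeeded, so concurrent transactions
// never observe metadata from an attempt that was aborted or retried.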

This method is not vulnerable to cases where the data is mutated without touching the \xff/metadataVersion key, and is also not impacted by "noisy neighbors" (unrelated layers bumping the global key), but the implementation must be done very carefully!