Doxense / foundationdb-dotnet-client

C#/.NET Binding for FoundationDB Client API
BSD 3-Clause "New" or "Revised" License

IConfigurationProvider ? #136

Open dazinator opened 1 month ago

dazinator commented 1 month ago

Given FoundationDB is known for its ACID transactions across the distributed cluster and its key-value nature, I thought perhaps it would naturally fit into a dotnet application as an IConfiguration source.

For example, the application could update the configuration at runtime in foundation using a transaction, and that config would be highly available, and the new configuration could be pulled into all instances of the application as reloaded IConfiguration.

Has anyone thought much about developing an IConfiguration provider that leverages FoundationDB - and would this be something that would or could fit within the confines of this project? Or would it be best reflected as a separate github project that leverages this one as a dependency?

dazinator commented 1 month ago

Qualifier: I know practically nothing about FoundationDb or whether it is really suited for this sort of use case. I suspect it's overkill to use it as a distributed config provider, but just thought I'd ask a probing question.

KrzysFR commented 1 month ago

Yes, if the complete list of key/values that would make up your configuration would be "small enough" (ideally less than 1 MB) then you could use FoundationDB as a repository for such a configuration, and be able to mutate/publish a new set of settings with ACID guarantees. There has even been a "port" of ZooKeeper using FoundationDB (see https://github.com/pH14/fdb-zk).

FoundationDB supports the concept of "watches" that allows your process to be automatically notified as soon as some keys have been changed in the cluster, which could be used as well to automatically reload a configuration at runtime.

The only potential problems to solve for such an IConfigurationProvider would be:

  • The fdb API is async for all reads, meaning that any code that wants to pull data from the cluster will need to be async.
  • The fdb cluster may not be available when the process is starting, which means that the application may retry indefinitely during startup until it can finally connect (or time out).

I'm not sure how you are supposed to deal with async configuration providers in .NET during the build/DI stage of the application, but if you have a way to await the read of the settings from the cluster, then you could write a simple wrapper that will read the keys from fdb and expose them to the application.

Though, in my experience, potentially blocking the application during startup can be difficult to troubleshoot: logging may not be enabled yet, or even the OpenTelemetry provider may not be started, so you would not see any logs coming from processes that are blocking on the database.

As for your second question, this repository already includes several additional packages for generic and experimental layers, as well as support for Aspire. I could see the utility for such an IConfigurationProvider implementation, which could live in this repo as well.

KrzysFR commented 1 month ago

Qualifier: I know practically nothing about FoundationDb or whether it is really suited for this sort of use case. I suspect its overkill to use it as a distributed config provider, but just thought I'd ask a probing question.

If you are using fdb only for this, it would probably be overkill. Though if you need to setup a system specifically for such a task, then FoundationDB could be a good fit, depending on the scale and criticality of the system.

Though, once you have access to a FoundationDB cluster, you can use it for a lot of things, like row/document stores, indexing, pub sub, distributed queues, etc... You would even be able to combine all of these different shapes of data in the same transaction, which is impossible/very difficult to do when you are combining multiple different systems to achieve all of this.

dazinator commented 1 month ago

Thanks @KrzysFR. A couple of points in case anyone looks at this in future.

The fdb API is async for all reads, meaning that any code that wants to pull data from the cluster will need to be async

I have the shell of an async config provider over here: https://github.com/dazinator/Dazinator.Extensions.Configuration It's written in such a way that all the boilerplate is already in place; you just need to provide async Funcs for the logic you are interested in, as well as a func to return an IChangeToken which can be used to signal async reloads.

The fdb cluster may not be available when the process is starting, which means that the application may retry indefinitely during startup until it can finally connect (or time out).

Yes, the dotnet IConfigurationProvider picture is all synchronous. However, as per the approach I have laid out above, to use an async provider you need to prime it at application startup with its initial values, before building the host. This is typically in your application entrypoint, and so you can wrap this with resilience (Polly) policies etc, and terminate the application after X retries. I think this would be acceptable to most who are relying on a non-local config store.
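To illustrate, a minimal sketch of that priming step in the entry point could look like this (this assumes the classic Polly v7 API; LoadSettingsFromFdbAsync is a hypothetical helper that reads the initial key/values from the cluster, and the retry count/delays are arbitrary):

using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;
using Polly;

var retryPolicy = Policy
    .Handle<Exception>()
    .WaitAndRetryAsync(5, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

Dictionary<string, string?> initialSettings;
try
{
    // prime the provider with its initial values before the host is built
    initialSettings = await retryPolicy.ExecuteAsync(() => LoadSettingsFromFdbAsync(CancellationToken.None));
}
catch (Exception)
{
    // retries exhausted: terminate instead of starting with missing configuration
    Environment.Exit(1);
    return;
}

var builder = Host.CreateApplicationBuilder(args);
builder.Configuration.AddInMemoryCollection(initialSettings);
// ... register services, then build and run the host as usual ...

// hypothetical helper: connect to fdb and read all config key/values for this app
static Task<Dictionary<string, string?>> LoadSettingsFromFdbAsync(CancellationToken ct)
    => throw new NotImplementedException();

The important part is that the failure mode is explicit: either the initial settings are loaded, or the process stops before the host is built.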

KrzysFR commented 1 month ago

There is a potential issue with this approach when using fdb: if you want to easily integrate FoundationDB in a modern .NET application, you will probably inject an IFdbDatabaseProvider in the DI (via builder.Services.AddFoundationDb(...)), and also probably use Aspire to help start a local fdb cluster (using docker).

All of this needs a fully built IServiceProvider, as well as the proper environment variables to inject at least the connection string to the fdb cluster.

If you have to use FoundationDB "before" all of this, it means that you would need to replicate all this logic and call Fdb.Start(...) manually. The main issue being that, since the native fdb client library (written in C) is basically a static singleton, you can only "start" fdb once per process, unlike say a SQL database connection where you can create a new connection, or use a connection pool.

note: all FoundationDB bindings (Java, Rust, Python, Go, .NET, ...) are "just" wrappers around the native client library that handles all the work. In the case of .NET, the binding will P/Invoke into the native C lib and basically wraps all of this into Tasks.

A possible workaround to "asyncify" the IServiceProvider

I've had many design issues with the limitation of "no async" in the DI. The pattern I've ended up using, that works "good enough", is having an IFooProvider that has a ValueTask<IFoo> GetFoo(...) method, and handles all the async initialization logic.

The "FooProvider" will handle all the async loading of options/settings, as well as async initialization of the service, pre-loading of caches. The resulting instance - once fully initialized - is cached, so that all subsequent calls to var foo = await fooProvider.GetFoo(...) are fully optimized. The same provider could also hook into a "reload" signal to initialize and publish a new instance.

That's the approach I've taken with IFdbDatabaseProvider which has a ValueTask<IFdbDatabase> GetDatabase(CancellationToken ct) method. It will handle all the initial configuration and connection of the IFdbDatabase instance (which is the singleton that we wanted in the first place). It can also include a timeout and cancellation (so that an HTTP request may return Service Unavailable while it is still attempting to connect).

This would be more cumbersome to use, since you would always have to call GetDatabase(...), await it, and then call the ReadAsync(..) or ReadWriteAsync(...) method.

To help fix this, I've added extension methods on the IFdbDatabaseProvider to "hide" this fact. They will get the database instance and forward the call to this instance.

So, instead of

public async Task<...> OnGet(......, [FromServices] IFdbDatabaseProvider dbProvider, CancellationToken ct)
{
    var db = await dbProvider.GetDatabase(ct);
    return await db.ReadAsync(tr => ....., ct);
}

you can do

public async Task<...> OnGet(......, [FromServices] IFdbDatabaseProvider db, CancellationToken ct)
{
    return await db.ReadAsync(tr => ....., ct);
}
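Under the hood, such a forwarding extension is roughly the following (a simplified sketch, not the exact signatures used in this repo):

public static class FdbDatabaseProviderReadExtensions
{
    // resolve (and if needed, wait for) the database, then forward the call to it
    public static async Task<TResult> ReadAsync<TResult>(
        this IFdbDatabaseProvider provider,
        Func<IFdbReadOnlyTransaction, Task<TResult>> handler,
        CancellationToken ct)
    {
        var db = await provider.GetDatabase(ct).ConfigureAwait(false);
        return await db.ReadAsync(handler, ct).ConfigureAwait(false);
    }
}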

Using the same approach, you would need an IAsyncService<TService>, which would have a ValueTask<TService> GetService(TimeSpan? timeout, CancellationToken ct) method. Consumers of the service could then await the task to get a fully initialized instance, OR you could add a bunch of extension methods that hide the presence of this "hack". Once you have such a provider type, it can then use any logic it needs to fetch/subscribe to the remote configuration, and is also able to use other types from the DI.
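A minimal sketch of such a generic provider, under the assumption that the factory is only invoked once and the result is cached (the optional timeout and the reload signal mentioned above are omitted for brevity):

public interface IAsyncService<TService>
{
    // returns a fully initialized instance, waiting for the async initialization if needed
    ValueTask<TService> GetService(CancellationToken ct);
}

public sealed class AsyncService<TService> : IAsyncService<TService>
{
    private readonly Func<CancellationToken, Task<TService>> _factory;
    private readonly object _lock = new();
    private Task<TService>? _instance;

    public AsyncService(Func<CancellationToken, Task<TService>> factory) => _factory = factory;

    public ValueTask<TService> GetService(CancellationToken ct)
    {
        var task = _instance;
        if (task is null)
        {
            lock (_lock)
            {
                // the initialization is not tied to the first caller's token,
                // since the resulting instance is shared by everyone
                task = _instance ??= _factory(CancellationToken.None);
            }
        }
        // the caller can still bail out early via its own token
        return new ValueTask<TService>(task.WaitAsync(ct));
    }
}

A real implementation would also handle a failed initialization (retrying instead of caching the faulted task), which is left out here.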

Another solution that leverages FoundationDB

A last word on another solution to this problem, but specific to FoundationDB: when you create Layers with fdb and the .NET binding, you can use the mechanism of Deferred Value Checks in order to cache any metadata that would be stored in the database itself.

For example, if you have a layer that emulates SQL Tables or Document Collections, each table or collection has a schema that is stored as metadata in the db, alongside the data, and this schema can change at any time, but ALL servers in the cluster MUST observe the schema change at the same time (if not, one server lagging behind would silently still insert data using the old schema, or fail to update a new index).

The solution is to use Deferred Value Checks as a mechanism to build a distributed cache: any new transaction will attempt to use the cached metadata from the previous call. If there is none, then it will have to read the metadata from the db (using any async reads as required). If there is already a cache value (in memory), it may still be valid, but it is possible that a transaction that committed a microsecond before just changed the metadata. To be robust, it would need to re-check the metadata (using a random token that changes on every edit) by reading the key, BEFORE being able to read any other data. This means that you need at least one round-trip to the db before being able to do any work.

Deferred Value Checks are a way for a transaction to "protect" itself against this, without incurring the initial latency cost. When you reuse the cache, you can issue the "check read", but don't need to await it. You can immediately start reading from the db using the cached schema, while the check read is still pending. When the transaction has to commit, any pending check will be awaited. All "check reads" expect a specific value in the db to still be equal to the value observed previously. If any check fails (the value changed), then the whole transaction is retried, simulating a Read Conflict. On the next try, the cache mechanism will recognize that its data is stale, and will drop the cached data, prompting a full schema reload.
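To make the idea a bit more concrete, here is a manual approximation of that caching pattern written only against the plain read API. It is NOT the binding's actual Deferred Value Check mechanism (which defers the check to commit time and retries automatically), and the version-token key, cache field and load helper are hypothetical:

public sealed class CachedSettings
{
    // hypothetical key that is overwritten with a new random token on every config change
    private readonly Slice _versionKey;

    private (Slice Token, Dictionary<string, string> Values)? _cache;

    public CachedSettings(Slice versionKey) => _versionKey = versionKey;

    public Task<Dictionary<string, string>> GetAsync(IFdbDatabase db, CancellationToken ct)
    {
        return db.ReadAsync(async tr =>
        {
            var cached = _cache;
            if (cached is null)
            {
                // no cache yet: pay the extra round-trip once
                var token = await tr.GetAsync(_versionKey);
                var values = await LoadAllAsync(tr); // hypothetical range read + decode
                _cache = (token, values);
                return values;
            }

            // issue the "check read" but do not await it yet...
            var checkRead = tr.GetAsync(_versionKey);

            // ...and immediately start using the cached data (or issue reads that depend on it)
            var result = cached.Value.Values;

            // before returning, make sure the token has not changed in the meantime
            var current = await checkRead;
            if (!current.Equals(cached.Value.Token))
            {
                // stale: drop the cache and reload within the same transaction
                var values = await LoadAllAsync(tr);
                _cache = (current, values);
                return values;
            }
            return result;
        }, ct);
    }

    private static Task<Dictionary<string, string>> LoadAllAsync(IFdbReadOnlyTransaction tr)
        => throw new NotImplementedException(); // GetRange over the settings subspace
}

The real mechanism moves that final check to commit time and turns a mismatch into a retry of the whole transaction, which is what makes it safe to use in read/write transactions as well.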

Once you have such a mechanism, it means that it is very easy to use "infrequently changing" data in your code, which is still guaranteed to be up to date, and with a way to automatically reload it as soon as it has changed. You are also guaranteed that all servers in the cluster will observe the change at the same time, and there will not be any server that lags behind.

dazinator commented 1 month ago

All of this needs a fully built IServiceProvider, as well as the proper environment variables to inject at least the connection string to the fdb cluster.

The main issue being that, since the native fdb client library (written in C) is basically a static singleton, you can only "start" fdb once per process

I think this could be solved by keeping the FoundationDB client / provider in its own DI container.

So in the entry point of the application:

KrzysFR commented 1 month ago

If you only intend to use FDB as a distributed IOptions<...> provider and nothing else, then you could simply define it on the initial service provider, and make sure its lifetime is as long as the application itself (or else its DisposeAsync will call Fdb.Stop, which nukes the native client library handle).

But in my case, and probably for most people using FDB, you will also need it for the rest of the application. Since the lifetime of the fdb client is a singleton, and since the instance requires other injected services, like ILogger, IClock, the OpenTelemetry tracing/metrics context and so on, it would be difficult to reuse the singleton created from the initial IServiceProvider, since it would probably use its own instances, which are not the same and most probably don't use the same settings (especially logging).

I really don't like the fact that the out-of-the-box DI container forces you in a corner like this. It has clearly been designed for static pre-defined configuration (coming from env variables or .json files).

There is maybe another way to work around this. The main issue is that, if you have Singleton services that are injected into other types like API controllers, and they need an IOptions<FooOptions> in the constructor, you don't initially know the correct value until you have been able to query the database, which could happen later or even never, well after the instance is accessible by the rest of the application.

Since you will need to support reloading of options at runtime anyway, it means that the values that were true in the constructor could change at any point, so the singleton would either need to defer looking at the options until an actual runtime method is called, OR have a way to reload its internal state from some signal.

The trick would be to replace IOptionsMonitor<FooOptions> with something like IOptionsMonitor<Maybe<FooOptions>> (depending on libraries, may be called Maybe<T>, Option<T>, Some<..>/None/Error<T>, similar to Nullable<T> but also for classes). It seems like this will be easier at some point in the future once Discriminated Union Types make it to .NET (cf https://github.com/dotnet/csharplang/blob/main/proposals/TypeUnions.md )

This could work like this:

That way, the same config update mechanism would be used for the initial async load, as well as any subsequent updates.

The only thing that the implementers would have to take care of is the initial "None" state. This is very similar to using Nullable<T> and having to check HasValue before calling Value, which would throw an InvalidOperationException if it is null.
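As an illustration, a bare-bones version of that wrapper and its consumption could look like this (Maybe<T> is a stand-in for whichever option type you prefer, FooOptions is made up, and how the "Some" value gets published after the async load completes is left out):

using Microsoft.Extensions.Options;

public readonly struct Maybe<T>
{
    public Maybe(T value) { this.Value = value; this.HasValue = true; }
    public bool HasValue { get; }
    public T Value { get; }
}

public sealed class FooOptions
{
    public string? ConnectionString { get; set; }
}

public sealed class FooService
{
    private readonly IOptionsMonitor<Maybe<FooOptions>> _options;

    public FooService(IOptionsMonitor<Maybe<FooOptions>> options) => _options = options;

    public void DoWork()
    {
        var current = _options.CurrentValue;
        if (!current.HasValue)
        {
            // initial "None" state: the async load from the database has not completed yet
            throw new InvalidOperationException("The configuration has not been loaded yet.");
        }

        var settings = current.Value;
        // ... use settings.ConnectionString etc. ...
    }
}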

dazinator commented 1 month ago

There is maybe another way to work around this. The main issue is that, if you have Singleton services that are injected into other types like API controllers, and they need an IOptions<FooOptions> in the constructor, you don't initially know the correct value until you have been able to query the database, which could happen later or even never, well after the instance is accessible by the rest of the application.

My understanding is that this is the flow of events:-

  1. IConfiguration needs to be available prior to usage of the Options system - i.e. it's primed on startup ahead of DI.
  2. The Options system has IOptionsSnapshot<T> and IOptionsMonitor<T> for config retrieved per DI scope, and the latest config, respectively. IOptions<T> injected into a singleton would be bound to the IConfiguration values on first access and then never change. You'd inject the correct one into your singleton as a dependency based on the behaviour you need, however for a singleton IOptionsSnapshot is pretty much the same as IOptions due to the fact there is only a single scope.
  3. When the Options class is bound to IConfiguration values in each of the above scenarios, because the IConfiguration values are already local there is no need to do any async querying. I think this is the least complicated way to keep things and doesn't try to introduce async complexities into a subsystem that wasn't designed for it.
  4. Therefore how do we refresh config and cause new Options values to appear? Well, if we async reload the IConfiguration, it will fire a change token which the Options subsystem detects to invalidate its caches. This causes IOptionsMonitor, new IOptionsSnapshot, and first-time-bound IOptions classes to bind against the new IConfiguration values - meaning these subsystems should just work as intended. This is how Json config reloads work, for example.

So solving the async reload of the IConfiguration, which sounds like it's possible, would fix those issues.
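For reference, the reload hook on the provider side is small; here is a sketch of a provider whose data is refreshed asynchronously (the fetchSettingsAsync delegate is hypothetical, and whatever triggers ReloadAsync could be a watch callback, a timer, or a background service):

using Microsoft.Extensions.Configuration;

public sealed class FdbConfigurationProvider : ConfigurationProvider
{
    private readonly Func<CancellationToken, Task<IDictionary<string, string?>>> _fetchSettingsAsync;

    public FdbConfigurationProvider(Func<CancellationToken, Task<IDictionary<string, string?>>> fetchSettingsAsync)
        => _fetchSettingsAsync = fetchSettingsAsync;

    // the synchronous Load() is a no-op: the data is primed and refreshed asynchronously
    public override void Load() { }

    // called by whatever is watching for changes (initial priming, fdb watch, timer, ...)
    public async Task ReloadAsync(CancellationToken ct)
    {
        var settings = await _fetchSettingsAsync(ct);
        Data = new Dictionary<string, string?>(settings, StringComparer.OrdinalIgnoreCase);
        OnReload(); // fires the change token; IOptionsMonitor / IOptionsSnapshot pick up the new values
    }
}

You would still need a matching IConfigurationSource to register it, and the initial values would be primed in the entry point as described earlier.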

Since the lifetime of the fdb client is a singleton, and since the instances requires other injected services, like ILogger, IClock, OpenTelemetry tracing/metrics context and so on, it would be difficult to reuse the singleton created from the initial IServiceProvider, since it would probably have use its own instances, which are not the same and most probably don't use the same settings (especially logging).

This is a common issue in my experience. For example, logging arguably should be initialised first, prior to DI. Therefore I already break apart the entry point into initialising the ILoggerProvider / LoggerFactory first, prior to building the host / main DI container.

When configuring the host, you can add your pre-initialised ILoggerProvider instead of allowing the host to create one for you on the fly, which you can't access until after the DI container is built. Logging has a dependency on IConfiguration to configure the logger. So this means we have to build some IConfiguration to control logging first.

Applying these kinds of principles my entry point resembles:

  1. Initialise IConfiguration for logging.
  2. Initialise ILoggerProvider using that IConfiguration.
  3. Log something - it's vital that logs work at the start of the process.
  4. Now build DI container. We use Microsoft Hosting - so build the host.
    • we can register our existing ILoggerProvider when configuring logging.

If we need to augment our application's configuration with one-time values fetched from a database, we can do that as an async task to fetch some IConfiguration between steps 3) and 4). Note however that this fetch will want to log. Luckily we have logging initialised, so this is no issue. It may also require its own IConfiguration so it knows the db connection string etc. We can build that IConfiguration explicitly.

Basically this approach establishes some subsystems, putting logging as the first subsystem to be established, and then overlaying other critical or supporting subsystems; each can have its own IConfiguration in the mix.

Interestingly - we need configuration, in order to build the subsystem to provide additional configuration! I think this is ok.

Anyway - these are just ideas, and my personal opinions, but in this model, if we also needed to share the same IClock or OpenTelemetry XYZ then this approach would mandate that you initialise it discretely and then incorporate it into the DI containers that need it - and that incorporation journey is the thing that really pains me, because you have to understand a lot about the ways of configuring those subsystems and how their lifetimes work so that you can register them correctly. Sometimes this is very simple - like with Logging, where including a provider is easy. Sometimes this means creating new abstractions and such. I don't see any other generalised approach of handling this at present.

KrzysFR commented 1 month ago

How do you deal with third-party libraries that only accept an IOptions<T> and do not support dynamic options that can change at runtime? (would need to restart the process for these?)

If all your code uses IOptionsMonitor<..> and all your dependencies as well, then you are fine. But as soon as there is a single dependency that has static options, I'm not sure how you are supposed to deal with it, except restarting the process ??

My issue is that I am forced to add more and more external types that themselves require a ton of builders, options, providers, sinks etc... and the probability that at least one will not play nicely with dynamic options will approach 1 (if not already the case).

It almost looks like the most compatible and safe way would be to move the async option provisioning outside of the process: a bootloader process that would query the database, populate an appsettings.json file, and spawn the actual server process (which would read this json file to populate the IConfiguration).

The bootloader process would be the one in charge of watching the database for changes, and would simply write a new JSON file with the new configuration on disk.

The child process would be configured to automatically reload the config whenever the JSON file changes. Or if this is not possible anyway (at least one third-party dependency does not support this), the bootloader would kill/restart the process.

This really looks like re-inventing the wheel, and doing what Kubernetes would already do for you...

dazinator commented 1 month ago

How do you deal with third-party libraries that only accept an IOptions<T> and do not support dynamic options that can change at runtime? (would need to restart the process for these?)

If they accept IOptions<T> then the use case for IOptions<T> is that it's a one-time value. The library author should be notified about the use case to support Options that can change at runtime, and therefore switch to IOptionsMonitor (or, if it's a scoped dependency, IOptionsSnapshot) as appropriate for such a use case.

If this was causing me issues, I'd have to comment on a concrete use case, but for example an extreme solution might be that I need to build this library dependency in its own rebuildable DI container: I can .AddXyz() it as normal to its own DI container, then wrap it with my own factory abstraction, where I can detect the Options changing (using IOptionsMonitor.OnChange), and then internally rebuild / rotate the DI container using the latest current Options to .AddXyz(). This new factory would most likely have to be asynchronous, so that when requesting an instance of the dependency, if it's mid container rebuild, we can await that operation.
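A rough sketch of that factory idea, with hypothetical XyzOptions / IXyzClient placeholders standing in for the third-party library (the async "await while rebuilding" part mentioned above is omitted to keep it short):

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;

// placeholders standing in for the third-party library's types and its AddXyz(...) registration
public sealed class XyzOptions { public string? Endpoint { get; set; } }
public interface IXyzClient { }
public sealed class XyzClient : IXyzClient
{
    public XyzClient(XyzOptions options) { /* ... */ }
}

public sealed class XyzClientFactory : IDisposable
{
    private readonly object _lock = new();
    private ServiceProvider _container;

    public XyzClientFactory(IOptionsMonitor<XyzOptions> options)
    {
        _container = Build(options.CurrentValue);

        // rebuild / rotate the child container whenever the options change
        options.OnChange(o =>
        {
            var newContainer = Build(o);
            ServiceProvider old;
            lock (_lock) { old = _container; _container = newContainer; }
            old.Dispose(); // a real version would drain in-flight usage before disposing
        });
    }

    private static ServiceProvider Build(XyzOptions options)
    {
        var services = new ServiceCollection();
        services.AddSingleton(options);
        services.AddSingleton<IXyzClient, XyzClient>(); // stands in for services.AddXyz(options)
        return services.BuildServiceProvider();
    }

    public IXyzClient GetClient()
    {
        lock (_lock) return _container.GetRequiredService<IXyzClient>();
    }

    public void Dispose()
    {
        lock (_lock) _container.Dispose();
    }
}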

dazinator commented 1 month ago

bootloader process that would query the database, populate an appsettings.json file, and spawn the actual server process (which would read this json file to populate the IConfiguration).

The bootloader process would be the one in charge of watching the database for changes, and would simply write a new JSON file with the new configuration on disk.

I see and appreciate the thinking here. It removes the need to use dotnet at all for the foundation integration and is more akin to a microservices approach. It still requires we relay the changed state via the file system, and this can be unreliable - due to things like locks on files being accessed, non-atomic writes (if the process dies halfway through flushing to disk), ulimits for file watchers, etc. Therefore I think this, although appealing, is not without its drawbacks. There are benefits to having the IConfigurationProvider in-app, directly pulling IConfiguration into memory, I think. Not to say this other way couldn't be made to work! ;-)

KrzysFR commented 1 month ago

It is clear that due to some limitations in the .NET way of doing DI, we have to choose which flavor of poison we have to drink, each with its own set of issues and limitations.

Most if not all of my processes are almost always stateless, with everything (including most of the configuration) stored in a FoundationDB keyspace, and using the caching mechanisms described above for the very "hot" data like schemas and config snapshots. I had to solve this issue well before the introduction of IServiceProvider and IOptions, so I already have something that works. The rest of the settings that are required during startup end up being only things for the infrastructure part (where to log, where to send OTEL data, any credentials/secrets for these), which usually are handled on the hosting side (for ex Kubernetes), which is already responsible for the lifetime of the process anyway.

That's probably why I can live with a static IConfiguration that will not change during a process lifetime, and rely on the fact that restarts are very fast and have a limited global impact (if you have a pool of servers of course, different story if you have a single node!)

Anyway, if you are interested in how you could store a set of key/values into FDB, this can be done very easily with something similar to a "Map" layer, cf https://github.com/Doxense/foundationdb-dotnet-client/blob/master/FoundationDB.Layers.Common/Collections/FdbMap%602.cs.

If you look at the implementation, writing keys or reading a single key is trivial, the only trick is when reading all the keys in the same transaction (so that you end up with a coherent view of all the settings):

public async Task<TValue> GetAsync(IFdbReadOnlyTransaction trans, TKey id)
{
    // ...
    var data = await trans.GetAsync(this.Subspace[id]).ConfigureAwait(false);
    if (data.IsNull) throw new KeyNotFoundException("The given id was not present in the map.");
    return this.ValueEncoder.DecodeValue(data)!;
}

public void Set(IFdbTransaction trans, TKey id, TValue value)
{
    // ...
    trans.Set(this.Subspace[id], this.ValueEncoder.EncodeValue(value));
}

public IAsyncEnumerable<KeyValuePair<TKey, TValue?>> All(IFdbReadOnlyTransaction trans, FdbRangeOptions? options = null)
{
    // ...
    return trans
        .GetRange(this.Subspace.ToRange(), options)
        .Select(kv => DecodeItem(this.Subspace, this.ValueEncoder, kv));
}

The GetAsync and Set will read/write a single key. To make it generic, I'm using a key/value encoder abstraction that uses any scheme required to store values, usually the Tuple Encoding for the keys, and JSON or Protobuf for the values. You could also handle encryption at this level.

There is a limit of 100 kB maximum value size, so if you have a single value that is more than 100 kB you have to split it. If the value is a JSON object, it could be best to explode it and store each field separately ("obj.foo", "obj.bar", "obj.baz", "obj.foo[2].bar.baz", ...). Some people even explode objects into individual leaves (one entry per field, with the key being the full json path in the object), which may or may not be overkill.
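If you go the "explode the JSON document" route, the flattening itself is the easy part; a quick sketch with System.Text.Json (the path syntax just mirrors the examples above, and how each leaf is encoded into the stored value is up to you):

using System.Text.Json;

public static class JsonFlattener
{
    // turns {"obj":{"foo":1,"bar":[{"baz":true}]}} into ("obj.foo","1"), ("obj.bar[0].baz","true"), ...
    public static IEnumerable<KeyValuePair<string, string>> Flatten(JsonElement element, string prefix = "")
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                foreach (var prop in element.EnumerateObject())
                {
                    var path = prefix.Length == 0 ? prop.Name : prefix + "." + prop.Name;
                    foreach (var kv in Flatten(prop.Value, path)) yield return kv;
                }
                break;

            case JsonValueKind.Array:
                var index = 0;
                foreach (var item in element.EnumerateArray())
                {
                    foreach (var kv in Flatten(item, $"{prefix}[{index}]")) yield return kv;
                    index++;
                }
                break;

            default:
                // leaf value (string, number, bool, null): store its text under the full path
                yield return new KeyValuePair<string, string>(prefix, element.ToString());
                break;
        }
    }
}

Each (path, value) pair then becomes one Set(...) in the map, which keeps the individual values small.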

The GetRange operator will read a stream with all the keys in the "subspace" that contains the map, in lexicographical order, which would be the equivalent of a SELECT key, value FROM configuration

This works fine, but there is a limitation of 5 seconds per transaction in FoundationDB, which indirectly limits the total number of bytes that you can read (depends on the network speed as well as global load in the cluster).

In theory, 5 seconds with a 1Gbps pipeline is already > 500 MB of data, and I don't think that you'd ever need a configuration that big (or we are using a different definition of "configuration" ! :) )

If you would ever need to store multiple configurations, each for a different tenant, or maybe because you have different pools of servers (prod1, prod2, staging, ...), you could simply have multiple different subspaces, each with its own set of key/value pairs, and the GetRange would only stream the keys from that specific subspace.

The Directory Layer (standard in all fdb bindings) allows you to split the keyspace into a hierarchy of "subspaces", which is very similar to how a disk volume would be split into folders and subfolders. You could have a subspace location use the tenant id or server pool id as part of the "path" to the subspace that holds all the keys that are part of the same configuration.

This way, you don't have to do something like SELECT key, value FROM config WHERE tenantId = 'ACME' AND srvId = 'SRV042'; it would be as if there was a different table (or even database) per tenant or server id. The foundationdb keyspace can be split almost indefinitely, so you are not limited in the depth of your hierarchy (keys are still limited to 1 kB, which is way more than you would ever need here anyway).