Doxense / foundationdb-dotnet-client

C#/.NET Binding for FoundationDB Client API
BSD 3-Clause "New" or "Revised" License

The documentation is outdated #97

Open xJonathanLEI opened 5 years ago

xJonathanLEI commented 5 years ago

The library works great, but the documentation in README.MD is horribly outdated. The example code is simply broken. Takes a while for a beginner to start making transactions.

KrzysFR commented 5 years ago

Sorry about the state of the documentation. The project was archived in 2015 when Apple acquired fdb, and all ongoing projects were canceled. I haven't had a lot of time to dedicate to the documentation since the project was restarted last year.

I created a new GitHub Project to start tracking future progress on real documentation.

Could you tell me what exactly is broken in the sample so that I can at least fix it?

xJonathanLEI commented 5 years ago

Thanks for the reply. So Apple basically killed fdb...

The example code doesn't compile. For example:

var location = await db.Directory.CreateOrOpenAsync("Test", cancel);

This line is broken as CreateOrOpenAsync() doesn't accept a string and a cancellation token.

Another example:

The library requires a call to Fdb.Start() before doing anything to the database. That wasn't documented in README.MD or included in the sample code.

I just saw the FoundationDB.Samples project and the code inside should work just fine. But I guess new users tend to look at README.MD first.

Thanks for your effort in maintaining the library. I'd love to help out but since I'm new to fdb I doubt I can make any meaningful contributions lol.

KrzysFR commented 5 years ago

Yes you are right, the recent refactoring of the Directory Layer broke the readme (I had to update our codebase due to the same change).

You can contribute by pointing out things that are missing or that are hard to understand :)

KrzysFR commented 4 years ago

Most of the API related to directory subspaces has been changed recently, sooo most of the documentation needs to be updated anyway :(

All the samples in the project have been updated to the new API, but there is still the need for some documentation that explains the intended uses of directory location and subspaces, and the best practices to write safe and efficient layers (especially given the restrictions when using cached directory subspaces).

sandtreader commented 3 years ago

Struggling with the same thing, I'm afraid, particularly around the Directory layer...

First a sanity check - is the code that's in Nuget (6.2.0-preview) derived from 'master' branch? I couldn't see any branches or tags that would indicate otherwise, but errors are making me doubt it.

As @xJonathanLEI reported, the code in the readme to create a subspace doesn't compile:

var location = await db.Directory.CreateOrOpenAsync("Test", cancel);

But neither does the (rather nice!) db.Root-based version in the sample code:

this.Subspace = await db.ReadWriteAsync(tr => db.Root["Samples"]["MessageQueueTest"].CreateOrOpenAsync(tr), ct);

Error: 'IFdbDatabase' does not contain a definition for 'Root' ...

The signature in the 'master' code (IFdbDirectory.cs) is

Task<FdbDirectorySubspace> CreateOrOpenAsync(IFdbTransaction trans, FdbPath subPath);

But I can't create an FdbPath in the Nuget version.

So TL;DR: What's the correct way to create a subspace in the version 6.2 in Nuget? :-)

Happy to try to fix up at least the README if we can work out the right answer!

Many thanks

Paul

sandtreader commented 3 years ago

OK, some progress - I've found there is a 5.2.1-preview2 available in Nuget and a matching tag here, so:

<PackageReference Include="FoundationDB.Client" Version="5.2.1-preview2" />

and this means this version (from the 5.2.1 samples) now compiles:

        this.subspace = await db.ReadWriteAsync(
          tr => db.Directory.CreateOrOpenAsync(tr,
                                              new [] { "foo", "bar" }), ct);

However I now get a runtime abort with REMOVED FDB API FUNCTION - I'm guessing that's coming from the foundationdb-client library itself? I'm building 6.3.19 - I'm assuming your versions follow the FDB ones, so it looks like I'd have to back that off to 5.2 as well...

... which is possible, but a solution for 6.2 would be much nicer, obviously!

Thanks again

KrzysFR commented 3 years ago

Sorry about that, yes the nuget package is outdated and predates the change in the subspace API.

We always build from source in our applications, so we did not need the nuget packages to be up to date. I'll try to release a newer version of the packages and tools soon.

The current usage pattern is to create an FdbDirectorySubspaceLocation from db.Root, which represents a path (basically a list of path segments). This can be cached or stored in a static variable somewhere. Inside a transaction, you call location.Resolve(tr) to get an actual subspace instance, which is only valid inside that transaction. Any attempt to use it outside will fail. The Resolve method will internally try to look it up in a cache, while at the same time adding a deferred value check that will fail the transaction during commit if it realizes that the directory was changed by another process.

This pattern of having a "Resolve" method that must be called within a transaction is the new caching pattern for all layers (the Directory Layer is one of them).
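To make the lifetime rule concrete, here is a minimal, self-contained sketch. It is a toy model only: ToyTransaction, ToySubspace and ToyLocation are hypothetical stand-ins invented for illustration, not types from the binding, and the real Resolve is asynchronous and goes through the cache described above. The point is just the shape: the location is a path that is safe to keep around, while the resolved subspace refuses to work once its transaction has ended.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: these types are hypothetical stand-ins, NOT the FoundationDB.Client API.

/// <summary>Stand-in for a transaction; it only tracks whether it is still open.</summary>
public sealed class ToyTransaction : IDisposable
{
    public bool IsOpen { get; private set; } = true;
    public void Dispose() => IsOpen = false;
}

/// <summary>Stand-in for a resolved subspace: a key prefix tied to one transaction.</summary>
public sealed class ToySubspace
{
    private readonly ToyTransaction _owner;
    private readonly string _prefix;

    public ToySubspace(ToyTransaction owner, string prefix)
    {
        _owner = owner;
        _prefix = prefix;
    }

    /// <summary>Builds a key, refusing to work once the owning transaction has ended.</summary>
    public string Key(string suffix)
    {
        if (!_owner.IsOpen)
            throw new InvalidOperationException("Subspace used outside of the transaction that resolved it.");
        return _prefix + "/" + suffix;
    }
}

/// <summary>Stand-in for a location: just a path, safe to keep in a static field.</summary>
public sealed class ToyLocation
{
    private readonly string _path;
    private readonly IReadOnlyDictionary<string, string> _directory; // path -> current prefix

    public ToyLocation(string path, IReadOnlyDictionary<string, string> directory)
    {
        _path = path;
        _directory = directory;
    }

    /// <summary>Looks up the current prefix for the path, on behalf of one transaction.</summary>
    public ToySubspace Resolve(ToyTransaction tr) => new ToySubspace(tr, _directory[_path]);
}

public static class ResolveDemo
{
    /// <summary>Returns true if using a subspace after its transaction ended is rejected.</summary>
    public static bool UseAfterTransactionFails()
    {
        var directory = new Dictionary<string, string> { ["/Tenants/ACME"] = "prefix-1" };
        var location = new ToyLocation("/Tenants/ACME", directory); // could live in a static field

        ToySubspace leaked;
        using (var tr = new ToyTransaction())
        {
            leaked = location.Resolve(tr);
            leaked.Key("Books"); // fine: the transaction is still open
        }

        try
        {
            leaked.Key("Books"); // the transaction has ended
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

In this model the only thing worth caching across transactions is the ToyLocation; anything returned by Resolve is treated as scratch state of the transaction that asked for it.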

sandtreader commented 3 years ago

Thanks Cristophe!

I see the 7.0.0-preview1 is up on nuget now - I'll try it.

Would it be possible to tag the repo for 6.2.0-preview1 so if people do use that they can find the right source version? I think it would be here:

https://github.com/Doxense/foundationdb-dotnet-client/tree/1e0c453f6a24276280c869cc92af86b2c1bc5d5c

I like the subspace caching! I was wondering about the performance hit of creating them.

Kind regards

Paul

KrzysFR commented 3 years ago

I have published updated nuget packages and created a new release with the tools binaries (win64 .NET 5.0)

This is still marked as "preview", because even though it has version 7.0, I still haven't included all API changes from fdb 7.x in the binding.

It is, however, built from the exact same branch as what we use in production (though we build from source and do not use the nuget package directly).

KrzysFR commented 3 years ago

> Would it be possible to tag the repo for 6.2.0-preview1 so if people do use that they can find the right source version? I think it would be here:

Sure, but I think the only people who used it before are building from source (with some customization).

> I like the subspace caching! I was wondering about the performance hit of creating them.

That's the main reason for the complete redesign of the API. We use layers extensively, and usually all intermixed together in the same transaction, so caching is a real issue that must be handled at the lowest possible level. Right now the deferred value checks are emulated at the binding level, as a compromise, so there is still a bit of overhead.

The new caching API is meant to be composable, so if you have written a layer that uses caching internally (for example for the metadata of a pseudo-table or document collection), then this layer must also interact with the Directory Layer, which has its own cache!

shaddeykatri commented 1 year ago

Hi, I am also struggling with something regarding subspaces.

So currently I am using IFdbDatabase to create an object of db and while adding an entry I am creating a DirectorySubspace using

IFdbDatabase db; var subspace = await db.DirectoryLayer.CreateOrOpenAsync(transaction, path);

Now what this does is create a directory subspace. Is there any way I can use KeySubspace with the IFdbDatabase? If not, how do I implement a KeySubspace or a DynamicKeySubspace inside the database? A small direction would help greatly.

KrzysFR commented 1 year ago

I recently updated the Readme of this repository with a more up to date example. The best way to use fdb with a modern .NET core/6/7/8 is to inject an IFdbDatabaseProvider with the DI, so that you can inject this service in any of your pages/services/workers, and then use it as if it was a db instance.

When using paths, you can call the Resolve(..) method inside a transaction handler to get back the corresponding subspace. This was changed because, before, people tended to cache the DirectorySubspace instance for the duration of the process, which was INCORRECT: deleting and re-creating a subdirectory would change its prefix, but pre-existing processes would keep reading/writing from the old location, inducing corruption (this was very frequent during unit-testing).

So to repeat the given example:

In your startup logic (either old-school with Startup.cs or directly from your Program.cs):

public sealed class BookOptions
{

    /// <summary>Path to the root directory subspace of the application where all data will be stored</summary>
    public FdbPath Location { get; set; } // ex: "/Tenants/ACME/MyApp/v1"

}

// ...

builder.Services.Configure<BookOptions>(options =>
{
    // note: this would be read from your configuration!
    options.Location = FdbPath.Relative("Tenants", "ACME", "MyApp", "v1");
});

And then in a typical razor page:

public class BooksModel : PageModel
{

    public BooksModel(IOptions<BookOptions> options, IFdbDatabaseProvider db)
    {
        this.Options = options;
        this.Db = db;
    }

    private IFdbDatabaseProvider Db { get; }

    private IOptions<BookOptions> Options { get; }

    public async Task OnGet(string id, CancellationToken ct)
    {
        Slice jsonBytes = await this.Db.ReadAsync(async (IFdbReadOnlyTransaction tr) =>
        {
            // get the location that corresponds to this path
            var location = this.Db.Root[this.Options.Value.Location];

            // "resolve" this location into a Directory Subspace that will add the matching prefix to our keys
            var subspace = await location.Resolve(tr);

            // use this subspace to generate our keys
            Slice value = await tr.GetAsync(subspace.Encode("Books", id));

            // ....

            // hand the value back to the caller of ReadAsync
            return value;
        }, ct);

        // ...

    }

}

The idea is that you define your paths (instances of FdbPath or derived types) in your startup logic, probably by reading the base path from your app settings, an environment variable or any other solution that the devops can configure, and then once you are inside a transaction handler, you exchange these paths for actual key subspaces by calling the Resolve method. These subspaces are only legal to use inside the transaction handler that resolved them, to ensure that if an admin does some maintenance on the Directory Layer, the next transaction will see the change and give you a subspace that points to the new location.

Under the hood, the Directory Layer uses the caching layer to ensure that you don't pay the extra latency of querying the directory layer for each transaction. Instead, it uses the metadata version key as well as deferred value-checks to ensure that the cached prefix is still valid. If the directory layer has changed in the background, the transaction will conflict and retry as if it was a regular conflict or transient failure.
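The caching idea can be sketched with a small in-memory model. This is a deliberately simplified assumption, not how the binding is implemented: ToyDirectory and ToyPrefixCache are hypothetical types, and the version is checked eagerly on every Resolve, whereas the mechanism described above relies on deferred value-checks that fail the transaction at commit and trigger a retry. What the sketch does show is why a version counter lets most transactions skip the expensive lookup.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: hypothetical types, NOT the FoundationDB.Client API.

/// <summary>In-memory stand-in for the directory: maps paths to prefixes and bumps a version on every change.</summary>
public sealed class ToyDirectory
{
    private readonly Dictionary<string, string> _prefixes = new Dictionary<string, string>();

    /// <summary>Plays the role of a global "something changed" counter.</summary>
    public long MetadataVersion { get; private set; }

    /// <summary>Counts the "expensive" lookups, so the effect of the cache is visible.</summary>
    public int Lookups { get; private set; }

    public void Set(string path, string prefix)
    {
        _prefixes[path] = prefix;
        MetadataVersion++;
    }

    public string Lookup(string path)
    {
        Lookups++;
        return _prefixes[path];
    }
}

/// <summary>Remembers prefixes together with the version at which they were read.</summary>
public sealed class ToyPrefixCache
{
    private readonly ToyDirectory _directory;
    private readonly Dictionary<string, (long Version, string Prefix)> _cache =
        new Dictionary<string, (long Version, string Prefix)>();

    public ToyPrefixCache(ToyDirectory directory) => _directory = directory;

    public string Resolve(string path)
    {
        long current = _directory.MetadataVersion; // cheap check
        if (_cache.TryGetValue(path, out var entry) && entry.Version == current)
            return entry.Prefix; // still valid: no lookup needed

        string prefix = _directory.Lookup(path); // expensive path
        _cache[path] = (current, prefix);
        return prefix;
    }
}

public static class CacheDemo
{
    public static (string First, string Second, string Third, int Lookups) Run()
    {
        var directory = new ToyDirectory();
        directory.Set("/app", "prefix-A");
        var cache = new ToyPrefixCache(directory);

        string first = cache.Resolve("/app");  // cache miss: one real lookup
        string second = cache.Resolve("/app"); // cache hit: version unchanged
        directory.Set("/app", "prefix-B");     // e.g. the directory was deleted and re-created
        string third = cache.Resolve("/app");  // version changed: looked up again

        return (first, second, third, directory.Lookups);
    }
}
```

Because the counter is global in this model, any change to the directory invalidates every cached entry, which trades a few extra lookups after maintenance for never serving a stale prefix.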

There is one big difference you need to know: the Resolve method is equivalent to OpenAsync on the directory layer, meaning it will not attempt to create the directory if it does not exist! This change has been made because in the past, simply starting a misconfigured process in a Kubernetes cluster somewhere would automatically create tons of invalid directories.

The current way is either to add logic in the startup procedure of the process that will initialize the directories you need (and probably run some special logic to pre-populate some keys), or to externalize this logic in a script or tool that is used to initialize a new tenant during deployment.

This means that a rogue process with invalid settings would fail, instead of spamming the cluster with invalid data.
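The difference between "open only" and "create or open" can be illustrated with another toy model. ToyDirectoryLayer and its two methods are hypothetical names chosen for this sketch and are not the binding's API; the assumption is simply that resolving behaves like Open, while creation is an explicit step performed once by a startup routine or a deployment tool.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a hypothetical type, NOT the FoundationDB.Client API.

public sealed class ToyDirectoryLayer
{
    private readonly Dictionary<string, string> _prefixes = new Dictionary<string, string>();
    private int _nextPrefix;

    /// <summary>Open-only semantics: a missing directory is an error, nothing is created.</summary>
    public string Open(string path)
    {
        if (!_prefixes.TryGetValue(path, out var prefix))
            throw new InvalidOperationException($"Directory '{path}' does not exist.");
        return prefix;
    }

    /// <summary>Explicit creation, meant to be called by an initialization step only.</summary>
    public string CreateOrOpen(string path)
    {
        if (!_prefixes.TryGetValue(path, out var prefix))
        {
            prefix = "p" + _nextPrefix++;
            _prefixes[path] = prefix;
        }
        return prefix;
    }
}

public static class StartupDemo
{
    /// <summary>Returns true if a process with a wrong path fails instead of creating data.</summary>
    public static bool MisconfiguredProcessFails()
    {
        var layer = new ToyDirectoryLayer();
        layer.CreateOrOpen("/Tenants/ACME/MyApp/v1"); // done once, by a startup step or deployment tool

        layer.Open("/Tenants/ACME/MyApp/v1"); // a correctly configured process resolves fine

        try
        {
            layer.Open("/Tenants/ACNE/MyApp/v1"); // typo in the settings: nothing gets created
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```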

shaddeykatri commented 1 year ago

Hi, thanks for the immediate response. It helped me estimate the future issues that I might face, and using DI is what I had planned for later.

However, my current issue still is that during the development phase of my application I want the whole DB content dump to use subspaces as it's easier to read the content of the db when using subspaces instead of directories.

And the code you suggested in the previous comment var subspace = await location.Resolve(tr) will return a directory subspace.

I see there are many different implementations of subspaces under /FoundationDB.Client/Subspaces. I just want a way to use them to create a subspace inside my directory.

In the Go implementation there is a way to create a subspace using subspace = dir.Sub("users").

Is there a similar adaptation of this to create a subspace inside a directory? Or can we only create subdirectories inside the directories?

KrzysFR commented 1 year ago

My experience while developing layers is that using Directories is easier because I can use the FdbShell tool (in this repository) to browse and explore the content of the cluster! Also, you can use the logging feature of the transactions to generate a text log of each transaction with accurate details.

Regarding the different types of subspace, they are all equivalent in that they all add the binary prefix. They differ only in static typing (or not), purely for C#. There are two families. The first is the "Dynamic" subspaces (by default a DirectorySubspace is dynamic), which encode/decode Tuples of any shape. This is great for initial prototyping, because you can easily tweak and modify the format of your keys. But it is easy to make a mistake (encoding and decoding code paths that use different types and/or numbers of items).

If you want more type safety, you can use the TypedKeySubspace<T1, T2, ...> variants which encode the number of items and their types, and make it easier to use.

The type of subspace returned by Resolve will match the type of the ISubspaceLocation variant that you used, so usually in the constructor of your layer or service, you receive a generic "untyped" Location, and "retype" it to what you want with location.AsDynamic() or location.AsTyped<string, bool, int>() and store that as a property. Then when you Resolve for example a TypedKeySubspaceLocation<string, bool, int>, you will get a matching TypedKeySubspace<string, bool, int> with a pair of Slice Encode(string, bool, int) and (string, bool, int) Decode(Slice) methods.

Here is a snippet of one layer that uses this strategy:

    /// <summary>Layer that generates a change feed</summary>
    [DebuggerDisplay("Location={Location}")]
    public class FdbChangeFeedProducer<TFeedId, TMessage> where TMessage : class
    {
        public FdbChangeFeedProducer(IFdbDatabaseScopeProvider dbProvider, ISubspaceLocation location, IValueEncoder<TMessage>? encoder = null)
        {
            this.Db = dbProvider;
            this.Location = location.AsTyped<TFeedId, VersionStamp>();
            this.Encoder = encoder ?? CrystalJsonCodec.GetEncoder<TMessage>(); //TODO: json settings? or JsonB?
        }

        public IFdbDatabaseScopeProvider Db { get; }

        public TypedKeySubspaceLocation<TFeedId, VersionStamp> Location { get; }

        public IValueEncoder<TMessage> Encoder { get; }

        public async Task WriteMessageAsync(IFdbTransaction tr, TFeedId feed, TMessage message, ....)
        {
            var subspace = await this.Location.Resolve(tr); // returns an ITypedKeySubspace<TFeedId, VersionStamp>
            if (subspace == null) throw new InvalidOperationException($"Location '{this.Location}' referenced by Change Feed Layer was not found.");

            var bytes = this.Encoder.EncodeValue(message);
            // use the subspace resolved for this transaction to build the key
            tr.SetVersionStampedKey(subspace.Encode(feed, tr.CreateUniqueVersionStamp()), bytes);

            //...
        }

        // .....
    }

The layer gets its root location as a parameter, but recasts it into a typed location using location.AsTyped<....> which simply points to the same path, but adds the generic signature for keys that are a pair of a feed id (which is also generic but usually a string, int, or guid), and the VersionStamp that is the unique ID of a new message.

This layer also wants an IValueEncoder<T> which is used to customize HOW the values are encoded (could be using JSON, Protobuf, MessagePack, your own format, etc..)

The same layer could be implemented with location.AsDynamic() which returns an IDynamicSubspaceLocation instead, with a Slice Pack(tuple) method, as well as a set of Encode(...), Encoder<T1, T2>(....) extension methods. (Look in the DynamicKeyExtensions class to see all the extension methods).

All this aspect of the API is probably specific to the C# binding because other languages either don't have static types (JavaScript, Python) or did not implement it at the binding layer (Java?).
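As a rough illustration of what the typed variants buy you, here is a self-contained toy. ToyTypedKeys<T1, T2> is a hypothetical class written for this sketch: it uses plain strings instead of Slice and the tuple encoding, and it assumes items never contain a '/' character. It only demonstrates the compile-time part of the argument, namely that the number and types of key items become part of the method signatures.

```csharp
using System;

// Toy model only: a hypothetical type, NOT the FoundationDB.Client API.
// Assumes items never contain '/', which a real tuple encoding would not require.

/// <summary>Keys of exactly two items whose types are fixed at compile time.</summary>
public sealed class ToyTypedKeys<T1, T2>
{
    private readonly string _prefix;
    private readonly Func<string, T1> _parse1;
    private readonly Func<string, T2> _parse2;

    public ToyTypedKeys(string prefix, Func<string, T1> parse1, Func<string, T2> parse2)
    {
        _prefix = prefix;
        _parse1 = parse1;
        _parse2 = parse2;
    }

    /// <summary>Appends both items to the prefix.</summary>
    public string Encode(T1 item1, T2 item2) => $"{_prefix}/{item1}/{item2}";

    /// <summary>Strips the prefix and parses the two items back into their declared types.</summary>
    public (T1, T2) Decode(string key)
    {
        string[] parts = key.Substring(_prefix.Length + 1).Split('/');
        if (parts.Length != 2) throw new FormatException($"Expected 2 items but found {parts.Length}.");
        return (_parse1(parts[0]), _parse2(parts[1]));
    }
}

public static class TypedDemo
{
    public static (string Feed, int Sequence) RoundTrip()
    {
        var keys = new ToyTypedKeys<string, int>("feeds", s => s, s => int.Parse(s));
        string key = keys.Encode("orders", 42);
        // keys.Encode(42, "orders") would not compile: the item types are part of the signature.
        return keys.Decode(key);
    }
}
```

A dynamic equivalent would accept any sequence of objects, which is more flexible while prototyping but moves the "wrong type or wrong number of items" mistake from compile time to run time.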

KrzysFR commented 1 year ago

> Is there a similar adaptation of this to create a subspace inside a directory? Or We can only create subdirectories inside the directories?

If by "create a subspace" you mean adding an extra suffix to the directory's prefix, then yes: when you have any type of subspace, you can use one of the subspace.Partition.ByXYZ(...) methods, which return a new subspace with the extra suffix added to it.

In practice I very rarely use it, because usually these subspaces can be represented as integer constants in the code, and I simply add them as the first argument to the Encode methods.

So I would get something like:

public class SomeLayer
{
    // the first item of every key selects the logical "sub-subspace"
    const int SUBSPACE_DOCUMENTS = 0;
    const int SUBSPACE_INDEX_BY_FOO = 1;
    const int SUBSPACE_INDEX_BY_BAR = 2;

    public SomeLayer(ISubspaceLocation location, ...)
    {
        this.Location = location.AsDynamic();
    }

    private IDynamicSubspaceLocation Location { get; }

    public async Task WriteSomeDocument(IFdbTransaction tr, TDocument document)
    {
        var subspace = await this.Location.Resolve(tr);
        // the document body, followed by one (empty) entry per index
        tr.Set(subspace.Encode(SUBSPACE_DOCUMENTS, document.Id), this.Encoder.Encode(document));
        tr.Set(subspace.Encode(SUBSPACE_INDEX_BY_FOO, document.Foo, document.Id), Slice.Empty);
        tr.Set(subspace.Encode(SUBSPACE_INDEX_BY_BAR, document.Bar, document.Id), Slice.Empty);
    }

}

This works well in practice. Here I used integers, but you could use string or anything else.

If you really want to physically separate the subspace where the document bodies and indexes are stored, then you can rewrite this by creating three "sub-locations", and resolving each of them to get the corresponding subspace... A bit more work and usually, at least during initial prototyping, it is not required.
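To see why a leading constant behaves like a nested subspace, consider this small model of the key layout. ToyKeys is a hypothetical helper that encodes keys as '/'-separated strings, which is an assumption made for readability and not the binding's tuple encoding. In the model, keys that share a prefix sort next to each other, so reading "everything under the directory plus constant 0" returns only the documents, and extending the prefix first gives exactly the same key as passing the constant as the first item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: string keys stand in for binary keys, NOT the FoundationDB.Client API.

public static class ToyKeys
{
    public const int Documents = 0;
    public const int IndexByFoo = 1;

    /// <summary>Appends each item to the prefix, separated by '/'.</summary>
    public static string Encode(string prefix, params object[] items) =>
        prefix + string.Concat(items.Select(item => "/" + item));

    /// <summary>Returns, in key order, every key that starts with the given prefix.</summary>
    public static List<string> Range(SortedSet<string> keys, string prefix) =>
        keys.Where(k => k.StartsWith(prefix + "/", StringComparison.Ordinal)).ToList();
}

public static class LayoutDemo
{
    /// <summary>Reads only the document keys out of a mixed set of keys.</summary>
    public static List<string> DocumentKeys()
    {
        var keys = new SortedSet<string>(StringComparer.Ordinal)
        {
            ToyKeys.Encode("dir", ToyKeys.IndexByFoo, "blue", "doc1"),
            ToyKeys.Encode("dir", ToyKeys.Documents, "doc2"),
            ToyKeys.Encode("dir", ToyKeys.Documents, "doc1"),
        };
        return ToyKeys.Range(keys, ToyKeys.Encode("dir", ToyKeys.Documents));
    }

    /// <summary>Checks that extending the prefix first yields the same key as a leading constant.</summary>
    public static bool PartitionIsEquivalent()
    {
        string partitioned = ToyKeys.Encode("dir", ToyKeys.Documents); // "sub-subspace" prefix
        return ToyKeys.Encode(partitioned, "doc1") == ToyKeys.Encode("dir", ToyKeys.Documents, "doc1");
    }
}
```

Under these assumptions the choice between the two styles is about code organization rather than key layout; physically separate locations only matter when the parts need different prefixes.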