Elfocrash / Cosmonaut

🌐 A supercharged Azure CosmosDB .NET SDK with ORM support
https://cosmonaut.readthedocs.io

Using with Cosmos DB Bulk Executor #82

Closed · countincognito closed this issue 5 years ago

countincognito commented 5 years ago

First off, thank you for building and sharing such a terrific library! I'm making extensive use of it in our new system.

I understand from your blog posts that integrating the Cosmos DB Bulk Executor into Cosmonaut currently isn't possible due to constraints around .NET Standard support and the specific versions of Microsoft.Azure.DocumentDB.Core being used.

I'm currently experimenting with one of the .NET Standard preview NuGet packages for the Bulk Executor (2.3.0-preview2), combined with Cosmonaut 2.7.1, the last version to use the necessary version of Microsoft.Azure.DocumentDB.Core. I was wondering if you had any advice, tips, or code samples for how best to use it in combination with Cosmonaut (I read somewhere that you had done some experiments yourself in the past)?
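For reference, the version pinning looks something like this in my project file. The versions are the ones mentioned above; I'm assuming the Bulk Executor preview ships under its usual package ID, Microsoft.Azure.CosmosDB.BulkExecutor:

    <ItemGroup>
      <!-- Last Cosmonaut release built on the compatible Microsoft.Azure.DocumentDB.Core -->
      <PackageReference Include="Cosmonaut" Version="2.7.1" />
      <!-- .NET Standard preview of the Bulk Executor library -->
      <PackageReference Include="Microsoft.Azure.CosmosDB.BulkExecutor" Version="2.3.0-preview2" />
    </ItemGroup>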

Right now I'm trying something similar to this (based on the article here):

        // Convert the input records to raw Documents for the Bulk Executor.
        IList<Document> docs = inputRecords.Select(x => x.ToCosmonautDocument()).ToList();

        // Dedicated client for the bulk import, using Direct/TCP connectivity.
        ConnectionPolicy connectionPolicy = new ConnectionPolicy
        {
            ConnectionMode = ConnectionMode.Direct,
            ConnectionProtocol = Protocol.Tcp
        };

        DocumentClient client = new DocumentClient(
            cosmosStore.CosmonautClient.DocumentClient.ServiceEndpoint,
            cosmosStore.CosmonautClient.DocumentClient.AuthKey,
            connectionPolicy);

        // Resolve the target collection through Cosmonaut's client.
        var dataCollection = await cosmosStore
            .CosmonautClient
            .GetCollectionAsync(cosmosStore.DatabaseName, cosmosStore.CollectionName);

        // Set retry options high during initialization (default values).
        client.ConnectionPolicy.RetryOptions.MaxRetryWaitTimeInSeconds = 30;
        client.ConnectionPolicy.RetryOptions.MaxRetryAttemptsOnThrottledRequests = 9;

        IBulkExecutor bulkExecutor = new BulkExecutor(client, dataCollection);
        await bulkExecutor.InitializeAsync();

        // Set retries to 0 to pass complete control to bulk executor.
        client.ConnectionPolicy.RetryOptions.MaxRetryWaitTimeInSeconds = 0;
        client.ConnectionPolicy.RetryOptions.MaxRetryAttemptsOnThrottledRequests = 0;

        var bulkImportResponse = await bulkExecutor.BulkImportAsync(
            documents: docs,
            enableUpsert: true,
            disableAutomaticIdGeneration: true,
            maxConcurrencyPerPartitionKeyRange: null,
            maxInMemorySortingBatchSize: null,
            cancellationToken: ct);

Thanks again!

Elfocrash commented 5 years ago

Hello @countincognito, thanks for your kind words.

I did, about 6 months ago. Your code looks fairly similar to what I was trying to do. Technically speaking, I could just bump Cosmonaut's requirement to .NET Standard 2.0 and use the bulk executor without breaking any consumers. The problem is that the bulk executor API is fairly different from what Cosmonaut is doing, so I would have to adapt the Bulk Executor to Cosmonaut, but I am also waiting on the 3.0 release of the new CosmosDB SDK to see whether I will move Cosmonaut to that. There are so many "ifs" hanging at the moment; that's why I'm not going to integrate it just yet. I do, however, think that at some point I will definitely add the bulk executor in.

I am currently working on a separate optimization task for Cosmonaut.

countincognito commented 5 years ago

Hello @Elfocrash , thanks for the reply.

Yes, I can see why you would approach it that way. I found myself facing a similar question; in the end I basically extended the functionality of our system to incorporate a "bulk" category of behaviours, essentially to distinguish it from the usual behaviour of Cosmonaut, just so people would know that the underlying functionality was different/special.

Just out of interest, did you find a reliable way to reuse the DocumentClient in the CosmosStore for bulk imports? My concern is that messing around with the retry options (i.e. setting everything to 0) might create unpredictable behaviour for the other API calls, since the underlying instance is a shared singleton. On the flip side, I wasn't keen on creating a separate DocumentClient for each individual call (as in the example above), since that goes against recommended best practice.

Elfocrash commented 5 years ago

In Cosmonaut each CosmosStore has its own instance of DocumentClient. This is done because different CosmosStores might need different configs for things like serialisation.
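As a rough sketch of what I mean (the entity types here are made up, and this assumes the JsonSerializerSettings property on CosmosStoreSettings):

    using Cosmonaut;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Serialization;

    // Two stores against the same account, each getting its own
    // DocumentClient because their serialisation needs differ.
    var plainSettings = new CosmosStoreSettings(
        "mydb", "https://myaccount.documents.azure.com:443/", "<authKey>");

    var camelCaseSettings = new CosmosStoreSettings(
        "mydb", "https://myaccount.documents.azure.com:443/", "<authKey>")
    {
        JsonSerializerSettings = new JsonSerializerSettings
        {
            ContractResolver = new CamelCasePropertyNamesContractResolver()
        }
    };

    // Order and AuditEvent are hypothetical entities.
    var orders = new CosmosStore<Order>(plainSettings);
    var auditEvents = new CosmosStore<AuditEvent>(camelCaseSettings);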

You can, however, create a new DocumentClient, store it in memory, and reuse it.
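A minimal sketch of that idea, reusing only the API calls from your snippet above (the BulkImporter wrapper itself is made up, not part of Cosmonaut):

    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Azure.CosmosDB.BulkExecutor;
    using Microsoft.Azure.CosmosDB.BulkExecutor.BulkImport;
    using Microsoft.Azure.Documents;
    using Microsoft.Azure.Documents.Client;

    // One long-lived client reserved for bulk work, so zeroing its retry
    // options never touches the clients owned by the CosmosStores.
    public class BulkImporter
    {
        private readonly DocumentClient _bulkClient;

        public BulkImporter(Uri serviceEndpoint, string authKey)
        {
            _bulkClient = new DocumentClient(serviceEndpoint, authKey, new ConnectionPolicy
            {
                ConnectionMode = ConnectionMode.Direct,
                ConnectionProtocol = Protocol.Tcp
            });
        }

        public async Task<BulkImportResponse> ImportAsync(
            DocumentCollection collection, IEnumerable<Document> docs, CancellationToken ct)
        {
            IBulkExecutor executor = new BulkExecutor(_bulkClient, collection);
            await executor.InitializeAsync();

            // Hand full retry control to the bulk executor; only this
            // dedicated client is affected.
            _bulkClient.ConnectionPolicy.RetryOptions.MaxRetryWaitTimeInSeconds = 0;
            _bulkClient.ConnectionPolicy.RetryOptions.MaxRetryAttemptsOnThrottledRequests = 0;

            return await executor.BulkImportAsync(
                documents: docs,
                enableUpsert: true,
                disableAutomaticIdGeneration: true,
                cancellationToken: ct);
        }
    }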