Azure / azure-cosmosdb-js-server

The JavaScript SDK for server-side programming in Azure Cosmos DB
MIT License

bulkImport.js importing different amount depending on how large I make the batches #11

Open jakehockey10 opened 7 years ago

jakehockey10 commented 7 years ago

Hello,

I'm using the stored procedure bulkImport.js to import a C# collection of objects I've just created (in the thousands). With the following code, the number of documents that end up in the collection at the end of the process differs depending on batchSize.

List<IEnumerable<Thing>> batches = things.Batch(batchSize).ToList();
await Client.CreateStoredProcedureAsync(
    UriFactory.CreateDocumentCollectionUri(Database, Collection),
    new StoredProcedure
    {
        Id = "UploadThings",
        Body = File.ReadAllText(@".\StoredProcedures\bulkImport.js")
    });
foreach (IEnumerable<Thing> batch in batches)
{
    submitted += await Client.ExecuteStoredProcedureAsync<int>(
        UriFactory.CreateStoredProcedureUri(Database, Collection, "UploadThings"), batch);
}

For instance, when I inserted about 7,700 documents as a single batch, only 20 of them actually made it into the collection. When I tried batches of 3,000 documents, a few thousand made it in, and so on. I have a feeling that I'm missing something pretty fundamental here, but I can't think of what it is. Is there a way I can use the bulkImport.js in this repository so that all the C# objects make it into the collection as documents?

alohaninja commented 7 years ago

This all depends on how many RUs you have. The comments in bulkImport describe how it works: you need to keep calling bulkImport, trimming the set of docs you send each time based on the count it returns of docs that were successfully processed/accepted.

The batch size you send in doesn't change how much your DocDB instance allows. It's all based on RUs (request units); the charge for each request is returned in the HTTP response headers (x-ms-request-charge).
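The resume loop described above can be sketched roughly as follows. This is a minimal illustration in JavaScript, not the SDK's API: `makeFakeBulkImport` is a hypothetical stand-in for executing the stored procedure (it just accepts a fixed number of docs per call, which is an assumption made for the demo), and `importAll` is a hypothetical client helper that keeps resending whatever has not been accepted yet.

```javascript
// Hypothetical stand-in for executing the bulkImport stored procedure.
// It stores only the first `budget` docs of each call and returns how many
// it stored, mimicking a script that stops early and reports its count.
function makeFakeBulkImport(store, budget) {
  return async function executeBulkImport(docs) {
    const accepted = docs.slice(0, budget);
    store.push(...accepted);
    return accepted.length;
  };
}

// Client-side resume loop: call the procedure again with the docs that
// have not been stored yet, until every doc has been accepted.
async function importAll(docs, executeBulkImport) {
  let imported = 0;
  while (imported < docs.length) {
    const count = await executeBulkImport(docs.slice(imported));
    if (count === 0) {
      // No progress on this call: stop instead of looping forever.
      // Real code would probably wait and retry here.
      throw new Error(`No docs accepted after ${imported} were imported`);
    }
    imported += count;
  }
  return imported;
}
```

In the C# snippet from the question, the value added to `submitted` appears to play the role of `count` here: instead of moving straight on to the next batch, the client would resend the part of the current batch that was not counted.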

        // If the request was accepted, callback will be called.
        // Otherwise report current count back to the client, 
        // which will call the script again with remaining set of docs.
        // This condition will happen when this stored procedure has been running too long
        // and is about to get cancelled by the server. This will allow the calling client
        // to resume this batch from the point we got to before isAccepted was set to false