Open vasantteja opened 7 months ago
@kushagraThapar please help take a look
@vasantteja thanks for raising this issue. Usually the bulk APIs take less time and achieve higher throughput when writing to Cosmos DB; however, it also depends on the distribution of the data being inserted, the number of cores on the processor (on the Azure VM in your case), and some other factors. You can read more about it here - https://learn.microsoft.com/en-us/azure/cosmos-db/bulk-executor-overview
However, that being said, in general it should be more efficient. @trande4884 - can you please take a look at this and see if there is any perf issue in our azure-spring-data-cosmos SDK?
@vasantteja what tool are you using to pull those execution times?
Thanks @kushagraThapar for the inputs. @trande4884 I was using StopWatch from org.springframework.util.StopWatch
to time this method. The pseudocode is as follows:

```java
StopWatch timed = new StopWatch();
timed.start();
repository.saveAll(objects);
timed.stop();
```
@trande4884 Hi Trevor! Is there anything I can do to enhance the performance or is my process evaluating the time taken for this method wrong?
@vasantteja that framework seems to only be testing the runtime of the function, and not the execution time of the saveAll() query itself. Some of the additional time is likely the setup of the bulk operation but I have not had time to investigate yet, I'm hoping to have time to get to this next week to compare actual execution times.
In our README there is information on setting up query metrics that would better track actual execution times and RUs. Here is more documentation: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/query-metrics
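For reference, enabling query metrics and a diagnostics processor in azure-spring-data-cosmos looks roughly like the sketch below (class and builder names follow the SDK's README; the database name is a placeholder, so verify against the version you are using):

```java
import com.azure.spring.data.cosmos.config.AbstractCosmosConfiguration;
import com.azure.spring.data.cosmos.config.CosmosConfig;
import com.azure.spring.data.cosmos.core.ResponseDiagnostics;
import com.azure.spring.data.cosmos.core.ResponseDiagnosticsProcessor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CosmosConfiguration extends AbstractCosmosConfiguration {

    @Bean
    public CosmosConfig cosmosConfig() {
        return CosmosConfig.builder()
            // ask the service to return per-query execution metrics
            .enableQueryMetrics(true)
            .responseDiagnosticsProcessor(new DiagnosticsLogger())
            .build();
    }

    @Override
    protected String getDatabaseName() {
        return "mydb"; // placeholder database name
    }

    // Logs execution time and RU charge per operation,
    // independent of any client-side StopWatch measurement.
    private static class DiagnosticsLogger implements ResponseDiagnosticsProcessor {
        @Override
        public void processResponseDiagnostics(ResponseDiagnostics responseDiagnostics) {
            if (responseDiagnostics != null) {
                System.out.println("Cosmos diagnostics: " + responseDiagnostics);
            }
        }
    }
}
```

This reports the service-side execution time and RU charge of the `saveAll()` operations themselves, rather than the whole method call including client-side setup.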
@trande4884 Thanks for the update. I will take a look at the documentation.
@trande4884 @kushagraThapar I have a small dumb question. Our saveAll query runs only once a week. In this case, can we persist the connection using a parameter? If the connection is persisted, I assume we will see faster insertion times since we avoid the setup of the bulk operation.
@vasantteja - you cannot persist the connection. Connections created by the Cosmos DB SDK are short lived. Even if you extend the connection timeout to a higher value like a week, there is a high chance of that connection getting dropped because of movement of machines on the backend service or network blips. Machines restart all the time because of system and security updates. So it won't work.

However, you can use a feature called proactive connection management. This will allow you to create connections upfront to all your partitions, and you can also maintain healthy active connections throughout the application lifecycle. If for some reason a connection gets dropped, the SDK will re-create it instantly.

You can leverage the `CosmosContainerProactiveInitConfig` class while creating the `CosmosClient` through Spring. This is the API in `CosmosClientBuilder`:
```java
/**
 * Sets the {@link CosmosContainerProactiveInitConfig} which enables warming up of caches and connections
 * associated with containers obtained from {@link CosmosContainerProactiveInitConfig#getCosmosContainerIdentities()}
 * to replicas obtained from the first <em>k</em> preferred regions, where <em>k</em> evaluates to
 * {@link CosmosContainerProactiveInitConfig#getProactiveConnectionRegionsCount()}.
 *
 * <p>
 * Use the {@link CosmosContainerProactiveInitConfigBuilder} class to instantiate the
 * {@link CosmosContainerProactiveInitConfig} class.
 * </p>
 * @param proactiveContainerInitConfig which encapsulates a list of container identities and the number of
 * proactive connection regions
 * @return current CosmosClientBuilder
 */
public CosmosClientBuilder openConnectionsAndInitCaches(CosmosContainerProactiveInitConfig proactiveContainerInitConfig) {
    this.proactiveContainerInitConfig = proactiveContainerInitConfig;
    return this;
}
```
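Wiring this up at client-creation time might look like the following sketch, assuming the `azure-cosmos` SDK's `CosmosContainerProactiveInitConfigBuilder` and `CosmosContainerIdentity` types; the endpoint, key, database, and container names are placeholders:

```java
import com.azure.cosmos.CosmosAsyncClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.models.CosmosContainerIdentity;
import com.azure.cosmos.models.CosmosContainerProactiveInitConfig;
import com.azure.cosmos.models.CosmosContainerProactiveInitConfigBuilder;
import java.util.List;

public class ProactiveInitExample {

    public static CosmosAsyncClient buildClient() {
        // Identify which container(s) to warm connections for.
        CosmosContainerIdentity containerIdentity =
            new CosmosContainerIdentity("mydb", "mycontainer"); // placeholders

        // Open connections upfront to replicas in the first preferred region.
        CosmosContainerProactiveInitConfig initConfig =
            new CosmosContainerProactiveInitConfigBuilder(List.of(containerIdentity))
                .setProactiveConnectionRegionsCount(1)
                .build();

        return new CosmosClientBuilder()
            .endpoint("https://<account>.documents.azure.com:443/") // placeholder
            .key("<key>")                                           // placeholder
            .openConnectionsAndInitCaches(initConfig)
            .buildAsyncClient();
    }
}
```

With this in place, the connection and cache warm-up cost is paid once at application startup instead of on the first `saveAll()` call each week.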
Query/Question
I updated my spring-data-cosmos jar from version 5.3.0 to 5.8.0. I was excited about this because saveAll now uses the bulk API under the hood when writing more than one record, so I assumed saveAll would take less time than before. But the opposite happened: the bulk API took more time to write the records to Cosmos DB than the non-bulk one. I have two questions.
We are running our apps on VMs hosted on Azure.
I am attaching the runtimes below:

| Version | API | Records | First run | Second run |
| --- | --- | --- | --- | --- |
| 5.3.0 | without bulk | 9 | 1251 ms | 992 ms |
| 5.8.0 | with bulk | 9 | 1473 ms | 989 ms |
Why is this not a Bug or a feature Request? I am trying to understand the behavior of the new version of the API. It is not currently impacting us and does not require adding any new feature.