Azure / azure-cosmosdb-java

Java Async SDK for SQL API of Azure Cosmos DB
MIT License

Reliable connection configuration for Azure functions consumption plan #357

Open casper-79 opened 4 years ago

casper-79 commented 4 years ago

Describe the bug
We are building a serverless execution pipeline for analytics jobs on the Azure Functions consumption plan. The system stores and updates the state of its jobs in Cosmos DB, which means we need strong consistency and synchronous writes. The workload is a scheduled spike of jobs every hour, which we want to handle as quickly as possible. As a result, many short-lived consumption plan instances (100+) spin up at the same time, and we found that the default Cosmos DB client settings are not suitable for this scenario.

We have run load-testing experiments with the Java Cosmos DB SDK 4.2.0 using different client configurations (the client is cached). Based on these experiments, we have drawn the following conclusions:

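import com.azure.cosmos.ConsistencyLevel;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.DirectConnectionConfig;

// Direct mode: limit each instance to one connection per endpoint,
// with at most 10 concurrent requests on that connection.
DirectConnectionConfig directConnectionConfig = DirectConnectionConfig.getDefaultConfig();
directConnectionConfig.setMaxConnectionsPerEndpoint(1);
directConnectionConfig.setMaxRequestsPerConnection(10);

// Build the synchronous client with strong consistency.
return new CosmosClientBuilder()
        .endpoint(environment.endpoint)
        .key(environment.primaryKey)
        .consistencyLevel(ConsistencyLevel.STRONG)
        .directMode(directConnectionConfig)
        .endpointDiscoveryEnabled(false)
        .buildClient();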
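The post says the client is cached so that it is built only once per instance. A minimal sketch of one way to do that is a lazily initialized holder. `LazyClientHolder` is a hypothetical helper, not part of the Cosmos DB SDK, and it is written generically so it does not depend on any SDK type.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of the Cosmos DB SDK): builds a value on
 * first use and returns the same instance to every later caller.
 */
final class LazyClientHolder<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    LazyClientHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            // Double-checked locking: only the first caller runs the factory.
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

In a function app, such a holder would typically live in a static field of the function class, with the supplier wrapping the builder code above, so that all invocations on the same instance share one client.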

We have not been able to find any guidelines on configuring the Cosmos DB client for this truly serverless use case, so we would very much appreciate your comments on our findings. If we could somehow completely avoid the DB exception found below, that would be great…

To Reproduce
Create a Java function app on the consumption plan. Deploy an HTTP-triggered function that writes and modifies small dummy objects in Cosmos DB (less than 1 KB). Use SDK 4.2.0 with the default direct connection configuration, strong consistency, and the synchronous client, and make sure to cache the client so it is only created once. Set the Cosmos DB throughput to 10,000 RU in order to eliminate RUs as a bottleneck. Write a test that puts some load on the function (e.g. 100 requests per second). Observe how Application Insights is flooded with exceptions of the type seen below.
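The load step could be sketched roughly as follows. `FixedRateLoad` is a hypothetical helper, and the `request` argument stands in for whatever HTTP call hits the function; the actual call is not shown because the function URL and payload are not given in this report. The pool size of 8 is an arbitrary choice.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical load driver: starts `total` calls at a fixed rate. */
final class FixedRateLoad {
    /** Returns the number of calls that threw a RuntimeException. */
    static int run(Runnable request, int perSecond, int total, long timeoutSeconds) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(8);
        CountDownLatch done = new CountDownLatch(total);
        AtomicInteger failures = new AtomicInteger();
        long intervalMicros = 1_000_000L / perSecond; // spacing between call starts
        try {
            for (int i = 0; i < total; i++) {
                pool.schedule(() -> {
                    try {
                        request.run();
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    } finally {
                        done.countDown();
                    }
                }, i * intervalMicros, TimeUnit.MICROSECONDS);
            }
            if (!done.await(timeoutSeconds, TimeUnit.SECONDS)) {
                throw new IllegalStateException("load run timed out");
            }
            return failures.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load run interrupted", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For example, `FixedRateLoad.run(request, 100, 6000, 120)` would start 100 calls per second for one minute. If individual calls are slow, 8 worker threads will not sustain that rate, so the pool size would need to grow accordingly.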

Exception while executing function: Functions.JobPersister
Result: Failure
Exception: IllegalStateException: RntbdServiceEndpoint({"id":3,"isClosed":true,"concurrentRequests":0,"remoteAddress":"cdb-ms-prod-westeurope1-fd20.documents.azure.com:14318","channelPool":{"remoteAddress":"cdb-ms-prod-westeurope1-fd20.documents.azure.com:14318","isClosed":false,"configuration":{"maxChannels":130,"maxRequestsPerChannel":30,"idleConnectionTimeout":0,"readDelayLimit":65000000000,"writeDelayLimit":10000000000},"state":{"channelsAcquired":0,"channelsAvailable":0,"requestQueueLength":0}}}) is closed
Stack: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.microsoft.azure.functions.worker.broker.JavaMethodInvokeInfo.invoke(JavaMethodInvokeInfo.java:22)
    at com.microsoft.azure.functions.worker.broker.JavaMethodExecutorImpl.execute(JavaMethodExecutorImpl.java:54)
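The exception, reproduced with the default direct connection configuration, reports maxChannels 130 and maxRequestsPerChannel 30, whereas the configuration shown earlier sets 1 connection per endpoint and 10 requests per connection. As a rough illustration of the difference in scale, the sketch below computes the per-endpoint ceilings across 100 instances. It assumes that maxChannels corresponds to maxConnectionsPerEndpoint and maxRequestsPerChannel to maxRequestsPerConnection. These are upper bounds implied by the settings, not measurements, and this report does not establish that connection count causes the exception. `ConnectionCeiling` is a hypothetical helper.

```java
/** Hypothetical helper: upper bounds implied by direct-mode settings. */
final class ConnectionCeiling {
    /** Most connections all instances together could open to one endpoint. */
    static long maxConnections(int instances, int maxConnectionsPerEndpoint) {
        return (long) instances * maxConnectionsPerEndpoint;
    }

    /** Most requests all instances together could have in flight to one endpoint. */
    static long maxInFlight(int instances, int maxConnectionsPerEndpoint,
                            int maxRequestsPerConnection) {
        return maxConnections(instances, maxConnectionsPerEndpoint) * maxRequestsPerConnection;
    }

    public static void main(String[] args) {
        int instances = 100; // round figure for the "100+" instances in the report

        // Values reported in the exception (default configuration).
        System.out.println(maxConnections(instances, 130));  // 13000
        System.out.println(maxInFlight(instances, 130, 30)); // 390000

        // Values from the reduced configuration.
        System.out.println(maxConnections(instances, 1));    // 100
        System.out.println(maxInFlight(instances, 1, 10));   // 1000
    }
}
```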

Expected behavior
It should be documented how to use Cosmos DB with the Azure Functions consumption plan in a way that does not produce client exceptions.

Actual behavior
We found that the default client configuration does not work reliably with the Azure Functions consumption plan, and we were unable to find documentation covering this use case. We have found a client configuration that works reasonably well, but a large number of exceptions still pollute our logs. We would appreciate guidance on how to proceed, or documentation covering our use case.

casper-79 commented 4 years ago

It seems the active repo for the Cosmos DB client has moved to https://github.com/Azure/azure-sdk-for-java.