Describe the bug
We are building a serverless execution pipeline for analytics jobs on the Azure Functions Consumption plan. The system stores and updates the state of its jobs in Cosmos DB, which means we need strong consistency and synchronous writes. The workload is a scheduled spike of jobs every hour, which we want to handle as quickly as possible. As a result, many short-lived Consumption plan instances (100+) spin up at the same time, and we found that the default Cosmos DB client settings are not suitable for this scenario.
We have run load-testing experiments with the Java Cosmos DB SDK 4.2.0 using different client configurations (the client is cached in every case). Based on these experiments we have drawn the following conclusions:
Direct connection is the way to go, since it can be tuned to deliver much better performance than gateway. Significant tuning is required to make it reliable, however. If we simply use the default configuration for direct connection, we are flooded with thousands of the “connection closed” DB exceptions seen below.
Limiting maxConnectionsPerEndpoint to 1 significantly reduces the number of DB exceptions and does not appear to have any negative impact on performance.
Setting a low value for maxRequestsPerConnection significantly lowers the latency of synchronous writes (by at least an order of magnitude at the 95th percentile compared to the default). If it goes too low, however, we start to see an increase in the number of DB exceptions. The optimal value appears to be a trade-off between performance and the number of DB exceptions thrown. We have found 10 to be a good compromise.
Performance varies significantly with different versions of the SDK. In our experiments 4.3.1 provides less than half the query throughput of 4.2.0 using the connection configuration seen below.
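For readers who want to try the two settings discussed above, here is a minimal sketch of how they could be applied with the 4.x Java SDK. It is an illustration built from the values stated in the conclusions (maxConnectionsPerEndpoint = 1, maxRequestsPerConnection = 10, strong consistency, synchronous client), not necessarily the exact configuration used in our experiments. The class name `TunedCosmosClientFactory` and the way the endpoint and key are passed in are assumptions made for the example, and the code needs the `azure-cosmos` dependency on the classpath.

```java
import com.azure.cosmos.ConsistencyLevel;
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.DirectConnectionConfig;

// Hypothetical helper; the name and shape are illustrative only.
final class TunedCosmosClientFactory {

    // Builds a synchronous client in direct mode with the two limits
    // described in the conclusions above.
    static CosmosClient create(String endpoint, String key) {
        DirectConnectionConfig direct = DirectConnectionConfig.getDefaultConfig()
            // One connection per endpoint reduced the "is closed" exceptions.
            .setMaxConnectionsPerEndpoint(1)
            // 10 was the compromise between write latency and exception count.
            .setMaxRequestsPerConnection(10);

        return new CosmosClientBuilder()
            .endpoint(endpoint)
            .key(key)
            .consistencyLevel(ConsistencyLevel.STRONG)
            .directMode(direct)
            .buildClient();
    }
}
```

The returned client is meant to be created once per function instance and reused across invocations.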
We have not been able to find any guidelines on configuring the cosmosDB client for this truly serverless use case, so we would very much appreciate your comments on our findings. If we could somehow completely avoid the DB exception found below, that would be great…
To Reproduce
Create a Java function app on the Consumption plan. Deploy an HTTP-triggered function which writes and modifies small dummy objects in Cosmos DB (less than 1 KB). Use SDK 4.2.0, the default direct connection configuration, strong consistency, and the synchronous client, and make sure to cache the client so it is only created once. Set the Cosmos DB throughput to 10,000 RU in order to eliminate RUs as a bottleneck. Write a test that puts some load on the function (e.g. 100 requests per second). Observe how Application Insights is flooded with exceptions of the type seen below.
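The "cache the client" step can be sketched as a lazily initialized holder that is shared by all invocations on the same instance. The sketch below uses only the Java standard library; `CachedClient` is a hypothetical name, and the supplier stands in for whatever code actually builds the Cosmos DB client.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical holder that creates a value once and then reuses it.
final class CachedClient<T> {
    private final Supplier<T> factory;
    private volatile T instance; // volatile so other threads see the published value

    CachedClient(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the cached value, invoking the factory on the first call only.
    // Assumes the factory never returns null; a null result would be retried.
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        // A plain Object stands in for the real client in this demo.
        CachedClient<Object> cache = new CachedClient<>(() -> {
            created.incrementAndGet();
            return new Object();
        });
        Object first = cache.get();
        Object second = cache.get();
        System.out.println("same instance: " + (first == second));
        System.out.println("factory calls: " + created.get());
    }
}
```

In a function app, the holder would typically live in a static field of the function class so that warm invocations on the same instance reuse the existing connections.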
Exception while executing function: Functions.JobPersister
Result: Failure
Exception: IllegalStateException: RntbdServiceEndpoint({"id":3,"isClosed":true,"concurrentRequests":0,"remoteAddress":"cdb-ms-prod-westeurope1-fd20.documents.azure.com:14318","channelPool":{"remoteAddress":"cdb-ms-prod-westeurope1-fd20.documents.azure.com:14318","isClosed":false,"configuration":{"maxChannels":130,"maxRequestsPerChannel":30,"idleConnectionTimeout":0,"readDelayLimit":65000000000,"writeDelayLimit":10000000000},"state":{"channelsAcquired":0,"channelsAvailable":0,"requestQueueLength":0}}}) is closed
Stack: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.microsoft.azure.functions.worker.broker.JavaMethodInvokeInfo.invoke(JavaMethodInvokeInfo.java:22)
    at com.microsoft.azure.functions.worker.broker.JavaMethodExecutorImpl.execute(JavaMethodExecutorImpl.java:54)
Expected behavior
The documentation should describe how to use Cosmos DB with the Azure Functions Consumption plan in a way that does not produce client exceptions.
Actual behavior
We found that the default client configuration does not work reliably with the Azure Functions Consumption plan and were unable to find documentation covering this use case. We have found a client configuration which works reasonably well, but a large number of exceptions still pollute our logs. We would appreciate guidance on how to proceed, or documentation covering our use case.