This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License
[BUG] Cosmos hangs forever with CosmosEndToEndOperationLatencyPolicyConfig set #40786
Describe the bug
Some operations cause the Cosmos SDK to hang indefinitely, and others do not respect the timeout set by CosmosEndToEndOperationLatencyPolicyConfig.
The hangs appear to occur for operations that span multiple partitions.
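To illustrate the behavior expected for an operation that spans partitions, here is a JDK-only sketch. It is not the SDK's implementation. queryPartition and queryAllPartitions are hypothetical names, and each per-partition request is simulated with a fixed delay. A single end-to-end deadline is applied to the combined fan-out, so the caller receives a TimeoutException after about the configured time even if one partition is very slow.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;

final class CrossPartitionDeadlineSketch {

    // Hypothetical stand-in for one per-partition request that answers after delayMillis.
    static CompletableFuture<Integer> queryPartition(int partition, long delayMillis) {
        return CompletableFuture.supplyAsync(
                () -> partition,
                CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
    }

    // Fans out one request per partition and bounds the combined result with one deadline.
    static List<Integer> queryAllPartitions(List<Long> partitionDelaysMillis, Duration deadline)
            throws TimeoutException {
        List<CompletableFuture<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < partitionDelaysMillis.size(); i++) {
            parts.add(queryPartition(i, partitionDelaysMillis.get(i)));
        }
        CompletableFuture<List<Integer>> combined = CompletableFuture
                .allOf(parts.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> parts.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()))
                // The deadline covers the whole fan-out, not each partition separately.
                .orTimeout(deadline.toMillis(), TimeUnit.MILLISECONDS);
        try {
            return combined.join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                // Slow partition requests keep running in the background; only the caller is released.
                throw (TimeoutException) e.getCause();
            }
            throw e;
        }
    }
}
```

The design choice being illustrated is that the timeout bounds what the caller waits for, regardless of how many partition requests sit behind it.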
To run the test, fill in the Cosmos DB connection string and master key.
The test uses WireMock to simulate a delay when accessing the Cosmos backend. Because the Cosmos SDK requires HTTPS, WireMock is set up with a self-signed certificate.
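The delay-injection idea can be approximated without WireMock. This minimal sketch assumes plain HTTP and the JDK's built-in com.sun.net.httpserver.HttpServer in place of WireMock with a self-signed HTTPS certificate; DelayedStubSketch and its methods are hypothetical. A local stub sleeps before answering, and a java.net.http.HttpClient request with a 1-second timeout gives up with HttpTimeoutException instead of waiting for the delayed response.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class DelayedStubSketch {

    // Hypothetical stub: sleeps delayMillis, then answers every request with "{}".
    static HttpServer startDelayedStub(long delayMillis, ExecutorService handlers) throws IOException {
        // Port 0 lets the OS choose a free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            try {
                Thread.sleep(delayMillis); // injected latency, comparable to a fixed response delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exchange.close(); // the stub is shutting down
                return;
            }
            byte[] body = "{}".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.setExecutor(handlers); // handlers run off the dispatcher thread
        server.start();
        return server;
    }

    // Returns true if a GET against the delayed stub completes before the client-side timeout.
    static boolean completesWithin(long delayMillis, Duration timeout) {
        ExecutorService handlers = Executors.newCachedThreadPool();
        HttpServer server;
        try {
            server = startDelayedStub(delayMillis, handlers);
        } catch (IOException e) {
            handlers.shutdownNow();
            throw new UncheckedIOException(e);
        }
        try {
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpRequest request = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
            client.send(request, HttpResponse.BodyHandlers.ofString());
            return true;
        } catch (HttpTimeoutException e) {
            return false; // the client gave up, as a configured timeout should make it do
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            server.stop(0);
            handlers.shutdownNow(); // interrupt handlers that are still sleeping
        }
    }
}
```

With a 2_000 ms delay and a 1-second timeout, completesWithin returns false after about one second.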
When the tests are run, all of them are expected to fail with a timeout from the Cosmos SDK. That does not happen.
The readAllContainers and properties tests both return the expected data, but only after more than the configured 1-second timeout. They should fail instead.
The readNonDefaultPartitionKey, count, readAll, and writeBulk tests respect the 1-second timeout when the DELAY parameter is set to 2_000, but hang indefinitely (until the 1-minute test timeout) when DELAY is set to 10_000.
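The observed behaviors (timeout respected, data returned late, hang until the test timeout) can be told apart mechanically. The sketch below is a hypothetical harness, not part of the linked repository: a watchdog plays the role of the 1-minute test timeout, simulatedOperation stands in for a Cosmos call, and the slack value is an assumed allowance for scheduling jitter.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class TimeoutOutcomeSketch {

    enum Outcome {
        FAILED_IN_TIME,    // expected: the operation failed at about the configured timeout
        COMPLETED_IN_TIME, // returned data before the configured timeout
        EXCEEDED_TIMEOUT,  // finished (with data or an error) only after the configured timeout
        HUNG               // still running when the watchdog (the test timeout) fired
    }

    // Hypothetical stand-in for a Cosmos call: answers after delayMillis, optionally bounded.
    static CompletableFuture<String> simulatedOperation(long delayMillis, Duration timeoutOrNull) {
        CompletableFuture<String> f = CompletableFuture.supplyAsync(
                () -> "data", CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
        return timeoutOrNull == null ? f : f.orTimeout(timeoutOrNull.toMillis(), TimeUnit.MILLISECONDS);
    }

    static Outcome classify(Supplier<CompletableFuture<?>> operation,
                            Duration configuredTimeout, Duration slack, Duration watchdog) {
        long start = System.nanoTime();
        CompletableFuture<?> future = operation.get();
        boolean failed;
        try {
            future.get(watchdog.toMillis(), TimeUnit.MILLISECONDS);
            failed = false;
        } catch (TimeoutException e) {
            // get() itself gave up: the operation neither completed nor failed.
            future.cancel(true);
            return Outcome.HUNG;
        } catch (ExecutionException e) {
            failed = true; // includes a TimeoutException raised by the operation itself
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        if (elapsed.compareTo(configuredTimeout.plus(slack)) > 0) {
            return Outcome.EXCEEDED_TIMEOUT;
        }
        return failed ? Outcome.FAILED_IN_TIME : Outcome.COMPLETED_IN_TIME;
    }
}
```

In these terms, readAllContainers and properties correspond to EXCEEDED_TIMEOUT, and the partition-spanning operations with DELAY set to 10_000 correspond to HUNG; the expected result for every delayed test is FAILED_IN_TIME.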
Note: The code includes a couple of configurations that I think are redundant; they were used during extensive testing, so I did not want to change them. A quick test without them suggests that the issues are also present with default parameters (except, of course, for the CosmosEndToEndOperationLatencyPolicyConfig).
Expected behavior
The API respects the configured timeout.
Setup:
OS: Windows 11
IDE: IntelliJ
Library/Libraries: com.azure:azure-cosmos:4.61.1
Java version: 21
App Server/Environment: JUnit Jupiter test runner
Frameworks: N/A
To Reproduce
See this example repository and test: https://github.com/lnist/cosmos-sdk-hang/blob/main/src/test/java/cosmosTimeouts.java