Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.

[BUG] Cosmos hangs forever with CosmosEndToEndOperationLatencyPolicyConfig set #40786

Open lnist opened 4 days ago

lnist commented 4 days ago

Describe the bug

Certain operations cause the Cosmos SDK to hang forever, and certain other operations do not respect the timeout set by CosmosEndToEndOperationLatencyPolicyConfig.

It seems the hangs occur for operations that span partitions.

To Reproduce

See this example repository and test: https://github.com/lnist/cosmos-sdk-hang/blob/main/src/test/java/cosmosTimeouts.java

To run the test, you need to fill in the connection string and master key for your Cosmos account.

The test uses WireMock to simulate a delay in accessing the Cosmos backend. Because the Cosmos SDK insists on HTTPS, a self-signed certificate is used for this.
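For readers who want to see the idea of a delayed backend without pulling in the linked repository, here is a minimal sketch using only the JDK. It is not the WireMock setup from the test: it swaps in the JDK's built-in `com.sun.net.httpserver.HttpServer` over plain HTTP (no self-signed certificate) and the JDK `HttpClient` instead of the Cosmos SDK. The class and method names (`DelayedBackendSketch`, `startDelayedServer`, `requestTimesOut`) are hypothetical and chosen only for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the WireMock proxy: a local server that delays every response.
final class DelayedBackendSketch {

    /** Starts a local HTTP server that sleeps for delayMillis before answering any request. */
    static HttpServer startDelayedServer(long delayMillis, ExecutorService executor) throws IOException {
        // Port 0 lets the OS pick a free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            try {
                Thread.sleep(delayMillis); // simulated backend latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "{}".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.setExecutor(executor);
        server.start();
        return server;
    }

    /** Returns true if a request with the given client-side timeout fails with a timeout. */
    static boolean requestTimesOut(long delayMillis, long timeoutMillis) {
        // Daemon threads so a still-sleeping handler never keeps the JVM alive.
        ExecutorService executor = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        HttpServer server = null;
        try {
            server = startDelayedServer(delayMillis, executor);
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .timeout(Duration.ofMillis(timeoutMillis))
                    .GET()
                    .build();
            client.send(request, HttpResponse.BodyHandlers.ofString());
            return false; // a response arrived within the timeout
        } catch (HttpTimeoutException e) {
            return true; // the client gave up before the delayed response arrived
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("delay 1500 ms, timeout 200 ms, timed out: " + requestTimesOut(1_500, 200));
        System.out.println("delay 0 ms, timeout 2000 ms, timed out: " + requestTimesOut(0, 2_000));
    }
}
```

The sketch only demonstrates the testing technique of injecting latency between client and backend; it does not reproduce the hang itself, which depends on the Cosmos SDK's own request pipeline.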

When you run the tests, all of them are expected to fail with a timeout from the Cosmos SDK. That does not happen.

The readAllContainers and properties tests both return the desired data, but they take longer than the configured timeout of 1 second. They should fail instead.

The readNonDefaultPartitionKey, count, readAll, and writeBulk tests all respect the timeout of 1 second if the DELAY parameter is set to 2_000, but they hang forever (until the test timeout of 1 minute) if the DELAY parameter is set to 10_000.

Note: The code includes a couple of configurations that I think are redundant, but they were used during extensive testing, so I did not want to change them. A quick test without them seems to indicate that the issues are also present with default parameters (except, of course, for the CosmosEndToEndOperationLatencyPolicyConfig).

Expected behavior

The API uses the configured timeout.
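To make the expected behavior concrete, here is a minimal sketch of an end-to-end deadline using only the JDK. It is not the Cosmos SDK's implementation and uses none of its APIs: `EndToEndTimeoutSketch`, `slowOperation`, and `timesOut` are hypothetical names, and `CompletableFuture.orTimeout` is swapped in as a generic way to bound an operation. The point is that the caller gets a timeout failure once the deadline passes, regardless of how long the underlying work would take.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of "the operation fails once the configured timeout elapses".
final class EndToEndTimeoutSketch {

    /** Stand-in for a backend call that takes delayMillis to complete. */
    static String slowOperation(long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "ok";
    }

    /** Returns true if the operation was cut off by the end-to-end timeout. */
    static boolean timesOut(long delayMillis, Duration timeout) {
        // Daemon thread so the abandoned operation never keeps the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        try {
            CompletableFuture
                    .supplyAsync(() -> slowOperation(delayMillis), executor)
                    // Completes the future exceptionally with TimeoutException at the deadline.
                    .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
                    .join();
            return false; // finished before the deadline
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return true; // deadline hit: the caller sees a failure instead of hanging
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Mirrors the report's numbers: a 1 second timeout against a 2 second delay.
        System.out.println("timed out: " + timesOut(2_000, Duration.ofSeconds(1)));
    }
}
```

Under this model, a delay of 10_000 ms should fail at the 1 second mark just as a delay of 2_000 ms does, which is the behavior the report says is missing.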

github-actions[bot] commented 4 days ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @kushagraThapar @pjohari-ms @TheovanKraay.

kushagraThapar commented 4 days ago

@tvaron3 please take a look at this, thanks!