Azure / azure-cosmos-dotnet-v2

Contains samples and utilities relating to the Azure Cosmos DB .NET SDK

Occasional RequestTimeout exceptions in CosmosDb Emulator for Upsert command #669

Open paveliak opened 5 years ago

paveliak commented 5 years ago

Occasionally we observe RequestTimeout exceptions for Upsert operations with the Table API when using the Cosmos DB emulator. They go away upon an immediate retry (a retry sketch is included after the stack trace below).

Microsoft.Azure.Storage.StorageException: Message: {"Errors":["The requested operation exceeded maximum alloted time."]}
ActivityId: 58f0b2a4-ffa1-4618-b9d0-0eb16438f1f5, Request URI: /apps/DocDbApp/services/DocDbServer24/partitions/a4cb4964-38c8-11e6-8106-8cdcd42c33be/replicas/1p/, RequestStats: 
RequestStartTime: 2019-01-27T06:42:15.0472055Z, Number of regions attempted: 1
, SDK: documentdb-dotnet-sdk/2.1.0 Host/64-bit MicrosoftWindowsNT/10.0.14393.0 ---> Microsoft.Azure.Documents.RequestTimeoutException: Message: {"Errors":["The requested operation exceeded maximum alloted time."]}
ActivityId: 58f0b2a4-ffa1-4618-b9d0-0eb16438f1f5, Request URI: /apps/DocDbApp/services/DocDbServer24/partitions/a4cb4964-38c8-11e6-8106-8cdcd42c33be/replicas/1p/, RequestStats: 
RequestStartTime: 2019-01-27T06:42:15.0472055Z, Number of regions attempted: 1
, SDK: documentdb-dotnet-sdk/2.1.0 Host/64-bit MicrosoftWindowsNT/10.0.14393.0
   at void Microsoft.Azure.Documents.TransportClient.ThrowIfFailed(string resourceAddress, StoreResponse storeResponse, Uri physicalAddress, Guid activityId)
   at async Task<StoreResponse> Microsoft.Azure.Documents.Rntbd.TransportClient.InvokeStoreAsync(Uri physicalAddress, ResourceOperation resourceOperation, DocumentServiceRequest request)
   at async Task<StoreResponse> Microsoft.Azure.Documents.ConsistencyWriter.WritePrivateAsync(DocumentServiceRequest request, TimeoutHelper timeout, bool forceRefresh)
   at async Task<StoreResponse> Microsoft.Azure.Documents.ConsistencyWriter.WriteAsync(DocumentServiceRequest entity, TimeoutHelper timeout, bool forceRefresh)
   at async Task<StoreResponse> Microsoft.Azure.Documents.ReplicatedResourceClient.InvokeAsync(DocumentServiceRequest request, TimeoutHelper timeout, bool isInRetry, bool forceRefresh)
   at void Microsoft.Azure.Documents.ReplicatedResourceClient+<>c__DisplayClass21_0+<<InvokeAsync>b__0>d.MoveNext()
   at void Microsoft.Azure.Documents.BackoffRetryUtility<T>+<>c__DisplayClass4_0<TPolicyArg1>+<<ExecuteAsync>b__1>d.MoveNext()
   at async Task<T> Microsoft.Azure.Documents.BackoffRetryUtility<T>.ExecuteRetryAsync(Func<Task<T>> callbackMethod, Func<Exception, CancellationToken, Task<ShouldRetryResult>> callShouldRetry, Func<Task<T>> inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action<Exception> preRetryCallback)
   at void Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)
   at async Task<T> Microsoft.Azure.Documents.BackoffRetryUtility<T>.ExecuteRetryAsync(Func<Task<T>> callbackMethod, Func<Exception, CancellationToken, Task<ShouldRetryResult>> callShouldRetry, Func<Task<T>> inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action<Exception> preRetryCallback)
   at async Task<T> Microsoft.Azure.Documents.BackoffRetryUtility<T>.ExecuteAsync<TPolicyArg1>(Func<TPolicyArg1, Task<T>> callbackMethod, IRetryPolicy<TPolicyArg1> retryPolicy, Func<TPolicyArg1, Task<T>> inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action<Exception> preRetryCallback)
   at async Task<StoreResponse> Microsoft.Azure.Documents.ReplicatedResourceClient.InvokeAsync(DocumentServiceRequest request, Func<DocumentServiceRequest, Task> prepareRequestAsyncDelegate, CancellationToken cancellationToken)
   at void Microsoft.Azure.Documents.StoreClient+<>c__DisplayClass17_0+<<ProcessMessageAsync>b__0>d.MoveNext()
   at async Task<DocumentServiceResponse> Microsoft.Azure.Documents.StoreClient.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken, IRetryPolicy retryPolicy, Func<DocumentServiceRequest, Task> prepareRequestAsyncDelegate)
   at async Task<DocumentServiceResponse> Microsoft.Azure.Documents.ServerStoreModel.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken)
   at async Task<DocumentServiceResponse> Microsoft.Azure.Documents.Client.DocumentClient.UpsertAsync(DocumentServiceRequest request, CancellationToken cancellationToken)
   at async Task<ResourceResponse<Document>> Microsoft.Azure.Documents.Client.DocumentClient.UpsertDocumentPrivateAsync(string documentCollectionLink, object document, RequestOptions options, bool disableAutomaticIdGeneration, IDocumentClientRetryPolicy retryPolicyInstance, CancellationToken cancellationToken)
   at void Microsoft.Azure.Documents.BackoffRetryUtility<T>+<>c__DisplayClass1_0+<<ExecuteAsync>b__0>d.MoveNext()
   at async Task<T> Microsoft.Azure.Documents.BackoffRetryUtility<T>.ExecuteRetryAsync(Func<Task<T>> callbackMethod, Func<Exception, CancellationToken, Task<ShouldRetryResult>> callShouldRetry, Func<Task<T>> inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action<Exception> preRetryCallback)
   at void Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)
   at async Task<T> Microsoft.Azure.Documents.BackoffRetryUtility<T>.ExecuteRetryAsync(Func<Task<T>> callbackMethod, Func<Exception, CancellationToken, Task<ShouldRetryResult>> callShouldRetry, Func<Task<T>> inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action<Exception> preRetryCallback)
   at async Task<T> Microsoft.Azure.Documents.BackoffRetryUtility<T>.ExecuteAsync(Func<Task<T>> callbackMethod, IRetryPolicy retryPolicy, CancellationToken cancellationToken, Action<Exception> preRetryCallback)
   at async Task<ResourceResponse<Document>> Microsoft.Azure.Documents.Client.DocumentClient.UpsertDocumentInlineAsync(string documentsFeedOrDatabaseLink, object document, RequestOptions options, bool disableAutomaticIdGeneration, CancellationToken cancellationToken)
   at async Task<TableResult> Microsoft.Azure.CosmosDB.Table.Extensions.TableExtensionOperationHelper.HandleUpsertAsync(TableOperation operation, CloudTableClient client, CloudTable table, TableRequestOptions options, OperationContext context) in d:/dbs/sh/csdb/0926_144214/cmd/2/Product/SDK/Table/.net/Lib/WindowsDesktop.Split/Table/Extensions/TableExtensionOperationHelper.cs:line 270
   at async Task<TResult> Microsoft.Azure.CosmosDB.Table.Extensions.TableExtensionOperationHelper.ExecuteOperationAsync<TResult>(TableOperation operation, CloudTableClient client, CloudTable table, TableRequestOptions requestOptions, OperationContext operationContext, CancellationToken cancellationToken) in d:/dbs/sh/csdb/0926_144214/cmd/2/Product/SDK/Table/.net/Lib/WindowsDesktop.Split/Table/Extensions/TableExtensionOperationHelper.cs:line 72
   --- End of inner exception stack trace ---
   at void Microsoft.Azure.CosmosDB.Table.Extensions.TableExtensionRetryPolicy.ThrowTimeoutIfElapsed(Nullable<TimeSpan> maxExecutionTime, DateTime startTime, OperationContext operationContext, StorageException previousException) in d:/dbs/sh/csdb/0926_144214/cmd/2/Product/SDK/Table/.net/Lib/WindowsDesktop.Split/Table/Extensions/TableExtensionRetryPolicy.cs:line 152
   at async Task<TResult> Microsoft.Azure.CosmosDB.Table.Extensions.TableExtensionRetryPolicy.ExecuteUnderRetryPolicy<TResult>(Func<Task<TResult>> executionMethod, CancellationToken cancellationToken, OperationContext operationContext, TableRequestOptions requestOptions) in d:/dbs/sh/csdb/0926_144214/cmd/2/Product/SDK/Table/.net/Lib/WindowsDesktop.Split/Table/Extensions/TableExtensionRetryPolicy.cs:line 124
   at TResult Microsoft.Azure.CosmosDB.Table.Extensions.TableExtensionExecutor.Execute<TResult, TOperation>(TOperation operation, CloudTableClient client, CloudTable table, TableRequestOptions requestOptions, OperationContext operationContext) in d:/dbs/sh/csdb/0926_144214/cmd/2/Product/SDK/Table/.net/Lib/WindowsDesktop.Split/Table/Extensions/TableExtensionExecutor.cs
   at TableResult Microsoft.Azure.CosmosDB.Table.TableOperation.Execute(CloudTableClient client, CloudTable table, TableRequestOptions requestOptions, OperationContext operationContext) in d:/dbs/sh/csdb/0926_144214/cmd/2/Product/SDK/Table/.net
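
Since an immediate retry succeeds, an application-level retry around the table operation is a workable stopgap. A minimal sketch, assuming the Microsoft.Azure.CosmosDB.Table API and that the StorageException surfaces the 408 status in RequestInformation (the entity type and names are illustrative, not from this report):

```csharp
using System.Threading;
using Microsoft.Azure.CosmosDB.Table;
using Microsoft.Azure.Storage;

static class UpsertRetry
{
    // Retries the upsert a few times when the emulator returns a 408 RequestTimeout.
    public static TableResult UpsertWithRetry(CloudTable table, ITableEntity entity, int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return table.Execute(TableOperation.InsertOrReplace(entity));
            }
            catch (StorageException ex) when (
                attempt < maxAttempts &&
                ex.RequestInformation?.HttpStatusCode == 408) // RequestTimeout
            {
                // An immediate retry succeeds in practice; a short pause is slightly safer.
                Thread.Sleep(100);
            }
        }
    }
}
```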
jamesfoster commented 5 years ago

I'm getting a similar issue: intermittent timeouts when using CreateDocumentAsync. We use the emulator for integration testing, and out of a suite of 150 tests, one test will usually fail with the error below.

It's not always the same test. A single failure is very common; two failures are rare but do happen.

Looking at Cosmos after a test fails, the expected document exists, so the operation was actually a success; it just happened to time out. Strangely, though, the RequestStartTime and RequestEndTime indicate the request took 0 ms. (A retry sketch that accounts for this is shown after the stack trace below.)

Let me know if there's anything I can do to help diagnose the issue.

Microsoft.Azure.Documents.DocumentClientException: Message: Request timed out.
ActivityId: e1e607c3-e50e-47bd-a8bd-4cbb04c0ee87, Request URI: /apps/DocDbApp/services/DocDbServer17/partitions/a4cb495d-38c8-11e6-8106-8cdcd42c33be/replicas/1p/, RequestStats: 
RequestStartTime: 2019-08-27T11:11:15.5802078Z, RequestEndTime: 2019-08-27T11:11:15.5802078Z, Number of regions attempted: 1
, SDK: Microsoft.Azure.Documents.Common/2.4.0.0, Windows/10.0.18362 documentdb-netcore-sdk/2.4.0
   at Microsoft.Azure.Documents.GatewayStoreClient.ParseResponseAsync(HttpResponseMessage responseMessage, JsonSerializerSettings serializerSettings, DocumentServiceRequest request)
   at Microsoft.Azure.Documents.GatewayStoreClient.InvokeAsync(DocumentServiceRequest request, ResourceType resourceType, Uri physicalAddress, CancellationToken cancellationToken)
   at Microsoft.Azure.Documents.GatewayStoreModel.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken)
   at Microsoft.Azure.Documents.Client.DocumentClient.ProcessRequestAsync(DocumentServiceRequest request, IDocumentClientRetryPolicy retryPolicyInstance, CancellationToken cancellationToken)
   at Microsoft.Azure.Documents.Client.DocumentClient.CreateDocumentPrivateAsync(String documentCollectionLink, Object document, RequestOptions options, Boolean disableAutomaticIdGeneration, IDocumentClientRetryPolicy retryPolicyInstance, CancellationToken cancellationToken)
   at Microsoft.Azure.Documents.BackoffRetryUtility`1.ExecuteRetryAsync(Func`1 callbackMethod, Func`3 callShouldRetry, Func`1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback)
   at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)
   at Microsoft.Azure.Documents.BackoffRetryUtility`1.ExecuteRetryAsync(Func`1 callbackMethod, Func`3 callShouldRetry, Func`1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback)
   at Microsoft.Azure.Documents.Client.DocumentClient.CreateDocumentInlineAsync(String documentsFeedOrDatabaseLink, Object document, RequestOptions options, Boolean disableAutomaticIdGeneration, CancellationToken cancellationToken)
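
One way to work around this in a test suite is a sketch like the following (V2 DocumentClient, names are illustrative): retry on a 408, and treat a 409 Conflict on the retry as success, since, as observed above, the timed-out attempt may have actually written the document.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static class CreateRetry
{
    public static async Task CreateWithRetryAsync(
        DocumentClient client, Uri collectionUri, object document, int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await client.CreateDocumentAsync(collectionUri, document);
                return;
            }
            catch (DocumentClientException ex) when (
                ex.StatusCode == HttpStatusCode.RequestTimeout && attempt < maxAttempts)
            {
                // Retry: the emulator intermittently reports a timeout even though
                // the write may have gone through.
            }
            catch (DocumentClientException ex) when (
                ex.StatusCode == HttpStatusCode.Conflict && attempt > 1)
            {
                // The earlier timed-out attempt actually created the document.
                return;
            }
        }
    }
}
```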
DevZenFlow commented 5 years ago

We are also seeing odd timeouts on CreateDocumentAsync calls, but against Azure-hosted Cosmos DB rather than the emulator.

Adelzu commented 4 years ago

I'm facing this with the V3 SDK on Azure, not the emulator. We get a RequestTimeout (408), yet the item is actually created in the database! This is annoying to work around because it conflicts with another bug: if you insert multiple items with the same id across multiple partitions, reading the failed item quickly afterwards (using id and partitionKey) can return the wrong item from the wrong partition.

We are using SDK V3.8.0 on .NET Core 3.1.
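
The workaround being described is essentially a post-timeout existence check, roughly like the sketch below (V3 SDK; the item type, "/Pk" partition key path, and names are illustrative). Note the caveat above: with duplicate ids across partitions, the commenter reports this read itself can return the wrong item.

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

static class UpsertVerify
{
    // After a 408 on upsert, do a point read to check whether the write actually landed.
    public static async Task<MyItem> UpsertOrVerifyAsync(Container container, MyItem item)
    {
        try
        {
            ItemResponse<MyItem> response =
                await container.UpsertItemAsync(item, new PartitionKey(item.Pk));
            return response.Resource;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.RequestTimeout)
        {
            // A 408 does not necessarily mean the write failed; verify with a point read.
            // A 404 here would mean the write really did not land and the upsert should be retried.
            ItemResponse<MyItem> read =
                await container.ReadItemAsync<MyItem>(item.Id, new PartitionKey(item.Pk));
            return read.Resource;
        }
    }

    // Illustrative item type; assumes the container's partition key path is "/Pk".
    public class MyItem
    {
        [Newtonsoft.Json.JsonProperty("id")]
        public string Id { get; set; }

        public string Pk { get; set; }
    }
}
```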

matt-mahdieh commented 4 years ago

I am experiencing the same issue using the V3 SDK on Azure. Any progress on this?

tunminster commented 4 years ago

Same here: 408 gateway request timeout with the V3 SDK against Azure Cosmos DB.

VladimirDomnich commented 4 years ago

Same here. We are using the Mongo API and intermittently receive the following error:

Message: {"Errors":["The requested operation exceeded maximum alloted time. Learn more: https:\/\/aka.ms\/cosmosdb-tsg-service-request-timeout"]}
ActivityId: ced398c5-0000-0000-0000-000000000000,
RequestStats: 
RequestStartTime: 2020-10-22T00:41:18.0710333Z,
RequestEndTime: 2020-10-22T00:41:23.4809804Z,
Number of regions attempted:1
ResponseTime: 2020-10-22T00:41:23.4809804Z
akhileshSDM commented 3 years ago

We are getting the same issue. Any update on how to fix this?

{ "Message": "Message: {\"Errors\":[\"The requested operation exceeded maximum alloted time. Learn more: https:\/\/aka.ms\/cosmosdb-tsg-service-request-timeout\"]}\nActivityId: ec8e38ff-986f-41b1-962e-6b8fd4b543d4, Request URI: /apps/ba6fe40a-0ffa-4bb0-ba22-08b36841e9b6/services/9c79c251-626d-42dd-b412-02e522e0034e/partitions/5832230e-6af0-4af1-91be-e534a99b3790/replicas/132438418298304872p//dbs/prod/colls/annualviewcollection_production/docs, RequestStats: \nRequestStartTime: 2020-12-14T20:08:10.4903186Z, RequestEndTime: 2020-12-14T20:08:16.8261747Z, Number of regions attempted:1\nResponseTime: 2020-12-14T20:08:16.8261747Z, StoreResult: StorePhysicalAddress: https://cdb-ms-prod-eastus2-fd5.documents.azure.com:14776/apps/ba6fe40a-0ffa-4bb0-ba22-08b36841e9b6/services/9c79c251-626d-42dd-b412-02e522e0034e/partitions/5832230e-6af0-4af1-91be-e534a99b3790/replicas/132438418298304872p//, LSN: -1, GlobalCommittedLsn: -1, PartitionKeyRangeId: , IsValid: False, StatusCode: 408, SubStatusCode: 0, RequestCharge: 0, ItemLSN: -1, SessionToken: , UsingLocalLSN: False, TransportException: null, ResourceType: Document, OperationType: Upsert\n, SDK: Linux/10 documentdb-netcore-sdk/2.10.0", "ErrorDate": "2020-12-14T20:08:16.8275029Z" },


mms- commented 3 years ago

We're also getting this a few times with the V2 SDK:

Microsoft.Azure.Documents.RequestTimeoutException: Message: Request timed out. More info: https://aka.ms/cosmosdb-tsg-request-timeout ActivityId: .., Request URI:... , RequestStats: RequestStartTime: 2021-03-01T07:05:01.9165737Z, RequestEndTime: 2021-03-01T07:05:01.9165737Z, Number of regions attempted:1 , SDK: documentdb-dotnet-sdk/2.13.1 Host/64-bit MicrosoftWindowsNT/6.2.9200.0 ---> Microsoft.Azure.Documents.TransportException: A client transport error occurred: Acquiring the send stream lock timed out. (Time: 2021-03-01T07:05:12.9181766Z, activity ID: 2dbb632e-a988-4328-86e8-13fe99d17eb0, error code: SendLockTimeout [0x000D], base error: HRESULT 0x80131500, URI: rntbd://cdb-ms-prod-westus1-fd36.documents.azure.com:14310/apps/.., connection: 10.218.0.4:57965 -> 40.112.241.34:14310, payload sent: True, CPU history: (2021-02-28T23:20:23.0498147Z 16.242), CPU count: 1) at Microsoft.Azure.Documents.Rntbd.Channel.d__13.MoveNext() --- End of stack trace

atomicleopard commented 3 years ago

Hi, I'm also getting a similar error using the emulator. My observation is that it happens while I'm making concurrent requests, and the report shows it happening instantaneously (i.e. the start and end times are identical):

Promise result rejected Error: 
  Bulk request errored with: 
    Message: Request timed out. More info: https://aka.ms/cosmosdb-tsg-request-timeout
    ActivityId: 98777d16-d3c5-4866-8df6-01de5c07d3f2, 
    Request URI: /apps/DocDbApp/services/DocDbServer3/partitions/a4cb494f-38c8-11e6-8106-8cdcd42c33be/replicas/1p/, 
    RequestStats: 
      RequestStartTime: 2021-11-05T01:52:20.4639778Z, 
      RequestEndTime:   2021-11-05T01:52:20.4639778Z,  
      Number of regions attempted:1
, SDK: Microsoft.Azure.Documents.Common/2.14.0
antoinegaston commented 2 years ago

I'm not sure we can call this a fix, but I hit this error and it turned out to be caused by the indexing policy. Removing the indexing policy before upserting new data and re-applying it right afterwards got rid of the error (a sketch of this is below).
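
Concretely, that workaround looks roughly like the sketch below, assuming the V3 .NET SDK (the container, item names, and tuple shape are illustrative):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

static class IndexingToggle
{
    // Turn indexing off, bulk-upsert, then restore the original indexing policy.
    public static async Task UpsertWithoutIndexingAsync<T>(
        Container container, IEnumerable<(T item, string pk)> items)
    {
        ContainerResponse read = await container.ReadContainerAsync();
        ContainerProperties properties = read.Resource;
        IndexingPolicy original = properties.IndexingPolicy;

        // IndexingMode.None requires Automatic = false.
        properties.IndexingPolicy = new IndexingPolicy { IndexingMode = IndexingMode.None, Automatic = false };
        await container.ReplaceContainerAsync(properties);

        foreach ((T item, string pk) in items)
        {
            await container.UpsertItemAsync(item, new PartitionKey(pk));
        }

        // Re-apply the original policy; the index is rebuilt in the background.
        properties.IndexingPolicy = original;
        await container.ReplaceContainerAsync(properties);
    }
}
```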

aaronzhongg commented 2 years ago

I'm experiencing similar issues with Cosmos V3 against the emulator, but only when running on Azure-hosted pipeline agents, not locally. I found that increasing the partition count and container throughput reduced the frequency of the error (see: https://github.com/Azure/azure-cosmos-dotnet-v2/issues/817#issuecomment-927133201), but not by enough, so I'm still considering an Azure-hosted Cosmos instance or a self-hosted pipeline agent instead (a throughput sketch is below).
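
For the throughput part of that mitigation, a minimal sketch with the V3 SDK (database/container names and the RU value are illustrative); the Windows emulator also accepts a /PartitionCount startup argument for the partition-count part:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

static class TestContainerSetup
{
    // Create the test container with more provisioned throughput so the emulator
    // spreads load across more physical partitions (values are illustrative).
    public static async Task<Container> CreateAsync(CosmosClient client)
    {
        Database database = await client.CreateDatabaseIfNotExistsAsync("integration-tests");
        return await database.CreateContainerIfNotExistsAsync(
            new ContainerProperties(id: "items", partitionKeyPath: "/pk"),
            throughput: 10000);
    }
}
```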

utillity commented 1 year ago

Is there any progress on this? It's 2023 and the same issue still occurs regularly during integration testing. I am getting this using the Gremlin API.

KT-JuliaDias commented 1 week ago

It's 2024 and the issue still exists.