Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License

[BUG]: Handling 403:1008 for address resolution calls. #4710

Open jeet1995 opened 1 month ago

jeet1995 commented 1 month ago

We are continuously addressing and improving the SDK. If possible, make sure the problem persists in the latest SDK version.

Describe the bug When a region has been removed from the account, the SDK may still reach out to it. Address resolution calls made against the removed region then fail with 403:1008, which can cause the whole operation to fail.

To Reproduce Use the chaos framework in Gateway mode to inject 403:1008 responses for address resolution requests, or run a workload against a single-write, multi-region account while its write region is failed over.
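For anyone without access to the chaos framework, the first repro path could perhaps be approximated with a plain HTTP handler. The sketch below is not the chaos framework and is not part of the SDK. The class name `Inject403Sub1008Handler` is hypothetical, and matching on `/addresses` in the path is an assumption taken from the request URIs in the traces on this issue. It short-circuits matching GET requests with a 403 response that carries `x-ms-substatus: 1008`.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical fault-injecting handler (illustration only, not an SDK type).
public sealed class Inject403Sub1008Handler : DelegatingHandler
{
    public Inject403Sub1008Handler(HttpMessageHandler innerHandler)
        : base(innerHandler)
    {
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request,
        CancellationToken cancellationToken)
    {
        string path = request.RequestUri?.AbsolutePath ?? string.Empty;

        // Assumption: address resolution calls are GETs whose path contains "/addresses".
        if (request.Method == HttpMethod.Get && path.Contains("/addresses"))
        {
            var response = new HttpResponseMessage(HttpStatusCode.Forbidden)
            {
                RequestMessage = request,
                Content = new StringContent("{\"code\":\"Forbidden\",\"message\":\"Injected 403:1008\"}")
            };

            // The substatus travels in a response header rather than in the status line.
            response.Headers.Add("x-ms-substatus", "1008");
            return Task.FromResult(response);
        }

        // Everything else goes to the real transport untouched.
        return base.SendAsync(request, cancellationToken);
    }
}
```

The handler could then be plugged in through `CosmosClientOptions.HttpClientFactory`, for example by returning `new HttpClient(new Inject403Sub1008Handler(new HttpClientHandler()))`. Whether the SDK surfaces the injected response exactly like a real 403:1008 from the service has not been verified here.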

Expected behavior Ideally, address resolution performed inline with a document operation should be covered by that operation's ClientRetryPolicy and be retriable against other available regions.
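To make the expected behavior concrete, here is a minimal sketch of the idea. It is not the SDK's ClientRetryPolicy and does not use any SDK types. `RegionalRequestException`, `CrossRegionRetrySketch`, and the `resolveAddressesAndExecute` delegate are hypothetical names introduced for illustration. The assumption is that address resolution and the document operation run as one unit per regional endpoint, and that a 403:1008 from either step moves the attempt to the next available region instead of failing the operation.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

// Hypothetical stand-in for a failed regional call (not the SDK's CosmosException).
public sealed class RegionalRequestException : Exception
{
    public RegionalRequestException(HttpStatusCode statusCode, int subStatusCode)
        : base($"{(int)statusCode}:{subStatusCode}")
    {
        StatusCode = statusCode;
        SubStatusCode = subStatusCode;
    }

    public HttpStatusCode StatusCode { get; }

    public int SubStatusCode { get; }
}

public static class CrossRegionRetrySketch
{
    // Substatus reported in this issue for calls that reach a removed region.
    private const int SubStatus1008 = 1008;

    // Tries each regional endpoint in order. The delegate is expected to do
    // address resolution AND the document operation against that endpoint,
    // so a 403:1008 from either step is retried on the next region.
    public static async Task<T> ExecuteAsync<T>(
        IReadOnlyList<Uri> regionalEndpoints,
        Func<Uri, Task<T>> resolveAddressesAndExecute)
    {
        Exception lastError = null;

        foreach (Uri endpoint in regionalEndpoints)
        {
            try
            {
                return await resolveAddressesAndExecute(endpoint);
            }
            catch (RegionalRequestException ex)
                when (ex.StatusCode == HttpStatusCode.Forbidden
                      && ex.SubStatusCode == SubStatus1008)
            {
                // The region looks removed; remember the error and move on.
                lastError = ex;
            }
        }

        throw new InvalidOperationException(
            "All regional endpoints failed with 403:1008.", lastError);
    }
}
```

With endpoints `[removedRegion, healthyRegion]`, a delegate that throws `RegionalRequestException(Forbidden, 1008)` for the first endpoint and succeeds for the second would return the second endpoint's result. Any other failure propagates unchanged, which keeps the sketch from masking unrelated 403s.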

kirankumarkolli commented 2 weeks ago

Related exception stack


2024-09-24T17:10:00.008184Z CosmosDbRequestEndWithClientFailure CosmosItemDataProvider.Query.PrivateLinkAssociation Response status code does not indicate success: Forbidden (403); Substatus: 1008; ActivityId: 123fc2f9-43ae-4fa7-bcd1-5b284e0e7354; Reason: (
RequestUri: https://xxxx.documents.azure.com/dbs/LxkPAA==/colls/LxkPAMPhU1A=/pkranges;
RequestMethod: GET;
Header: authorization Length: 86;
Header: x-ms-date Length: 29;
Header: x-ms-max-item-count Length: 2;
Header: A-IM Length: 16;
Header: x-ms-activity-id Length: 36;
Header: Cache-Control Length: 8;
Header: User-Agent Length: 94;
Header: x-ms-version Length: 10;
Header: x-ms-cosmos-sdk-supportedcapabilities Length: 1;
Header: Accept Length: 16;

ActivityId: 123fc2f9-43ae-4fa7-bcd1-5b284e0e7354, Request URI: /dbs/LxkPAA==/colls/LxkPAMPhU1A=/pkranges, RequestStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.20348 cosmos-netstandard-sdk/3.34.4);
    at Microsoft.Azure.Cosmos.GatewayStoreClient.<ParseResponseAsync>d__9.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.GatewayStoreClient.<InvokeAsync>d__5.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.GatewayStoreModel.<ProcessMessageAsync>d__9.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at Microsoft.Azure.Cosmos.GatewayStoreModel.<ProcessMessageAsync>d__9.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.Routing.PartitionKeyRangeCache.<ExecutePartitionKeyRangeReadChangeFeedAsync>d__12.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at Microsoft.Azure.Documents.BackoffRetryUtility`1.<ExecuteRetryAsync>d__6`2.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)
    at Microsoft.Azure.Documents.BackoffRetryUtility`1.<ExecuteRetryAsync>d__6`2.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at Microsoft.Azure.Documents.BackoffRetryUtility`1.<ExecuteRetryAsync>d__6`2.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at Microsoft.Azure.Cosmos.Routing.PartitionKeyRangeCache.<GetRoutingMapForCollectionAsync>d__11.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.AsyncCacheNonBlocking`2.<GetAsync>d__8.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.Routing.PartitionKeyRangeCache.<TryLookupAsync>d__9.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.Routing.PartitionKeyRangeCache.<TryGetOverlappingRangesAsync>d__7.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at Microsoft.Azure.Cosmos.IRoutingMapProviderExtensions.<TryGetOverlappingRangesAsync>d__3.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.CosmosQueryClientCore.<GetTargetPartitionKeyRangesAsync>d__14.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.CosmosQueryExecutionContextFactory.<GetTargetPartitionKeyRangesAsync>d__18.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.CosmosQueryExecutionContextFactory.<TryCreateFromPartitionedQueryExecutionInfoAsync>d__10.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.CosmosQueryExecutionContextFactory.<TryCreateCoreContextAsync>d__9.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.Query.Core.AsyncLazy`1.<GetValueAsync>d__7.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.Query.Core.Pipeline.LazyQueryPipelineStage.<MoveNextAsync>d__7.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.Query.Core.Pipeline.NameCacheStaleRetryQueryPipelineStage.<MoveNextAsync>d__10.MoveNext() --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Cosmos.Query.Core.Pipeline.CatchAllQueryPipelineStage.<MoveNextAsync>d__1.MoveNext(

kirankumarkolli commented 2 weeks ago

It's a cold start scenario

kirankumarkolli commented 2 weeks ago

Another related exception:

Microsoft.Azure.Cosmos.CosmosException : Response status code does not indicate success: Forbidden (403); Substatus: 1008; ActivityId: ba79d888-129d-4424-a108-5c92fdc2a1a4; Reason: (
RequestUri: https://XXX.documents.azure.com//addresses/?$resolveFor=dbs%2fXzQbAA%3d%3d%2fcolls%2fXzQbAInIqrI%3d%2fdocs&$filter=protocol eq rntbd&$partitionKeyRangeIds=0;
RequestMethod: GET;
Header: Authorization Length: 1593;
Header: x-ms-date Length: 29;
Header: x-ms-force-refresh Length: 4;
Header: Cache-Control Length: 8;
Header: User-Agent Length: 79;
Header: x-ms-version Length: 10;
Header: x-ms-cosmos-sdk-supportedcapabilities Length: 1;
Header: Accept Length: 16;

ActivityId: ba79d888-129d-4424-a108-5c92fdc2a1a4, Request URI: //addresses/?$resolveFor=dbs%2fXzQbAA%3d%3d%2fcolls%2fXzQbAInIqrI%3d%2fdocs&$filter=protocol%20eq%20rntbd&$partitionKeyRangeIds=0, RequestStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.20348 cosmos-netstandard-sdk/3.32.0);
   at Microsoft.Azure.Cosmos.GatewayStoreClient.ParseResponseAsync(HttpResponseMessage responseMessage, JsonSerializerSettings serializerSettings, DocumentServiceRequest request)
   at Microsoft.Azure.Cosmos.Routing.GatewayAddressCache.GetServerAddressesViaGatewayAsync(DocumentServiceRequest request, String collectionRid, IEnumerable`1 partitionKeyRangeIds, Boolean forceRefresh)
   at Microsoft.Azure.Cosmos.Routing.GatewayAddressCache.GetAddressesForRangeIdAsync(DocumentServiceRequest request, PartitionAddressInformation cachedAddresses, String collectionRid, String partitionKeyRangeId, Boolean forceRefresh)
   at Microsoft.Azure.Cosmos.AsyncCacheNonBlocking`2.AsyncLazyWithRefreshTask`1.CreateAndWaitForBackgroundRefreshTaskAsync(Func`2 createRefreshTask)
   at Microsoft.Azure.Cosmos.AsyncCacheNonBlocking`2.UpdateCacheAndGetValueFromBackgroundTaskAsync(TKey key, AsyncLazyWithRefreshTask`1 initialValue, Func`2 callbackDelegate, String operationName)
   at Microsoft.Azure.Cosmos.AsyncCacheNonBlocking`2.GetAsync(TKey key, Func`2 singleValueInitFunc, Func`2 forceRefresh)
   at Microsoft.Azure.Cosmos.Routing.GatewayAddressCache.TryGetAddressesAsync(DocumentServiceRequest request, PartitionKeyRangeIdentity partitionKeyRangeIdentity, ServiceIdentity serviceIdentity, Boolean forceRefreshPartitionAddresses, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.AddressResolver.TryResolveServerPartitionAsync(DocumentServiceRequest request, ContainerProperties collection, CollectionRoutingMap routingMap, Boolean collectionCacheIsUptodate, Boolean collectionRoutingMapCacheIsUptodate, Boolean forceRefreshPartitionAddresses, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.AddressResolver.ResolveAddressesAndIdentityAsync(DocumentServiceRequest request, Boolean forceRefreshPartitionAddresses, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.AddressResolver.ResolveAsync(DocumentServiceRequest request, Boolean forceRefreshPartitionAddresses, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Routing.GlobalAddressResolver.ResolveAsync(DocumentServiceRequest request, Boolean forceRefresh, CancellationToken cancellationToken)
   at Microsoft.Azure.Documents.AddressSelector.ResolveAddressesAsync(DocumentServiceRequest request, Boolean forceAddressRefresh)
   at Microsoft.Azure.Documents.ConsistencyWriter.WritePrivateAsync(DocumentServiceRequest request, TimeoutHelper timeout, Boolean forceRefresh)
   at Microsoft.Azure.Documents.BackoffRetryUtility`1.ExecuteRetryAsync[TParam,TPolicy](Func`1 callbackMethod, Func`3 callbackMethodWithParam, Func`2 callbackMethodWithPolicy, TParam param, IRetryPolicy retryPolicy, IRetryPolicy`1 retryPolicyWithArg, Func`1 inBackoffAlternateCallbackMethod, Func`2 inBackoffAlternateCallbackMethodWithPolicy, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback)
   at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)
   at Microsoft.Azure.Doc--TRUNCATED--