Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License

AggregateException (instead of CosmosException) being thrown on GetFeedRanges when Gateway fails #4528

Open albertofori opened 3 weeks ago

albertofori commented 3 weeks ago

We are using Cosmos DB SDK version 3.33.1. When there is a network issue, we currently see a Microsoft.Azure.Documents.DocumentClientException as the immediate inner exception of the AggregateException that is thrown.

This seems to be a reference to a V2 SDK exception type that is not directly exposed by the current SDK. Is it intended for the V3 SDK to expose such an exception directly inside an AggregateException under these circumstances?
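For context, a minimal sketch of the call pattern that surfaces this behavior (endpoint, key, and database/container names are placeholders, not our actual configuration):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class FeedRangeRepro
{
    static async Task Main()
    {
        // Placeholder credentials for illustration only.
        using CosmosClient client = new CosmosClient(
            "https://<account>.documents.azure.com:443/",
            "<auth-key>");

        Container container = client.GetContainer("<database>", "<container>");

        try
        {
            IReadOnlyList<FeedRange> ranges = await container.GetFeedRangesAsync();
            Console.WriteLine($"Feed ranges: {ranges.Count}");
        }
        catch (AggregateException ex)
        {
            // Observed when the Gateway call fails: the inner exception is the
            // internal Microsoft.Azure.Documents.DocumentClientException rather
            // than a public CosmosException.
            Console.WriteLine(ex.InnerException?.GetType().FullName);
        }
    }
}
```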

Please find the stack trace below (the first inner exception is on the last line):

System.AggregateException: One or more errors occurred. (Channel is closed ActivityId: 29bda807-7374-4033-a295-dd3ba89246ab, RequestStartTime: 2024-05-31T17:07:41.1592990Z, RequestEndTime: 2024-05-31T17:07:47.5512837Z, Number of regions attempted:1 {"systemHistory":[{"dateUtc":"2024-05-31T17:06:53.5338902Z","cpu":0.452,"memory":663574236.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0278,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:03.5440037Z","cpu":0.232,"memory":663573756.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.1174,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:13.5537779Z","cpu":0.124,"memory":663597240.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0793,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:23.5635773Z","cpu":0.295,"memory":663585964.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.2314,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:33.5737057Z","cpu":0.129,"memory":663588336.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0326,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:43.5834609Z","cpu":0.176,"memory":663585136.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.246,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401}]} RequestStart: 2024-05-31T17:07:41.1595176Z; ResponseTime: 2024-05-31T17:07:47.5512837Z; StoreResult: StorePhysicalAddress: rntbd://10.0.1.17:11000/apps/3036edb8-a5b7-4779-89ab-9a0eb0a2340f/services/9c308ccc-4819-4bac-ad9a-8078f1783b80/partitions/a97a7a26-06e5-4e9d-92ee-489d1a774bc2/replicas/133574518272615524s, LSN: -1, GlobalCommittedLsn: -1, PartitionKeyRangeId: , IsValid: False, StatusCode: 503, SubStatusCode: 20006, RequestCharge: 0, ItemLSN: -1, SessionToken: , UsingLocalLSN: False, TransportException: A client transport error occurred: The connection failed. 
(Time: 2024-05-31T17:07:47.5484179Z, activity ID: 29bda807-7374-4033-a295-dd3ba89246ab, error code: ConnectionBroken [0x0012], base error: socket error TimedOut [0x0000274C], URI: rntbd://10.0.1.17:11000/apps/3036edb8-a5b7-4779-89ab-9a0eb0a2340f/services/9c308ccc-4819-4bac-ad9a-8078f1783b80/partitions/a97a7a26-06e5-4e9d-92ee-489d1a774bc2/replicas/133574518272615524s, connection: 10.0.1.8:47632 -> 10.0.1.17:11000, payload sent: True), BELatencyMs: , ActivityId: 29bda807-7374-4033-a295-dd3ba89246ab, RetryAfterInMs: , ReplicaHealthStatuses: [(port: 11300 | status: Unknown | lkt: 5/31/2024 5:07:41 PM),(port: 11000 | status: Unknown | lkt: 5/31/2024 5:07:41 PM),(port: 11300 | status: Unknown | lkt: 5/31/2024 5:07:41 PM),(port: 11000 | status: Unknown | lkt: 5/31/2024 5:07:41 PM)], TransportRequestTimeline: {"requestTimeline":[{"event": "Created", "startTimeUtc": "2024-05-31T17:07:41.1594362Z", "durationInMs": 0.0166},{"event": "ChannelAcquisitionStarted", "startTimeUtc": "2024-05-31T17:07:41.1594528Z", "durationInMs": 0.0086},{"event": "Pipelined", "startTimeUtc": "2024-05-31T17:07:41.1594614Z", "durationInMs": 0.0422},{"event": "Transit Time", "startTimeUtc": "2024-05-31T17:07:41.1595036Z", "durationInMs": 6389.3898},{"event": "Failed", "startTimeUtc": "2024-05-31T17:07:47.5488934Z", "durationInMs": 0}],"serviceEndpointStats":{"inflightRequests":8,"openConnections":1},"connectionStats":{"waitforConnectionInit":"False","callsPendingReceive":7,"lastSendAttempt":"2024-05-31T17:07:40.7640721Z","lastSend":"2024-05-31T17:07:40.7640802Z","lastReceive":"2024-05-31T17:07:26.7535939Z"},"requestSizeInBytes":725,"requestBodySizeInBytes":275}; ResourceType: DatabaseAccount, OperationType: MetadataCheckAccess , Microsoft.Azure.Documents.Common/2.14.0, Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, Linux/2.0 cosmos-netstandard-sdk/3.33.1) ---> Microsoft.Azure.Documents.DocumentClientException: Channel is closed ActivityId: ....

ealsur commented 3 weeks ago

Can you please attach the full exception? Normally, TransportExceptions materialize as a public CosmosException with a 503 status code.

DocumentClientException is still there, but it is internal; that is expected. The key part is understanding what the upper-most type was, which should be CosmosException (regardless of the InnerException property value).
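For reference, this is the handling pattern we would normally expect to be possible (a sketch only, assuming the failure surfaces as the public CosmosException type; `container` is an existing `Container` instance):

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

static class FeedRangeHandling
{
    // Sketch of the expected handling path when the failure surfaces as CosmosException.
    public static async Task ReadFeedRangesAsync(Container container)
    {
        try
        {
            var ranges = await container.GetFeedRangesAsync();
            Console.WriteLine($"Feed ranges: {ranges.Count}");
        }
        catch (CosmosException ce) when (ce.StatusCode == HttpStatusCode.ServiceUnavailable)
        {
            // Expected public surface for transport/503 failures: StatusCode,
            // SubStatusCode, and Diagnostics are available on the exception itself.
            Console.WriteLine(ce.Diagnostics.ToString());
            throw;
        }
    }
}
```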

albertofori commented 3 weeks ago

@ealsur Thanks for getting back to me on this. The full exception is below; the upper-most type appears to be an AggregateException, which is very generic. With inner exceptions that are internal, handling this exception becomes less straightforward than it would typically be.

System.AggregateException: One or more errors occurred. (Channel is closed ActivityId: 29bda807-7374-4033-a295-dd3ba89246ab, RequestStartTime: 2024-05-31T17:07:41.1592990Z, RequestEndTime: 2024-05-31T17:07:47.5512837Z, Number of regions attempted:1 {"systemHistory":[{"dateUtc":"2024-05-31T17:06:53.5338902Z","cpu":0.452,"memory":663574236.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0278,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:03.5440037Z","cpu":0.232,"memory":663573756.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.1174,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:13.5537779Z","cpu":0.124,"memory":663597240.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0793,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:23.5635773Z","cpu":0.295,"memory":663585964.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.2314,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:33.5737057Z","cpu":0.129,"memory":663588336.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0326,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:43.5834609Z","cpu":0.176,"memory":663585136.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.246,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401}]} RequestStart: 2024-05-31T17:07:41.1595176Z; ResponseTime: 2024-05-31T17:07:47.5512837Z; StoreResult: StorePhysicalAddress: rntbd://10.0.1.17:11000/apps/3036edb8-a5b7-4779-89ab-9a0eb0a2340f/services/9c308ccc-4819-4bac-ad9a-8078f1783b80/partitions/a97a7a26-06e5-4e9d-92ee-489d1a774bc2/replicas/133574518272615524s, LSN: -1, GlobalCommittedLsn: -1, PartitionKeyRangeId: , IsValid: False, StatusCode: 503, SubStatusCode: 20006, RequestCharge: 0, ItemLSN: -1, SessionToken: , UsingLocalLSN: False, TransportException: A client transport error occurred: The connection failed. 
(Time: 2024-05-31T17:07:47.5484179Z, activity ID: 29bda807-7374-4033-a295-dd3ba89246ab, error code: ConnectionBroken [0x0012], base error: socket error TimedOut [0x0000274C], URI: rntbd://10.0.1.17:11000/apps/3036edb8-a5b7-4779-89ab-9a0eb0a2340f/services/9c308ccc-4819-4bac-ad9a-8078f1783b80/partitions/a97a7a26-06e5-4e9d-92ee-489d1a774bc2/replicas/133574518272615524s, connection: 10.0.1.8:47632 -> 10.0.1.17:11000, payload sent: True), BELatencyMs: , ActivityId: 29bda807-7374-4033-a295-dd3ba89246ab, RetryAfterInMs: , ReplicaHealthStatuses: [(port: 11300 | status: Unknown | lkt: 5/31/2024 5:07:41 PM),(port: 11000 | status: Unknown | lkt: 5/31/2024 5:07:41 PM),(port: 11300 | status: Unknown | lkt: 5/31/2024 5:07:41 PM),(port: 11000 | status: Unknown | lkt: 5/31/2024 5:07:41 PM)], TransportRequestTimeline: {"requestTimeline":[{"event": "Created", "startTimeUtc": "2024-05-31T17:07:41.1594362Z", "durationInMs": 0.0166},{"event": "ChannelAcquisitionStarted", "startTimeUtc": "2024-05-31T17:07:41.1594528Z", "durationInMs": 0.0086},{"event": "Pipelined", "startTimeUtc": "2024-05-31T17:07:41.1594614Z", "durationInMs": 0.0422},{"event": "Transit Time", "startTimeUtc": "2024-05-31T17:07:41.1595036Z", "durationInMs": 6389.3898},{"event": "Failed", "startTimeUtc": "2024-05-31T17:07:47.5488934Z", "durationInMs": 0}],"serviceEndpointStats":{"inflightRequests":8,"openConnections":1},"connectionStats":{"waitforConnectionInit":"False","callsPendingReceive":7,"lastSendAttempt":"2024-05-31T17:07:40.7640721Z","lastSend":"2024-05-31T17:07:40.7640802Z","lastReceive":"2024-05-31T17:07:26.7535939Z"},"requestSizeInBytes":725,"requestBodySizeInBytes":275}; ResourceType: DatabaseAccount, OperationType: MetadataCheckAccess , Microsoft.Azure.Documents.Common/2.14.0, Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, Linux/2.0 cosmos-netstandard-sdk/3.33.1) ---> Microsoft.Azure.Documents.DocumentClientException: Channel is closed ActivityId: 29bda807-7374-4033-a295-dd3ba89246ab, RequestStartTime: 2024-05-31T17:07:41.1592990Z, RequestEndTime: 2024-05-31T17:07:47.5512837Z, Number of regions attempted:1 
{"systemHistory":[{"dateUtc":"2024-05-31T17:06:53.5338902Z","cpu":0.452,"memory":663574236.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0278,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:03.5440037Z","cpu":0.232,"memory":663573756.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.1174,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:13.5537779Z","cpu":0.124,"memory":663597240.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0793,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:23.5635773Z","cpu":0.295,"memory":663585964.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.2314,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:33.5737057Z","cpu":0.129,"memory":663588336.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0326,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401},{"dateUtc":"2024-05-31T17:07:43.5834609Z","cpu":0.176,"memory":663585136.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.246,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":401}]} RequestStart: 2024-05-31T17:07:41.1595176Z; ResponseTime: 2024-05-31T17:07:47.5512837Z; StoreResult: StorePhysicalAddress: rntbd://10.0.1.17:11000/apps/3036edb8-a5b7-4779-89ab-9a0eb0a2340f/services/9c308ccc-4819-4bac-ad9a-8078f1783b80/partitions/a97a7a26-06e5-4e9d-92ee-489d1a774bc2/replicas/133574518272615524s, LSN: -1, GlobalCommittedLsn: -1, PartitionKeyRangeId: , IsValid: False, StatusCode: 503, SubStatusCode: 20006, RequestCharge: 0, ItemLSN: -1, SessionToken: , UsingLocalLSN: False, TransportException: A client transport error occurred: The connection failed. 
(Time: 2024-05-31T17:07:47.5484179Z, activity ID: 29bda807-7374-4033-a295-dd3ba89246ab, error code: ConnectionBroken [0x0012], base error: socket error TimedOut [0x0000274C], URI: rntbd://10.0.1.17:11000/apps/3036edb8-a5b7-4779-89ab-9a0eb0a2340f/services/9c308ccc-4819-4bac-ad9a-8078f1783b80/partitions/a97a7a26-06e5-4e9d-92ee-489d1a774bc2/replicas/133574518272615524s, connection: 10.0.1.8:47632 -> 10.0.1.17:11000, payload sent: True), BELatencyMs: , ActivityId: 29bda807-7374-4033-a295-dd3ba89246ab, RetryAfterInMs: , ReplicaHealthStatuses: [(port: 11300 | status: Unknown | lkt: 5/31/2024 5:07:41 PM),(port: 11000 | status: Unknown | lkt: 5/31/2024 5:07:41 PM),(port: 11300 | status: Unknown | lkt: 5/31/2024 5:07:41 PM),(port: 11000 | status: Unknown | lkt: 5/31/2024 5:07:41 PM)], TransportRequestTimeline: {"requestTimeline":[{"event": "Created", "startTimeUtc": "2024-05-31T17:07:41.1594362Z", "durationInMs": 0.0166},{"event": "ChannelAcquisitionStarted", "startTimeUtc": "2024-05-31T17:07:41.1594528Z", "durationInMs": 0.0086},{"event": "Pipelined", "startTimeUtc": "2024-05-31T17:07:41.1594614Z", "durationInMs": 0.0422},{"event": "Transit Time", "startTimeUtc": "2024-05-31T17:07:41.1595036Z", "durationInMs": 6389.3898},{"event": "Failed", "startTimeUtc": "2024-05-31T17:07:47.5488934Z", "durationInMs": 0}],"serviceEndpointStats":{"inflightRequests":8,"openConnections":1},"connectionStats":{"waitforConnectionInit":"False","callsPendingReceive":7,"lastSendAttempt":"2024-05-31T17:07:40.7640721Z","lastSend":"2024-05-31T17:07:40.7640802Z","lastReceive":"2024-05-31T17:07:26.7535939Z"},"requestSizeInBytes":725,"requestBodySizeInBytes":275}; ResourceType: DatabaseAccount, OperationType: MetadataCheckAccess , Microsoft.Azure.Documents.Common/2.14.0, Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, Linux/2.0 cosmos-netstandard-sdk/3.33.1 at Microsoft.Azure.Cosmos.GatewayStoreClient.ParseResponseAsync(HttpResponseMessage responseMessage, JsonSerializerSettings serializerSettings, DocumentServiceRequest request) at Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeAsync(DocumentServiceRequest request, ResourceType resourceType, Uri physicalAddress, CancellationToken cancellationToken) at Microsoft.Azure.Cosmos.GatewayStoreModel.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken) at Microsoft.Azure.Cosmos.GatewayStoreModel.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken) at Microsoft.Azure.Cosmos.Routing.PartitionKeyRangeCache.ExecutePartitionKeyRangeReadChangeFeedAsync(String collectionRid, INameValueCollection headers, ITrace trace, IClientSideRequestStatistics clientSideRequestStatistics, IDocumentClientRetryPolicy retryPolicy) at Microsoft.Azure.Documents.BackoffRetryUtility1.ExecuteRetryAsync[TParam,TPolicy](Func1 callbackMethod, Func3 callbackMethodWithParam, Func2 callbackMethodWithPolicy, TParam param, IRetryPolicy retryPolicy, IRetryPolicy1 retryPolicyWithArg, Func1 inBackoffAlternateCallbackMethod, Func2 inBackoffAlternateCallbackMethodWithPolicy, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action1 preRetryCallback) at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException) at Microsoft.Azure.Documents.BackoffRetryUtility1.ExecuteRetryAsync[TParam,TPolicy](Func1 callbackMethod, Func3 callbackMethodWithParam, Func2 callbackMethodWithPolicy, TParam param, IRetryPolicy retryPolicy, IRetryPolicy1 retryPolicyWithArg, 
Func1 inBackoffAlternateCallbackMethod, Func2 inBackoffAlternateCallbackMethodWithPolicy, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action1 preRetryCallback) at Microsoft.Azure.Documents.BackoffRetryUtility1.ExecuteRetryAsync[TParam,TPolicy](Func1 callbackMethod, Func3 callbackMethodWithParam, Func2 callbackMethodWithPolicy, TParam param, IRetryPolicy retryPolicy, IRetryPolicy1 retryPolicyWithArg, Func1 inBackoffAlternateCallbackMethod, Func2 inBackoffAlternateCallbackMethodWithPolicy, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action1 preRetryCallback) at Microsoft.Azure.Cosmos.Routing.PartitionKeyRangeCache.GetRoutingMapForCollectionAsync(String collectionRid, CollectionRoutingMap previousRoutingMap, ITrace trace, IClientSideRequestStatistics clientSideRequestStatistics) at Microsoft.Azure.Cosmos.AsyncCacheNonBlocking2.AsyncLazyWithRefreshTask1.CreateAndWaitForBackgroundRefreshTaskAsync(Func2 createRefreshTask) at Microsoft.Azure.Cosmos.AsyncCacheNonBlocking2.UpdateCacheAndGetValueFromBackgroundTaskAsync(TKey key, AsyncLazyWithRefreshTask1 initialValue, Func2 callbackDelegate, String operationName) at Microsoft.Azure.Cosmos.AsyncCacheNonBlocking2.GetAsync(TKey key, Func2 singleValueInitFunc, Func2 forceRefresh) at Microsoft.Azure.Cosmos.Routing.PartitionKeyRangeCache.TryLookupAsync(String collectionRid, CollectionRoutingMap previousValue, DocumentServiceRequest request, ITrace trace) at Microsoft.Azure.Cosmos.Routing.PartitionKeyRangeCache.TryGetOverlappingRangesAsync(String collectionRid, Range1 range, ITrace trace, Boolean forceRefresh) at Microsoft.Azure.Cosmos.ContainerCore.GetFeedRangesAsync(ITrace trace, CancellationToken cancellationToken) at Microsoft.Azure.Cosmos.ClientContextCore.RunWithDiagnosticsHelperAsync[TResult](String containerName, String databaseName, OperationType operationType, ITrace trace, Func2 task, Func2 openTelemetry, String operationName, RequestOptions requestOptions) at Microsoft.Azure.Cosmos.ClientContextCore.OperationHelperWithRootTraceAsync[TResult](String operationName, String containerName, String databaseName, OperationType operationType, RequestOptions requestOptions, Func2 task, Func2 openTelemetry, TraceComponent traceComponent, TraceLevel traceLevel) --- End of inner exception stack trace ---

ealsur commented 3 weeks ago

@albertofori This appears to be a failure on the service.

Microsoft.Azure.Cosmos.GatewayStoreClient.ParseResponseAsync(HttpResponseMessage responseMessage, JsonSerializerSettings serializerSettings, DocumentServiceRequest request)

This means that there was a response from the Cosmos DB Gateway endpoint. The content of the response from the Gateway service is what ends up in the AggregateException; the body of the Gateway response is what is being printed here, and these TCP errors are not happening on the client. It sounds like the Gateway service is being overly verbose in including the failure.

The response was a ServiceUnavailable (503) error.

ealsur commented 3 weeks ago

Using the details in this error, I can see the same exception in the service logs.

From the SDK side, this is a Service Unavailable and should be treated as such: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/conceptual-resilient-sdk-applications#timeouts-and-connectivity-related-failures-http-408503
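For completeness, the kind of application-side handling that guidance suggests might look roughly like the sketch below (retry count and delay are arbitrary illustration values, and `GetFeedRangesWithRetryAsync` is a hypothetical helper, not an SDK API):

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

static class ResilientFeedRanges
{
    public static async Task<IReadOnlyList<FeedRange>> GetFeedRangesWithRetryAsync(
        Container container, int maxAttempts = 3)
    {
        TimeSpan delay = TimeSpan.FromSeconds(1);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await container.GetFeedRangesAsync();
            }
            catch (CosmosException ce) when (
                ce.StatusCode == HttpStatusCode.ServiceUnavailable && attempt < maxAttempts)
            {
                // 503 is a transient connectivity/service failure per the linked
                // guidance: keep the diagnostics for investigation, back off, retry.
                Console.WriteLine(ce.Diagnostics.ToString());
                await Task.Delay(delay);
                delay += delay; // simple exponential backoff
            }
        }
    }
}
```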

However, there are no SDK changes that can prevent the service from hitting this error or from producing a response body that contains these DocumentClientExceptions.

Looking at the service logs, there also appear to be only 2 failures in a 24-hour period.

ealsur commented 3 weeks ago

If these are happening more frequently, please file a support ticket. They seem to be transient failures on the service, but the volume does not appear to be affecting the SLA.

ealsur commented 3 weeks ago

Accidentally closed

albertofori commented 3 weeks ago

@ealsur Thanks! I guess my question is: when do we expect a CosmosException to be thrown by the SDK? I would expect the Gateway's exception to be wrapped in a higher-level exception like CosmosException.

ealsur commented 3 weeks ago

I would also expect the outcome to be a CosmosException and not an AggregateException, and that is the reason I kept this open.

It seems we do account for similar cases: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/18a677ace9998450f22598efe43cecb20dea079e/Microsoft.Azure.Cosmos/src/Handler/TransportHandler.cs#L69

But in this case, you are performing a GetFeedRanges call, which is purely a metadata operation. The gap might be that there is no handling of these cases here because the operation does not flow through the Handler pipeline.

Reference: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/18a677ace9998450f22598efe43cecb20dea079e/Microsoft.Azure.Cosmos/src/Resource/Container/ContainerCore.cs#L273

In this case, it would be ideal to avoid the AggregateException in AsyncCacheNonBlocking or to do the conversion in GetFeedRanges. I'll tag this issue appropriately to signal that it needs addressing.
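Until that lands, one possible caller-side stopgap (a hypothetical extension method named `GetFeedRangesUnwrappedAsync`, not part of the SDK) would be to unwrap the AggregateException at the call site:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.ExceptionServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class FeedRangeExtensions
{
    // Hypothetical helper: normalizes the AggregateException currently surfaced
    // by GetFeedRangesAsync so callers see a single, predictable exception shape.
    public static async Task<IReadOnlyList<FeedRange>> GetFeedRangesUnwrappedAsync(
        this Container container, CancellationToken cancellationToken = default)
    {
        try
        {
            return await container.GetFeedRangesAsync(cancellationToken);
        }
        catch (AggregateException ae)
        {
            Exception inner = ae.Flatten().InnerException ?? ae;

            // If a public CosmosException is nested inside, rethrow it as-is so the
            // usual StatusCode-based handling keeps working.
            if (inner is CosmosException cosmosException)
            {
                ExceptionDispatchInfo.Capture(cosmosException).Throw();
            }

            // Otherwise (e.g. the internal DocumentClientException seen above),
            // rethrow with a clearer message while preserving the original cause.
            throw new InvalidOperationException(
                "GetFeedRangesAsync failed with a transient Gateway/service error.",
                inner);
        }
    }
}
```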

ealsur commented 3 weeks ago

The "reference" (what is mentioned in the title), however, cannot be removed, as it is part of the content of the Gateway response and is internal.

albertofori commented 3 weeks ago

> I would also expect the outcome to be a CosmosException and not an AggregateException, and that is the reason I kept this open.
>
> It seems we do account for similar cases:
>
> https://github.com/Azure/azure-cosmos-dotnet-v3/blob/18a677ace9998450f22598efe43cecb20dea079e/Microsoft.Azure.Cosmos/src/Handler/TransportHandler.cs#L69
>
> But in this case, you are performing a GetFeedRanges call, which is purely a metadata operation. The gap might be that there is no handling of these cases here because the operation does not flow through the Handler pipeline.
>
> Reference:
>
> https://github.com/Azure/azure-cosmos-dotnet-v3/blob/18a677ace9998450f22598efe43cecb20dea079e/Microsoft.Azure.Cosmos/src/Resource/Container/ContainerCore.cs#L273
>
> In this case, it would be ideal to avoid the AggregateException in AsyncCacheNonBlocking or to do the conversion in GetFeedRanges. I'll tag this issue appropriately to signal that it needs addressing.

Sounds good! Getting this as a CosmosException would then make handling of the error more consistent with other errors via the StatusCode property, without having to dive into the InnerException. Thanks a lot @ealsur!

albertofori commented 3 weeks ago

> The "reference" (what is mentioned in the title), however, cannot be removed, as it is part of the content of the Gateway response and is internal.

Understood, and I agree with this as it provides more information on the underlying cause.