Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License
741 stars 494 forks source link

CosmosException.ResponseBody doesn't actually contain the response body #1910

Closed thomaslevesque closed 2 years ago

thomaslevesque commented 4 years ago

Describe the bug CosmosException.ResponseBody contains the details of an .NET exception, even though the name clearly implies that it should be, well, the response body... Cosmos DB error responses are already a mess; why does the SDK have to make things even worse? The interesting part (i.e. the JSON with the actual error details) is lost in the middle of a ton of junk, making it unnecessarily hard to extract.

To Reproduce Run a query with an error, e.g. with reference to an unknown identifier.

Expected behavior CosmosException.ResponseBody should contain the actual response body returned by Cosmos DB, i.e. something like this:

{"errors":[{"severity":"Error","location":{"start":36,"end":48},"code":"SC2001","message":"Identifier 'foo' could not be resolved."}]}

Actual behavior CosmosException.ResponseBody returns something that looks like the result of calling ToString() on an exception:

Microsoft.Azure.Cosmos.Query.Core.Monads.ExceptionWithStackTraceException: TryCatch resulted in an exception. ---> Microsoft.Azure.Cosmos.Query.Core.Monads.ExceptionWithStackTraceException: TryCatch resulted in an exception. ---> Microsoft.Azure.Cosmos.Query.Core.Exceptions.ExpectedQueryPartitionProviderException: {"errors":[{"severity":"Error","location":{"start":36,"end":48},"code":"SC2001","message":"Identifier 'foo' could not be resolved."}]}
 ---> System.Runtime.InteropServices.COMException (0x800A0B00): 0x800A0B00
   --- End of inner exception stack trace ---
   --- End of inner exception stack trace ---
   at Microsoft.Azure.Cosmos.Query.Core.QueryPlan.QueryPartitionProvider.TryGetPartitionedQueryExecutionInfoInternal(SqlQuerySpec querySpec, PartitionKeyDefinition partitionKeyDefinition, Boolean requireFormattableOrderByQuery, Boolean isContinuationExpected, Boolean allowNonValueAggregateQuery, Boolean hasLogicalPartitionKey)
  (... long stack trace...)

   --- End of inner exception stack trace ---
   at Microsoft.Azure.Cosmos.Query.Core.QueryPlan.QueryPartitionProvider.TryGetPartitionedQueryExecutionInfo(SqlQuerySpec querySpec, PartitionKeyDefinition partitionKeyDefinition, Boolean requireFormattableOrderByQuery, Boolean isContinuationExpected, Boolean allowNonValueAggregateQuery, Boolean hasLogicalPartitionKey)
   (... long stack trace...)

This is utterly unusable and doesn't help anyone.

Environment summary SDK Version: 3.13.0 OS Version: Windows 10

Additional context N/A

ealsur commented 4 years ago

Do you see this behavior only on queries?

thomaslevesque commented 4 years ago

@ealsur I see a similar problem if I try to execute a stored procedure that contains a syntax error:

Message: {"Errors":["Encountered exception while compiling Javascript. Exception = SyntaxError: Expected ')'\r\nSource information: line: 6, column: 6, source line:\r\n    \t{},"]}
ActivityId: 55b24acc-0ccd-4a27-9774-39b0ceeb82c8, Request URI: /apps/9efe5cf7-bad6-4171-9853-10018182fcd9/services/02735912-6e10-4752-a8bc-770a76c24c21/partitions/3beaf1ed-a088-4dc5-bf76-95972c138574/replicas/132443567906713532p/, RequestStats: Please see CosmosDiagnostics, SDK: Windows/10.0.19041 cosmos-netstandard-sdk/3.13.2

It's not as messy as in the previous example, but it's still difficult to extract the actual error, and it's probably not the actual response body. Also, the JSON is completely different from the previous example, but I guess that problem is probably in Cosmos DB itself rather than in the SDK. It would be nice to have a consistent and usable structure for errors...

thomaslevesque commented 4 years ago

In fact, the actual response body returned by Cosmos DB in that case is this (formatted):

{
    "code": "BadRequest",
    "message": "Message: {\"Errors\":[\"Encountered exception while compiling Javascript. Exception = SyntaxError: Expected ')'\\r\\nSource information: line: 6, column: 6, source line:\\r\\n    \\t{},\"]}\r\nActivityId: 3f913c92-1ea8-49f4-a370-5e3432d756d5, Request URI: /apps/9efe5cf7-bad6-4171-9853-10018182fcd9/services/02735912-6e10-4752-a8bc-770a76c24c21/partitions/3beaf1ed-a088-4dc5-bf76-95972c138574/replicas/132443567906713532p/, RequestStats: \r\nRequestStartTime: 2020-10-07T20:12:28.5373009Z, RequestEndTime: 2020-10-07T20:12:28.5472967Z,  Number of regions attempted:1\r\nResponseTime: 2020-10-07T20:12:28.5472967Z, StoreResult: StorePhysicalAddress: rntbd://cdb-ms-prod-westeurope1-fd10.documents.azure.com:14054/apps/9efe5cf7-bad6-4171-9853-10018182fcd9/services/02735912-6e10-4752-a8bc-770a76c24c21/partitions/3beaf1ed-a088-4dc5-bf76-95972c138574/replicas/132443567906713532p/, LSN: 2134, GlobalCommittedLsn: 2134, PartitionKeyRangeId: 0, IsValid: True, StatusCode: 400, SubStatusCode: 65535, RequestCharge: 1.9, ItemLSN: -1, SessionToken: -1#2134, UsingLocalLSN: False, TransportException: null, ResourceType: StoredProcedure, OperationType: ExecuteJavaScript\r\n, SDK: Microsoft.Azure.Documents.Common/2.11.0"
}

So I guess in this case the SDK isn't responsible for the mess: it's actually the value of the response body's message property. There are a few differences, not sure what caused them. Might be because in my app, I'm connecting in Direct mode, whereas cosmos.azure.com probably uses gateway mode?

ealsur commented 4 years ago

Yeah, basically for that Sproc request, the backend body has that content (it's coming from the service).

For queries though, it does sound like the content is being generated locally, that's why I was wondering if the issue was local to query.

thomaslevesque commented 4 years ago

Yeah, basically for that Sproc request, the backend body has that content (it's coming from the service).

But not exactly that content. I would expect ResponseBody to contain the raw request body, otherwise I don't see the point of this property.

Anyway, that's an area where I think Cosmos DB has a lot of room for improvement, both in the backend and the SDK. It shouldn't be that hard to see what caused an error. Currently it's difficult to parse reliably, and even for a human reading the raw error, it's difficult to find the relevant information.

thomaslevesque commented 4 years ago

BTW, in case you're wondering why I need to be able to parse the error, it's because I'm working on a tool for querying and managing Cosmos databases. I'd like to be able to display more user-friendly error messages to the user.

ghost commented 2 years ago

Closing due to in-activity, pease feel free to re-open.

thomaslevesque commented 2 years ago

Closing due to in-activity, pease feel free to re-open.

Haven't worked with CosmosDB in a while, so I don't know if this issue has been fixed, but if not, it should be reopened, inactivity or not. (also, the bot made a typo: "pease" 😛)

mattgotteiner commented 2 years ago

I have hit this issue as well

.net v3 SDK version 3.25.0 Repro:

using Microsoft.Azure.Cosmos;
using Newtonsoft.Json.Linq;
var client = new CosmosClient(connectionString: "AccountEndpoint=https://<...>.documents.azure.com:443/;AccountKey=<...>;");
var dbId = "<...>";
var containerId = "<...>";
try
{
    using FeedIterator<JObject> query = client.GetDatabase(dbId).GetContainer(containerId).GetItemQueryIterator<JObject>("selectttt * from c");
    while (query.HasMoreResults)
    {
        FeedResponse<JObject> queryResponse = await query.ReadNextAsync().ConfigureAwait(false);
        foreach (JObject document in queryResponse.Resource)
        {
            Console.WriteLine(document.ToString(Newtonsoft.Json.Formatting.Indented));
        }
    }
}
catch (CosmosException e)
{
    Console.WriteLine(e.ResponseBody);
}

How can we parse the response body, or some other exception field to get the true syntax error? My output:

Microsoft.Azure.Cosmos.Query.Core.Monads.ExceptionWithStackTraceException: TryCatch resulted in an exception. ---> Microsoft.Azure.Cosmos.Query.Core.Monads.ExceptionWithStackTraceException: TryCatch resulted in an exception. ---> Microsoft.Azure.Cosmos.Query.Core.Exceptions.ExpectedQueryPartitionProviderException: {"errors":[{"severity":"Error","location":{"start":0,"end":9},"code":"SC1001","message":"Syntax error, incorrect syntax near 'selectttt'."}]}
 ---> System.Runtime.InteropServices.COMException (0x800A0B00): 0x800A0B00
   --- End of inner exception stack trace ---
   --- End of inner exception stack trace ---
   at Microsoft.Azure.Cosmos.Query.Core.QueryPlan.QueryPartitionProvider.TryGetPartitionedQueryExecutionInfoInternal(SqlQuerySpec querySpec, PartitionKeyDefinition partitionKeyDefinition, Boolean requireFormattableOrderByQuery, Boolean isContinuationExpected, Boolean allowNonValueAggregateQuery, Boolean hasLogicalPartitionKey, Boolean allowDCount)
   at 
...

there's much more in the stack trace, truncated for readability

ealsur commented 2 years ago

The issue seems to be related to: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/Microsoft.Azure.Cosmos/src/Query/Core/QueryPlan/QueryPlanRetriever.cs#L77-L81

The innerException is not getting populated when creating the BadRequest exception on the query plan.