Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License

Using DefaultAzureCredential results in HTTP Error 400. The size of the request headers is too long #3689

Closed vihanv01 closed 1 year ago

vihanv01 commented 1 year ago

Describe the bug

When calling ReadNextAsync again on a FeedIterator (that is, when continuing a query) with a CosmosClient created using a DefaultAzureCredential, an exception is thrown stating that the request headers are too long.

This does not happen when using an authentication key. We want to move away from key-based access to Azure AD auth if possible.

Following the advice in this article, setting the ResponseContinuationTokenLimitInKb property on QueryRequestOptions has no effect and direct connection mode is not an option at the moment.

To Reproduce

Here is a simple console app that reproduces the error:

using Azure.Identity;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json.Linq;

internal static class Program
{
    private static async Task Main()
    {
        var client = new CosmosClient(
            accountEndpoint:"https://account-endpoint.documents.azure.com:443/",
            tokenCredential: new DefaultAzureCredential(), 
            clientOptions: new CosmosClientOptions
            {
                ConnectionMode = ConnectionMode.Gateway,
                AllowBulkExecution = true,
            });

        client.ClientOptions.RequestTimeout = TimeSpan.FromMinutes(5);
        var container = client.GetContainer("database", "container");

        var query = new QueryDefinition("SELECT * FROM c WHERE c.id = c.PartitionKey");

        var iterator = container.GetItemQueryIterator<JObject>(query, requestOptions: new QueryRequestOptions()
        {
            ResponseContinuationTokenLimitInKb = int.MaxValue,
        });

        while (iterator.HasMoreResults)
        {
            var response = await iterator.ReadNextAsync();
            Console.WriteLine($"Document Count: {response.Count} | Request Charge: {response.RequestCharge}");
        }
    }
}

The error occurs when calling ReadNextAsync on line 30 after the first iteration of the loop.

When the parameter on line 11 is changed to pass in an authentication key instead of the DefaultAzureCredential, the query continues as usual.

Expected behaviour

A successful continuation response.

Actual behaviour

A CosmosException is thrown due to a Bad Request - Request Too Long response. Full exception message under additional context.

Environment summary

SDK Version: 3.31.2
OS Version: Windows
Cosmos Instance Default Consistency: Session

Additional context

Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: d4df1ca3-837a-48c3-be43-cfaff624b282; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: d4df1ca3-837a-48c3-be43-cfaff624b282; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: d4df1ca3-837a-48c3-be43-cfaff624b282; Reason: (<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request - Request Too Long</h2>
<hr><p>HTTP Error 400. The size of the request headers is too long.</p>
</BODY></HTML>

RequestUri: https://account-endpoint.documents.azure.com/dbs/database/colls/container/docs;
RequestMethod: POST;
Header: Authorization Length: 6598;
Header: x-ms-max-item-count Length: 4;
Header: x-ms-session-token Length: 18;
Header: x-ms-continuation Length: 10415;
Header: x-ms-documentdb-partitionkeyrangeid Length: 1;
Header: x-ms-documentdb-populatequerymetrics Length: 4;
Header: x-ms-documentdb-responsecontinuationtokenlimitinkb Length: 10;
Header: x-ms-cosmos-sdk-supportedcapabilities Length: 1;
Header: x-ms-cosmos-correlated-activityid Length: 36;
Header: x-ms-documentdb-query-iscontinuationexpected Length: 5;
Header: x-ms-documentdb-isquery Length: 4;
Header: x-ms-documentdb-query-enablecrosspartition Length: 4;
Header: x-ms-activity-id Length: 36;
Header: Cache-Control Length: 8;
Header: User-Agent Length: 89;
Header: x-ms-version Length: 10;
Header: Accept Length: 16;

ActivityId: d4df1ca3-837a-48c3-be43-cfaff624b282, Request URI: /dbs/database/colls/container/docs, RequestStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.19044 cosmos-netstandard-sdk/3.29.4);););

ealsur commented 1 year ago

Based on the error details, the 2 problematic headers are:

Header: Authorization Length: 6598;
Header: x-ms-continuation Length: 10415;

ealsur commented 1 year ago

@vihanv01 The authentication token from AAD is generated outside of this library, but we'll take a look.

ealsur commented 1 year ago

@vihanv01 Which platform is this app running on? App Service? Azure VM?

You said setting ResponseContinuationTokenLimitInKb to like 5Kb has no effect?

vihanv01 commented 1 year ago

Hey @ealsur. We were testing it locally when we found the issue and haven't deployed it to pre-production yet. We plan on hosting it in Azure Kubernetes Service with a Managed Identity as the pod identity.

Yeah, I tried making it int.MaxValue to no avail.

vihanv01 commented 1 year ago

I suspect it might be the combination of the two large header values since the first iteration passes but as soon as the continuation token is returned and used by the iterator to continue the execution, the error is thrown.
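
As a quick check of that suspicion, here is a minimal sketch that adds up the header lengths reported in the exception message above. The class and method names are illustrative only, and it assumes the logged "Length" figures count the characters of each header value (header names and separators are not included).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderBudget
{
    // Header value lengths copied from the exception message in this issue.
    public static readonly IReadOnlyDictionary<string, int> Lengths = new Dictionary<string, int>
    {
        ["Authorization"] = 6598,
        ["x-ms-max-item-count"] = 4,
        ["x-ms-session-token"] = 18,
        ["x-ms-continuation"] = 10415,
        ["x-ms-documentdb-partitionkeyrangeid"] = 1,
        ["x-ms-documentdb-populatequerymetrics"] = 4,
        ["x-ms-documentdb-responsecontinuationtokenlimitinkb"] = 10,
        ["x-ms-cosmos-sdk-supportedcapabilities"] = 1,
        ["x-ms-cosmos-correlated-activityid"] = 36,
        ["x-ms-documentdb-query-iscontinuationexpected"] = 5,
        ["x-ms-documentdb-isquery"] = 4,
        ["x-ms-documentdb-query-enablecrosspartition"] = 4,
        ["x-ms-activity-id"] = 36,
        ["Cache-Control"] = 8,
        ["User-Agent"] = 89,
        ["x-ms-version"] = 10,
        ["Accept"] = 16,
    };

    // Sum of all logged header value lengths.
    public static int Total() => Lengths.Values.Sum();

    // Combined length of the two largest header values.
    public static int TopTwo() => Lengths.Values.OrderByDescending(v => v).Take(2).Sum();
}
```

HeaderBudget.Total() gives 17259 and HeaderBudget.TopTwo() gives 17013, so the Authorization and x-ms-continuation values account for almost all of the header volume in the failing request.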

ealsur commented 1 year ago

> Yeah, I tried making it int.MaxValue to no avail.

On the contrary, the value should be restrictive. int.MaxValue is basically applying no restriction on it.

Try setting it to a value that represents 5Kb.

ealsur commented 1 year ago

Circling back, the size of the Authorization header depends on the number of groups your AAD identity has. Normally it's 1.5-3.5Kb, but it could be larger depending on the identity's groups/claims/roles.

Limiting the continuation token by setting ResponseContinuationTokenLimitInKb = 5 should help.
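
A minimal sketch of that suggestion, assuming the Microsoft.Azure.Cosmos package used in the repro above. The helper class and method names are hypothetical; the value 5 is the one suggested in this thread, not a general recommendation.

```csharp
using Microsoft.Azure.Cosmos;

public static class QueryOptionsSketch
{
    // Builds query options that cap the size of the continuation token
    // returned by the service, in KB. A small value restricts the token;
    // int.MaxValue effectively applies no restriction.
    public static QueryRequestOptions WithContinuationLimit(int limitInKb = 5)
    {
        return new QueryRequestOptions
        {
            ResponseContinuationTokenLimitInKb = limitInKb,
        };
    }
}
```

In the repro, this would replace the options passed to the iterator: container.GetItemQueryIterator<JObject>(query, requestOptions: QueryOptionsSketch.WithContinuationLimit()).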

vihanv01 commented 1 year ago

Thanks for the additional information @ealsur. I misunderstood what the property represents. I followed your instructions to try limiting the value and it did indeed proceed successfully. I did however also switch it back to int.MaxValue and also removed setting the property altogether and it is now also continuing as expected. I didn't change anything else and we didn't make any changes to our Azure AD or Cosmos configuration.

Just for my own sanity, is there a way to hook onto the outgoing request so I can just view if the content length of the continuation token or authorization token changed at all?
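
One possible way to do this in Gateway mode, sketched here as an assumption rather than something confirmed in this thread, is to hand the SDK an HttpClient with a logging DelegatingHandler through CosmosClientOptions.HttpClientFactory. The handler class and the CreateOptions helper are hypothetical names, not part of the SDK.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical debugging helper: logs the size of selected request headers
// before forwarding the request unchanged.
public sealed class HeaderSizeLoggingHandler : DelegatingHandler
{
    public HeaderSizeLoggingHandler() : base(new HttpClientHandler()) { }

    // Total character count of all values for the named header, or 0 if absent.
    public static int ValueLength(HttpRequestMessage request, string name) =>
        request.Headers.TryGetValues(name, out var values) ? values.Sum(v => v.Length) : 0;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        foreach (var name in new[] { "Authorization", "x-ms-continuation" })
        {
            Console.WriteLine($"{name}: {ValueLength(request, name)} characters");
        }

        return base.SendAsync(request, cancellationToken);
    }

    // Client options that route Gateway-mode HTTP traffic through the handler.
    public static CosmosClientOptions CreateOptions()
    {
        var httpClient = new HttpClient(new HeaderSizeLoggingHandler());
        return new CosmosClientOptions
        {
            ConnectionMode = ConnectionMode.Gateway,
            HttpClientFactory = () => httpClient, // reuse a single HttpClient
        };
    }
}
```

Passing HeaderSizeLoggingHandler.CreateOptions() as clientOptions when constructing the CosmosClient should print both lengths for every HTTP request, which makes it possible to see whether either token changed between runs. This only covers Gateway mode, since Direct mode does not send data requests over HTTP.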

vihanv01 commented 1 year ago

Kindly ignore my previous message: I noticed that I had changed the query during my latest debugging session by removing the where clause. After restoring it, I can consistently hit the error again, and I can then resolve it by setting the ResponseContinuationTokenLimitInKb property to a sufficiently small value (anything between 1 and 4 works).

Is there some guidance on what this value should be considering that we have limited control over the size of the authentication token? What are the risks / impact of making this value 1kb whenever we use the iterator?

ealsur commented 1 year ago

I honestly do not know. The size of the AAD token is outside of what is controlled in this repo/library, since the token is obtained from AAD, and I am not aware of any guidance from AAD regarding token size. What we do know is that HTTP headers have a maximum allowed combined size, and that the token's size depends on the groups/claims that the identity has.

@neildsh Do we have any recommendations on ResponseContinuationTokenLimitInKb usage? What is a safe number, and in which cases can it cause issues?

vihanv01 commented 1 year ago

Hey @ealsur, I would like to share some additional feedback. After running a few scenarios and increasing the limit sequentially, I noticed that the request charge seems to be higher when the limit is smaller. Specifically, I observed that for a limit of 1kb, the consumed RUs were about 6% higher than for a limit of 8kb, which was quite significant.
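
A comparison like this could be scripted roughly as follows. This is a sketch of one way to measure it, not the code used for the numbers above; the class and method names are hypothetical, and it assumes the same container and query as the repro.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json.Linq;

public static class RequestChargeComparison
{
    // Drains the query with the given continuation token limit and
    // returns the total request charge (RUs) across all pages.
    public static async Task<double> TotalChargeAsync(
        Container container, QueryDefinition query, int limitInKb)
    {
        var options = new QueryRequestOptions
        {
            ResponseContinuationTokenLimitInKb = limitInKb,
        };

        double total = 0;
        using FeedIterator<JObject> iterator =
            container.GetItemQueryIterator<JObject>(query, requestOptions: options);

        while (iterator.HasMoreResults)
        {
            FeedResponse<JObject> page = await iterator.ReadNextAsync();
            total += page.RequestCharge;
        }

        return total;
    }

    // How much higher `charge` is than `baseline`, in percent.
    public static double PercentHigher(double charge, double baseline) =>
        (charge - baseline) / baseline * 100.0;
}
```

Calling TotalChargeAsync once with a limit of 1 and once with a limit of 8, then passing both totals to PercentHigher, gives the relative difference in consumed RUs for the two settings.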

I am a bit surprised, however, that there is a point at which the Cosmos client cannot accept back a continuation token that it generated itself. This seems counterintuitive to me, and I was wondering if this is something that could be addressed? Thank you for your attention to this matter.

ealsur commented 1 year ago

@vihanv01 The problem is the AAD token, not the continuation token. The AAD token size can vary with factors outside of the SDK's control, and HTTP servers limit the total size of the request headers they accept.

In a normal non-AAD scenario, the continuation token produced always fits within the header size limit because the Authorization header (controlled by the SDK) has a predictable size.

Because the size of the AAD-generated token varies with scopes/groups/etc., it takes up a variable share of that HTTP header limit.

Is the size of the AAD token (authorization token) header in your tests the one that will happen in production? Will production have the same number of scopes/groups/claims?

ealsur commented 1 year ago

Please reactivate once the questions are answered.