Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License

Malformed LimitContinuationToken #3975

Closed: LouisSkyline closed this issue 1 year ago

LouisSkyline commented 1 year ago

Describe the bug
When doing a query with a limit, reading the second page using the provided continuation token immediately after reading the first page always fails with a Malformed LimitContinuationToken response. Retrying the same request with the same continuation token, or waiting a little longer before reading the second page, always results in success.

Unexpected exception while reading. Response status code does not indicate success: BadRequest (400); Substatus: 20007; ActivityId: ; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 20007; ActivityId: ; Reason: (Malformed LimitContinuationToken: [{"token":"+RID:~QqAcAN25+rfPLQQAAAAAAA==#RT:1#TRC:100#ISV:2#IEO:65567#QCF:8#FPC:Ac8tBAAAAAAA+y4EAAAAAAA=","range":{"min":"","max":"FF"}}].);); Malformed LimitContinuationToken: [{"token":"+RID:~QqAcAN25+rfPLQQAAAAAAA==#RT:1#TRC:100#ISV:2#IEO:65567#QCF:8#FPC:Ac8tBAAAAAAA+y4EAAAAAAA=","range":{"min":"","max":"FF"}}].

To Reproduce
The issue cannot be reproduced on a local development machine, but it can be reproduced consistently in a container app running in the same region as the Cosmos DB account.

Can be reproduced by performing a query with a Limit and using the continuation token provided by the first page to get the second page. The second page needs to be fetched shortly after the first page.
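For reference, a minimal sketch of those steps against the v3 SDK. The connection string, database and container names, query text, and page size are placeholders rather than values from the report; the query uses an OFFSET/LIMIT clause as one way to produce a LimitContinuationToken, since the original query is not shown in the issue.

```csharp
using System;
using Microsoft.Azure.Cosmos;

CosmosClient client = new CosmosClient(Environment.GetEnvironmentVariable("COSMOS_CONNECTION_STRING"));
Container container = client.GetContainer("SampleDb", "SampleContainer");

// A query with a limit, paged in chunks smaller than that limit.
QueryDefinition query = new QueryDefinition("SELECT * FROM c OFFSET 0 LIMIT 1000");
QueryRequestOptions options = new QueryRequestOptions { MaxItemCount = 100 };

// Page 1: no continuation token.
string continuation;
using (FeedIterator<dynamic> page1 = container.GetItemQueryIterator<dynamic>(
    query, continuationToken: null, requestOptions: options))
{
    FeedResponse<dynamic> response = await page1.ReadNextAsync();
    continuation = response.ContinuationToken;
}

// Page 2, requested immediately with the token from page 1.
// Per the report, this intermittently fails with 400 / substatus 20007
// ("Malformed LimitContinuationToken"); retrying with the same token,
// or waiting briefly before this call, succeeds.
using (FeedIterator<dynamic> page2 = container.GetItemQueryIterator<dynamic>(
    query, continuationToken: continuation, requestOptions: options))
{
    FeedResponse<dynamic> response = await page2.ReadNextAsync();
    Console.WriteLine($"Page 2 returned {response.Count} items.");
}
```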

Expected behavior
The second page should return on the first try, without a 400 Bad Request response.

Actual behavior
The request for the second page returns a 400 Bad Request.

Environment summary
Linux container in the same region as the Cosmos DB account.

Additional context
Occurs on both serverless and dedicated Cosmos DB accounts. SDK version used: 3.35.1.

Microsoft.Azure.Cosmos.CosmosException handled 
   at Microsoft.Azure.Cosmos.ResponseMessage.EnsureSuccessStatusCode (Microsoft.Azure.Cosmos.Client, Version=3.35.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at Microsoft.Azure.Cosmos.QueryResponse`1.CreateResponse (Microsoft.Azure.Cosmos.Client, Version=3.35.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at Microsoft.Azure.Cosmos.CosmosResponseFactoryCore.CreateQueryFeedResponseHelper (Microsoft.Azure.Cosmos.Client, Version=3.35.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at Microsoft.Azure.Cosmos.FeedIteratorCore`1+<ReadNextAsync>d__8.MoveNext (Microsoft.Azure.Cosmos.Client, Version=3.35.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Cosmos.ClientContextCore+<RunWithDiagnosticsHelperAsync>d__40`1.MoveNext (Microsoft.Azure.Cosmos.Client, Version=3.35.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Cosmos.ClientContextCore+<OperationHelperWithRootTraceAsync>d__30`1.MoveNext (Microsoft.Azure.Cosmos.Client, Version=3.35.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
Inner exception Microsoft.Azure.Cosmos.Query.Core.Exceptions.MalformedContinuationTokenException handled at Microsoft.Azure.Cosmos.ResponseMessage.EnsureSuccessStatusCode:
ealsur commented 1 year ago

@LouisSkyline Can you provide a repro code that we can use? Are you passing the continuation token "as is" or is the continuation token travelling somewhere (for example through a Web API response) and being sent back (for example through another Web API request)?

LouisSkyline commented 1 year ago

I'm unable to share any code at the moment.

We save the token into a different Cosmos container and read that record back for the next page read. But I would be surprised if that is the issue, because retrying with the same token or delaying the read of the second page fixes the issue.
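For context, the round trip looks roughly like this. The container names, item shape, ids, and partition key layout are illustrative, not the actual application code.

```csharp
using Microsoft.Azure.Cosmos;

// A second Cosmos container used only to persist paging state between requests.
CosmosClient client = new CosmosClient("<connection-string>");
Container stateContainer = client.GetContainer("SampleDb", "PagingState");

// After reading page 1, persist the token exactly as returned by the SDK.
string continuation = "<token from page 1>";
await stateContainer.UpsertItemAsync(
    new PagingState { id = "query-123", ContinuationToken = continuation },
    new PartitionKey("query-123")); // assumes the state container is partitioned on /id

// Before reading page 2, read the state record back and reuse the stored token.
ItemResponse<PagingState> state = await stateContainer.ReadItemAsync<PagingState>(
    "query-123", new PartitionKey("query-123"));
string tokenForPage2 = state.Resource.ContinuationToken;

public class PagingState
{
    public string id { get; set; }
    public string ContinuationToken { get; set; }
}
```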

ealsur commented 1 year ago

@LouisSkyline Can you please confirm if the token you are reading from the container to use in the next page has the exact same value as the original token and no escape characters have been accidentally added?

If this reproes consistently, then it should just be a matter of logging the token when it is obtained from the query and again when it is read back from the container, and comparing the two.
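A comparison along those lines could be as simple as the following; the variable names are illustrative placeholders for the two token values.

```csharp
using System;

// Log both tokens verbatim and compare them byte-for-byte; accidentally added
// escape characters (e.g. \" or \\) or trimming would show up immediately.
string fromQuery = "<token as returned by FeedResponse.ContinuationToken>";
string fromContainer = "<token as read back from the state container>";

Console.WriteLine($"from query    : {fromQuery}");
Console.WriteLine($"from container: {fromContainer}");
Console.WriteLine($"identical     : {string.Equals(fromQuery, fromContainer, StringComparison.Ordinal)}");
```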

If you can share some code we can use to repro or share the actual values, it would help.

neildsh commented 1 year ago

I believe this is also being tracked by an internal ICM, 405583234, which has more detail. Currently, it appears that there is a mismatch between the continuation token and the query supplied as input to the API.

leminh98 commented 1 year ago

Closing this as the associated ICM is resolved.