Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License

Updating Cosmo db from 3.29 to 3.31.2+ causes Kestrel issues #4454

Closed waszakCeneo closed 4 months ago

waszakCeneo commented 5 months ago

Describe the bug

Every time we try to update the Cosmos DB package from version 3.29 to 3.35+ (for example, to 3.39), we experience issues. The tricky part is that the error always comes from methods that use Azure Table. Cosmos works fine, and the issue is always network related.

Only a small percentage of requests fail, not all of them, and a retry always succeeds.

image

To Reproduce

For us, reproducing it is as simple as changing the package version in production from

<PackageReference Include="Microsoft.Azure.Cosmos" Version="3.29.0" />

to any of the following:

<PackageReference Include="Microsoft.Azure.Cosmos" Version="3.35.2" /> (tried in July 2023)
<PackageReference Include="Microsoft.Azure.Cosmos" Version="3.36.0" /> (tried in December 2023)
<PackageReference Include="Microsoft.Azure.Cosmos" Version="3.39.0" /> (tried in April 2024)
<PackageReference Include="Microsoft.Azure.Cosmos" Version="3.31.2" /> (tried on 29-30 April 2024)

This causes the issue. What is important is that we have a heavy workload on our service (not against Cosmos DB, but against Azure Table). On the int environment I was never able to reproduce this error.

We even left the upgraded package deployed for a few days to rule out coincidence, but the result is always the same.

Expected behavior

I expect that updating the package won't cause Kestrel issues.

Actual behavior

We see spikes in errors in the microservice, in unrelated methods. For example, we have a method that reads entries from Azure Table (it doesn't use Cosmos DB at all), and yet every time we upgrade the package it starts throwing more errors.

So, from my observation, the new package causes connectivity issues.

Environment summary

SDK Version: 3.39
OS Version: Linux - AKS

Additional context

BadHttpRequestException occurred in application [Response.HasStarted: False]: Unexpected end of request content.
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1ContentLengthMessageBody.ReadAsyncInternal(CancellationToken cancellationToken)
   at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsyncInternal(Memory`1 destination, CancellationToken cancellationToken)
   at System.Text.Json.JsonSerializer.ReadFromStreamAsync(Stream utf8Json, ReadBufferState bufferState, CancellationToken cancellationToken)
   at System.Text.Json.JsonSerializer.ReadAllAsync[TValue](Stream utf8Json, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Mvc.Formatters.SystemTextJsonInputFormatter.ReadRequestBodyAsync(InputFormatterContext context, Encoding encoding)
   at Microsoft.AspNetCore.Mvc.Formatters.SystemTextJsonInputFormatter.ReadRequestBodyAsync(InputFormatterContext context, Encoding encoding)
   at Microsoft.AspNetCore.Mvc.ModelBinding.Binders.BodyModelBinder.BindModelAsync(ModelBindingContext bindingContext)
   at Microsoft.AspNetCore.Mvc.ModelBinding.ParameterBinder.BindModelAsync(ActionContext actionContext, IModelBinder modelBinder, IValueProvider valueProvider, ParameterDescriptor parameter, ModelMetadata metadata, Object value, Object container)
   at Microsoft.AspNetCore.Mvc.Controllers.ControllerBinderDelegateProvider.<>cDisplayClass0_0.<gBind|0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.gAwaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.gAwaited|26_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)

Our code looks like this. We store the CosmosClient as a singleton: .AddSingleton<ICosmosFactory, CosmosFactory>()

image

image

P.S. We have no issues with Cosmos DB itself and its utilization is low; we only have connectivity issues with our microservice, in unrelated methods, after updating the package.

waszakCeneo commented 5 months ago

We have been experiencing this issue for over a year and it stops us from updating the package. That is why, if there were any breaking changes in this package since 3.29, it would be nice to know.

(I tried updating this package multiple times and it always ended with the same issues.)

ealsur commented 5 months ago

Can you provide some repro or some way for us to see the issue? The stack trace of the exception you mention has no relationship with the SDK.

It's happening on Microsoft.AspNetCore.Mvc.ModelBinding which is the step where a Controller reads the body of an incoming HTTP message to bind it to some type you are declaring in the controller method. And the error is that the HTTP content being sent to the controller cannot be deserialized into the expected type.

This is happening in the ASP.NET controller, not the SDK. Can you provide the SDK error, if any?
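
To illustrate the failure mode named in the exception, here is a minimal sketch (not Kestrel's actual implementation). The `Http1ContentLengthMessageBody` frame and the message "Unexpected end of request content" indicate that the incoming request declared a body length and the body ended before that many bytes arrived, while model binding was still reading it. The `Order` type, the `ReadBody` helper and the payload are hypothetical and exist only for this example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical model, standing in for whatever type the controller action binds to.
public record Order(int Id, string Name);

public static class Program
{
    // Reads exactly contentLength bytes, as a server must when the client
    // declares a Content-Length. A read of 0 bytes means the sender stopped early.
    static byte[] ReadBody(Stream stream, int contentLength)
    {
        var buffer = new byte[contentLength];
        int total = 0;
        while (total < contentLength)
        {
            int read = stream.Read(buffer, total, contentLength - total);
            if (read == 0)
            {
                throw new IOException(
                    $"Unexpected end of request content ({total} of {contentLength} bytes received).");
            }
            total += read;
        }
        return buffer;
    }

    public static void Main()
    {
        byte[] full = Encoding.UTF8.GetBytes("{\"Id\":1,\"Name\":\"test\"}");

        // Case 1: the whole body arrives, so it can be bound to the model.
        byte[] body = ReadBody(new MemoryStream(full), full.Length);
        Order? bound = JsonSerializer.Deserialize<Order>(body);
        Console.WriteLine($"Bound: {bound}");

        // Case 2: the stream ends after 10 bytes although the full length was declared.
        try
        {
            ReadBody(new MemoryStream(full, 0, 10), full.Length);
        }
        catch (IOException ex)
        {
            Console.WriteLine($"Failed while reading the body: {ex.Message}");
        }
    }
}
```

In the second case the read fails before a complete JSON document is available, which is why the exception surfaces in the model-binding frames of the trace even though the payload itself may be well formed.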

waszakCeneo commented 5 months ago

As I said before, it doesn't cause errors in the Cosmos DB SDK. But updating just this package causes errors in the ASP.NET controller, with some requests failing. I am not sure it is possible to reproduce this error easily outside of production; we have tried for the past year. The only certain thing is that updating this package causes spikes in errors.

> It's happening on Microsoft.AspNetCore.Mvc.ModelBinding which is the step where a Controller reads the body of an incoming HTTP message to bind it to some type you are declaring in the controller method.

It's always:

BadHttpRequestException occurred in application [Response.HasStarted: False]: Unexpected end of request content.

We get millions of requests to this method, so the error rate is small, but it correlates only with changing the Cosmos DB package version.

So I was never able to reproduce this error. All we know is that it is related to updating the package: we updated the package and kept it for a week while requests were failing, then we downgraded and the errors stopped.

I was thinking it might be some socket error or something else. The request stops after about 1 ms (Successful request: false, Response time: 951.4 μs). A retry works; the error rate is just higher than normal.


I will think about whether we can share some minimal code, but it will be more confusing, because the methods that are failing use only Azure Table; they don't use Cosmos DB.

Last year I had to downgrade packages one by one to pinpoint this package as the cause. Multiple later attempts to update it have always caused the same issue.

ealsur commented 5 months ago

Looking at 3.29's dependencies: https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.29.0#dependencies-body-tab

image

And then looking at 3.35.1:

image

There are no changes, which means, upgrading the SDK cannot be impacting your application's environment in any way. One possibility normally is that if you upgrade a library, it might be bringing new dependencies that are affecting other parts of your app, but this is not the case.

> maybe some socket error

No, this error is clear. A client application is calling a Controller Action and passing some JSON payload. The JSON payload that is being sent from the client cannot be bound to the Model you are specifying in the Action. Could you be upgrading other things?

See https://github.com/dotnet/aspnetcore/issues/23949

waszakCeneo commented 5 months ago

I already saw #23949 a year ago; it was the issue we started our investigation with.

Then, after testing changes one by one, we ended up at the Cosmos DB package. Every time, a simple PR changing only the single line with the Cosmos DB version started producing these errors.

First we are going to try updating from 3.29 to 3.31.2.

Our hypothesis was that it could be due to Azure.Table or some other library.

After the update we are going to leave it for a few days, but we already did a week-long test with 3.35.

image

image

ealsur commented 5 months ago

> And every time, simple PR with one single line changed of cosmosdb started making this errors

Are the errors constant? Or temporary? Based on the other Issue, the problem is the client, either stopping the request or sending a badly formatted payload. Are you changing anything on the client application that calls your ASP.NET Controllers?

waszakCeneo commented 5 months ago

There is no change except for the package version. We updated only this dependency, with Renovate. The errors are constant, but every retry succeeds. We hit the daily cap, so we are going to roll back again. Sadly, I don't have a way to reproduce this in a smaller case yet.

The failure rate is 25k out of 3M requests, roughly 0.8%. (As I said before, the method doesn't even query Cosmos DB, only Azure Table.) (Because of filtering, the real numbers could be close to 100k out of 12M requests.)

image

ealsur commented 5 months ago

EDIT: Re-read and edited

Is the client a browser? Who sends the payload? Could the client be performing another Action before calling this one that is failing? Is this the only Action failing with this error?

microsoft-github-policy-service[bot] commented 4 months ago

@waszakCeneo this issue requires more information for the team to be able to help. In case this information is available, please add it and re-open the Issue.