Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License

For Multi Master accounts, Cosmos DB calls are going to the Global endpoint #3614

Open sourabh1007 opened 1 year ago

sourabh1007 commented 1 year ago

Describe the bug
For multi-master accounts, all Cosmos DB calls always go to the global endpoint instead of a regional endpoint when neither ApplicationPreferredRegions nor ApplicationRegion is configured.

Here are the diagnostics: request-diagnostics-mm.txt

To Reproduce
Make a few read/write calls on a multi-master account.

Expected behavior
Calls should go to a regional endpoint instead of the global endpoint, and should fail over to another regional endpoint if the primary region is down.

Environment summary
SDK Version: Latest
OS Version (e.g. Windows, Linux, MacOSX): NA

ealsur commented 1 year ago

Reference:

https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs#L524-L558

Java has the same behavior:

https://github.com/Azure/azure-sdk-for-java/blob/e361ba327dfd5956082f8ec59ceff719c33bdf85/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/routing/LocationCache.java#L441-L474

ealsur commented 1 year ago

In Single Master accounts, the call

    nextLocationInfo.WriteEndpoints = this.GetPreferredAvailableEndpoints(nextLocationInfo.AvailableWriteEndpointByLocation, nextLocationInfo.AvailableWriteLocations, OperationType.Write, this.defaultEndpoint);

returns the first Write endpoint, so the next line

    nextLocationInfo.ReadEndpoints = this.GetPreferredAvailableEndpoints(nextLocationInfo.AvailableReadEndpointByLocation, nextLocationInfo.AvailableReadLocations, OperationType.Read, nextLocationInfo.WriteEndpoints[0]);

uses the first Write endpoint as fallbackEndpoint. Both results are regional endpoints.

For Multi Master, the call

    nextLocationInfo.WriteEndpoints = this.GetPreferredAvailableEndpoints(nextLocationInfo.AvailableWriteEndpointByLocation, nextLocationInfo.AvailableWriteLocations, OperationType.Write, this.defaultEndpoint);

returns this.defaultEndpoint, which is the global DNS, and the subsequent call

    nextLocationInfo.ReadEndpoints = this.GetPreferredAvailableEndpoints(nextLocationInfo.AvailableReadEndpointByLocation, nextLocationInfo.AvailableReadLocations, OperationType.Read, nextLocationInfo.WriteEndpoints[0]);

uses as fallbackEndpoint the same endpoint returned by the first call (the global DNS).

ealsur commented 1 year ago

Going to the Global DNS is equivalent to going to the first region, because it points to the same IP as the first region. But it is inconsistent with Single Master, and it leaves the Contacted Regions information empty.

Multi Master account users probably use ApplicationPreferredRegions/ApplicationRegion more often than Single Master users, but this looks like something that needs addressing.
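For anyone hitting this in the meantime: since the issue only shows up when no preferred regions are configured, setting ApplicationPreferredRegions on the client should make the SDK resolve regional endpoints. A minimal configuration sketch follows; the account endpoint, the key, and the two regions are placeholders, not values from this issue.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

// Placeholder regions: list the regions of your own account, closest first.
CosmosClientOptions options = new CosmosClientOptions
{
    ApplicationPreferredRegions = new List<string> { Regions.WestUS2, Regions.EastUS2 },
};

// Placeholder endpoint and key.
CosmosClient client = new CosmosClient(
    "https://<your-account>.documents.azure.com:443/",
    "<your-key>",
    options);
```

Setting ApplicationRegion to a single region name is the other option mentioned above; this sketch uses only one of the two properties.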

sourabh1007 commented 1 year ago

Related PR, which tried to solve this issue last year but didn't get approval. For reference: https://github.com/Azure/azure-cosmos-dotnet-v3/pull/2894

ealsur commented 1 year ago

The proposed condition:

    if ((this.CanUseMultipleWriteLocations() || expectedAvailableOperation.HasFlag(OperationType.Read)) &&
        currentLocationInfo.PreferredLocations != null &&
        currentLocationInfo.PreferredLocations.Count > 0)

sounds like it would solve the problem.
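To make the difference concrete, here is a small standalone model of the selection logic as described in this thread. It is a simplified sketch, not the SDK's LocationCache: endpoint availability tracking and the endpoint discovery flag are left out, and the class name, the requirePreferredLocations switch, and the contoso URLs are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class EndpointSelectionSketch
{
    // Simplified model: pick endpoints from the preferred locations or from the
    // account's ordered locations, and use fallbackEndpoint when nothing matches.
    public static List<Uri> GetPreferredAvailableEndpoints(
        IReadOnlyDictionary<string, Uri> endpointsByLocation,
        IReadOnlyList<string> orderedLocations,
        IReadOnlyList<string> preferredLocations,
        bool canUseMultipleWriteLocations,
        bool isRead,
        Uri fallbackEndpoint,
        bool requirePreferredLocations)
    {
        bool usePreferred = canUseMultipleWriteLocations || isRead;
        if (requirePreferredLocations)
        {
            // Proposed condition: take the preferred-locations branch only
            // when preferred locations are actually configured.
            usePreferred = usePreferred
                && preferredLocations != null
                && preferredLocations.Count > 0;
        }

        IReadOnlyList<string> locations = usePreferred
            ? (preferredLocations ?? Array.Empty<string>())
            : orderedLocations;

        List<Uri> endpoints = new List<Uri>();
        foreach (string location in locations)
        {
            if (endpointsByLocation.TryGetValue(location, out Uri endpoint))
            {
                endpoints.Add(endpoint);
            }
        }

        if (endpoints.Count == 0)
        {
            endpoints.Add(fallbackEndpoint);
        }

        return endpoints;
    }

    public static void Main()
    {
        // Hypothetical multi-master account with no preferred regions configured.
        Uri globalEndpoint = new Uri("https://contoso.documents.azure.com/");
        var byLocation = new Dictionary<string, Uri>
        {
            ["West US"] = new Uri("https://contoso-westus.documents.azure.com/"),
            ["East US"] = new Uri("https://contoso-eastus.documents.azure.com/"),
        };
        string[] ordered = { "West US", "East US" };
        string[] noPreferred = Array.Empty<string>();

        foreach (bool proposed in new[] { false, true })
        {
            List<Uri> writes = GetPreferredAvailableEndpoints(
                byLocation, ordered, noPreferred,
                canUseMultipleWriteLocations: true, isRead: false,
                fallbackEndpoint: globalEndpoint, requirePreferredLocations: proposed);

            // As in the thread, reads fall back to the first write endpoint.
            List<Uri> reads = GetPreferredAvailableEndpoints(
                byLocation, ordered, noPreferred,
                canUseMultipleWriteLocations: true, isRead: true,
                fallbackEndpoint: writes[0], requirePreferredLocations: proposed);

            Console.WriteLine($"proposed={proposed}: write={writes[0]}, read={reads[0]}");
        }
    }
}
```

In this model, the current condition resolves both the write and the read endpoint to the global endpoint, while the proposed condition resolves both to the first regional endpoint in the account's ordered list, which matches the Single Master behavior described earlier.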

sourabh1007 commented 1 year ago

cc @kirankumarkolli @FabianMeiswinkel