Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License

[BUG]ChangeFeedProcessor on DedicatedGateway #35146

Open xinlian12 opened 1 year ago

xinlian12 commented 1 year ago

Issue: When ChangeFeedProcessor is used with a dedicated gateway, LeaseLostException is observed constantly.

Reason: For the dedicated gateway, the default MaxIntegratedCacheStaleness is 5 minutes, and for cached queries, a single-item update does not evict the cached query results. During load balancing, a stale (cached) lease snapshot can therefore be returned. Because of the stale timestamps in that snapshot, leases that are actively being processed may be mistakenly categorized as expired, and the subsequent attempt to take ownership of them may fail due to an ETag mismatch.
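The failure mode above can be reproduced without Cosmos DB. The following is a minimal, self-contained sketch, not the SDK's implementation: `Lease`, `LeaseStore`, `isExpired`, and the 60-second expiration interval are hypothetical simplifications. It assumes a lease counts as expired once its last-renewed timestamp plus the expiration interval lies in the past, and that taking ownership is an ETag-conditioned replace. One instance reads a lease snapshot that a cache captured earlier, while the real owner keeps renewing.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Objects;

public class StaleLeaseDemo {

    // Hypothetical, simplified lease document: owner, last-renewed timestamp,
    // and an ETag that changes on every write (like a Cosmos DB _etag).
    record Lease(String id, String owner, Instant timestamp, String etag) {}

    // Hypothetical in-memory stand-in for the lease container.
    static final class LeaseStore {
        private Lease current;
        private int version = 0;

        LeaseStore(Lease initial) { this.current = initial; }

        // Uncached read: always returns the latest lease document.
        Lease readFresh() { return current; }

        // Renewal by the owner: bumps the timestamp and the ETag.
        void renew(String owner, Instant now) {
            version++;
            current = new Lease(current.id(), owner, now, "etag-" + version);
        }

        // Optimistic-concurrency replace: succeeds only if the caller's ETag matches.
        boolean tryTakeOwnership(Lease snapshot, String newOwner, Instant now) {
            if (!Objects.equals(snapshot.etag(), current.etag())) {
                return false; // precondition failed, surfaces as a lost lease
            }
            version++;
            current = new Lease(current.id(), newOwner, now, "etag-" + version);
            return true;
        }
    }

    // Simplified expiry rule (assumption): timestamp + interval is before "now".
    static boolean isExpired(Lease lease, Instant now, Duration expirationInterval) {
        return lease.timestamp().plus(expirationInterval).isBefore(now);
    }

    public static void main(String[] args) {
        Duration expiration = Duration.ofSeconds(60);
        Instant t0 = Instant.parse("2023-06-01T00:00:00Z");

        LeaseStore store = new LeaseStore(new Lease("lease-0", "hostA", t0, "etag-0"));
        Lease cachedSnapshot = store.readFresh(); // what a query cache captured at t0

        // hostA renews its lease every 30 seconds for 2 minutes
        for (int s = 30; s <= 120; s += 30) {
            store.renew("hostA", t0.plusSeconds(s));
        }

        Instant now = t0.plusSeconds(130);
        // The cached snapshot (still inside a 5-minute staleness window) looks expired...
        System.out.println("stale view expired: " + isExpired(cachedSnapshot, now, expiration));
        // ...while the actual lease document is healthy
        System.out.println("fresh view expired: " + isExpired(store.readFresh(), now, expiration));
        // hostB tries to take over with the stale ETag and is rejected
        System.out.println("takeover with stale etag: "
                + store.tryTakeOwnership(cachedSnapshot, "hostB", now));
    }
}
```

Running `main` prints `true`, `false`, `false`: the stale view wrongly marks the lease as expired, and the takeover then fails on the ETag check, which is the pattern that shows up as repeated lost-lease errors.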

Proposed solution: use MaxIntegratedCacheStaleness=0 or BypassIntegratedCache (https://github.com/Azure/azure-cosmos-dotnet-v3/pull/3836) for the lease queries.
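Why either option helps can be illustrated with a toy model of the integrated cache. This is a hypothetical sketch, not the dedicated gateway's actual algorithm: `IntegratedCacheModel` and `CacheOptions` are illustrative names, and it assumes a cached query result is served only while its age is strictly less than the allowed staleness, and that item writes never evict cached query results (as described in the issue). In the real Java SDK, per-request integrated-cache settings are, to our understanding, carried by `DedicatedGatewayRequestOptions` (for example `setMaxIntegratedCacheStaleness(Duration)`); verify the available setters against the azure-cosmos version in use.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class IntegratedCacheModel {

    // Hypothetical per-request options mirroring the two knobs from the issue.
    record CacheOptions(Duration maxStaleness, boolean bypass) {}

    private record Entry(List<String> results, Instant cachedAt) {}

    private final Map<String, Entry> queryCache = new HashMap<>();
    int backendCalls = 0;

    // Serves a query from the cache if allowed, otherwise runs it against the backend.
    List<String> query(String queryText, CacheOptions options, Instant now,
                       Supplier<List<String>> backend) {
        if (!options.bypass()) {
            Entry cached = queryCache.get(queryText);
            // Served only while age < maxStaleness, so a zero staleness never hits the cache.
            if (cached != null
                    && Duration.between(cached.cachedAt(), now).compareTo(options.maxStaleness()) < 0) {
                return cached.results(); // possibly stale: item writes did not evict it
            }
        }
        backendCalls++;
        List<String> fresh = backend.get();
        if (!options.bypass()) {
            queryCache.put(queryText, new Entry(fresh, now));
        }
        return fresh;
    }

    public static void main(String[] args) {
        IntegratedCacheModel gateway = new IntegratedCacheModel();
        Instant t0 = Instant.parse("2023-06-01T00:00:00Z");
        String leaseQuery = "SELECT * FROM c";
        CacheOptions defaults = new CacheOptions(Duration.ofMinutes(5), false);
        CacheOptions noStaleness = new CacheOptions(Duration.ZERO, false);
        CacheOptions bypass = new CacheOptions(Duration.ofMinutes(5), true);

        // Lease owner as stored in the backend; changes from hostA to hostB later on
        String[] owner = {"hostA"};
        Supplier<List<String>> backend = () -> List.of("lease-0:" + owner[0]);

        gateway.query(leaseQuery, defaults, t0, backend); // populates the cache
        owner[0] = "hostB";                               // single-item update, no eviction
        Instant later = t0.plusSeconds(60);
        System.out.println(gateway.query(leaseQuery, defaults, later, backend));
        System.out.println(gateway.query(leaseQuery, noStaleness, later, backend));
        System.out.println(gateway.query(leaseQuery, bypass, later, backend));
    }
}
```

With the 5-minute default the second query still returns `[lease-0:hostA]`, while both the zero-staleness and the bypass requests return `[lease-0:hostB]`. The trade-off is that every load-balancing query then reaches the backend, which is acceptable for the lease container because those queries are infrequent and small.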

FabianMeiswinkel commented 1 year ago

@mkrueger, assigning this one to you. Iain has a similar issue for .NET. We can discuss it in our next call.

Bottom line: the dedicated cache caches query results, and when the same query is issued again, it returns the previously cached results. For the Change Feed Processor, the lease documents are kept in a Cosmos DB container, and the load distribution across ChangeFeedProcessor instances depends on queries against that lease container. These queries should therefore be configured to sidestep the dedicated cache, either by setting MaxIntegratedCacheStaleness=0 or by using BypassIntegratedCache (https://github.com/Azure/azure-cosmos-dotnet-v3/pull/3836).

In Java, a similar issue exists for the ThroughputControl metadata container.