Open xinlian12 opened 1 year ago
@mkrueger - assigning this one to you. Iain has a similar issue for .NET. We can discuss in our next call.
Bottom line: the dedicated gateway cache (integrated cache) caches query results, and when nothing in a query changes it returns the previously cached results. For Change Feed Processor, the lease documents are kept in a Cosmos DB container, and load distribution across the ChangeFeedProcessor instances depends on queries against that lease container. These queries should be configured to side-step the dedicated cache by setting MaxIntegratedCacheStaleness=0 or BypassIntegratedCache (https://github.com/Azure/azure-cosmos-dotnet-v3/pull/3836).
In Java, a similar issue exists for the ThroughputControl metadata container.
Issue: When using ChangeFeedProcessor with the dedicated gateway, LeaseLostException is constantly observed.
Reason: With the dedicated gateway, the default MaxIntegratedCacheStaleness is 5 minutes, and updating a single item does not evict cached query results. During load balancing, a stale cached snapshot of the leases can therefore be returned. Because the timestamps in that snapshot are stale, leases that are actively being processed may be mistakenly classified as expired, and the attempt to take ownership may then fail due to an ETag mismatch.
Proposed solution: use MaxIntegratedCacheStaleness=0 or BypassIntegratedCache (https://github.com/Azure/azure-cosmos-dotnet-v3/pull/3836) for the lease container queries.
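
To make the failure mode concrete, here is a minimal, SDK-free sketch of the reasoning above. It is a simulation, not the actual ChangeFeedProcessor or gateway code: `StaleLeaseSketch`, `Lease`, `CachedLeaseQuery`, and `simulate` are hypothetical names, and the 60-second lease expiration, 17-second renew interval, and 120-second gap between load-balancing passes are illustrative assumptions. The cache only refreshes when its snapshot is older than the allowed staleness, and item updates never evict it.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StaleLeaseSketch {
    /** Hypothetical lease document: owner plus last-renewal time in milliseconds. */
    static final class Lease {
        final String id;
        final String owner;
        final long renewedAtMs;
        Lease(String id, String owner, long renewedAtMs) {
            this.id = id;
            this.owner = owner;
            this.renewedAtMs = renewedAtMs;
        }
    }

    /** Hypothetical stand-in for a query cache sitting in front of the lease container. */
    static final class CachedLeaseQuery {
        private final Map<String, Lease> store; // the "real" lease container
        private List<Lease> snapshot;           // last cached query result
        private long snapshotTakenAtMs;

        CachedLeaseQuery(Map<String, Lease> store) {
            this.store = store;
        }

        /** Returns the cached snapshot unless it is at least maxStaleness old. */
        List<Lease> query(long nowMs, Duration maxStaleness) {
            boolean fresh = snapshot != null
                    && nowMs - snapshotTakenAtMs < maxStaleness.toMillis();
            if (!fresh) {
                snapshot = new ArrayList<>(store.values());
                snapshotTakenAtMs = nowMs;
            }
            return snapshot;
        }
    }

    /** Counts leases whose last renewal is older than the expiration interval. */
    static int countExpired(List<Lease> leases, long nowMs, Duration expiration) {
        int expired = 0;
        for (Lease lease : leases) {
            if (nowMs - lease.renewedAtMs > expiration.toMillis()) {
                expired++;
            }
        }
        return expired;
    }

    /** Runs two load-balancing passes 120 s apart; returns leases seen as expired in the second. */
    static int simulate(Duration maxStaleness) {
        Map<String, Lease> store = new HashMap<>();
        store.put("lease-0", new Lease("lease-0", "host-A", 0L));
        CachedLeaseQuery cache = new CachedLeaseQuery(store);
        Duration expiration = Duration.ofSeconds(60); // assumed lease expiration

        cache.query(0L, maxStaleness); // first pass populates the cache

        // host-A keeps renewing its lease every 17 s (assumed renew interval).
        // These single-item updates do not evict the cached query result.
        for (long t = 17_000L; t <= 120_000L; t += 17_000L) {
            store.put("lease-0", new Lease("lease-0", "host-A", t));
        }

        long now = 120_000L; // second load-balancing pass
        return countExpired(cache.query(now, maxStaleness), now, expiration);
    }

    public static void main(String[] args) {
        System.out.println("staleness=5min -> expired leases seen: "
                + simulate(Duration.ofMinutes(5)));
        System.out.println("staleness=0    -> expired leases seen: "
                + simulate(Duration.ZERO));
    }
}
```

With a 5-minute staleness window the second pass still sees the renewal timestamp from the first pass and treats an actively renewed lease as expired. With a staleness of zero every query goes to the store and the lease is seen as healthy. In the real system, another instance acting on the stale view would then try to take the lease with an outdated ETag, which is where the ETag mismatch and LeaseLostException come from.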