IEvangelist / azure-cosmos-dotnet-repository

Wraps the .NET SDK for Azure Cosmos DB abstracting away the complexity, exposing a simple CRUD-based repository pattern
https://ievangelist.github.io/azure-cosmos-dotnet-repository
MIT License
311 stars 92 forks source link

Inconsistent Page Size Handling with Large Documents #447

Open Skylertodd opened 5 months ago

Skylertodd commented 5 months ago

Apparent Behavior: Continuation token skipping records when fetching page 2.

When setting a larger page size and dealing with large documents, there is an edge case where Cosmos DB returns fewer items than the desired page size due to size limitations (believed to be 4MB). This results in an issue where the iterator fetches another page of items, causing the total fetched items to exceed the desired amount. The continuation token reflects this higher number, but when iterating through the results, extra records are discarded to match the exact requested count. Consequently, when fetching subsequent pages, records that were discarded are skipped.

Steps to Reproduce / Example Scenario:

  1. Set a large page size (e.g., 50 items).
  2. Fetch documents where the combined size might exceed Cosmos DB's response size limit (4MB).
  3. Observe that Cosmos DB may return fewer items than requested (e.g., 43 instead of 50).
  4. The iterator fetches additional items (e.g., another 50), resulting in a total that exceeds the desired count.
  5. Discard extra items to match the requested count (e.g., discard 34 items to return exactly 50).
  6. The continuation token reflects the total fetched items (e.g., 84), causing subsequent fetches to skip the discarded items.

Expected Behavior / Recommendation:

  1. Drain the entire iterator and potentially return more items than requested.
  2. Dynamically set MaxItemCount based on the remaining number of records needed after continuing.

Reference Code: DefaultRepository.cs#L50

IEvangelist commented 5 months ago

That's a really interesting edge case, thank you very much for reporting here! Is this something you'd be interested in attempting to fix? I'd be more than happy to review a PR. Just let me know, thanks again @Skylertodd for such a thorough bug report.

Skylertodd commented 5 months ago

When I was initially investigating, I thought it would be easy to dynamically set page size based on remaining records... however it appears that the feedIterator is immutable and it would require a bit of a refactor to separate out the continuation token, query, and paging information. If I find some time, I might toss something together, for now I decided to work around it by lowering the count of items being returned.