Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

max_item_count breaks pagination in azure-cosmos query_items().by_page() #36946

Open esotuvaka opened 3 weeks ago

esotuvaka commented 3 weeks ago

Describe the bug

I adapted the code examples below directly from the samples section of the azure-sdk-for-python GitHub repo: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/cosmos/azure-cosmos/samples/document_management.py#L88

Part 1: Specifying a max_item_count greater than the number of documents in Cosmos DB returns 0 documents, even though documents are present. My current workaround is to set max_item_count to 1, which does enable pagination but will not scale.

Part 2: The other half of this bug appears while paginating: if the final page holds fewer than max_item_count documents, that page comes back empty.

Part 1 example: with 7 documents in Cosmos DB and a max_item_count of 10, the code below cannot paginate, because the first page is returned with 0 documents.

Part 2 example: with 7 documents in Cosmos DB and a max_item_count of 2, the code below does paginate, but the final page, which should return 1 document, returns [] instead. I suspect this has the same cause as Part 1: max_item_count is now larger than the number of documents remaining.

Specifying no max_item_count will also trigger this bug.

To Reproduce

Steps to reproduce the behavior:

PART 1:

# Currently have 7 documents in Cosmos DB, partitioned by /id
# This will _not_ paginate, and returns [[], None] on the first function call / request
def paginated_query_all_items(
    container: ContainerProxy, continuation_token: str | None
) -> tuple[list | None, str | None]:
    query_iterable = container.query_items(
        query="SELECT * FROM r",
        enable_cross_partition_query=True,
        max_item_count=10,  # Note the change in max_item_count
    )

    if continuation_token is None:
        logger.debug("Querying for the first page of items")
        item_pages = query_iterable.by_page()
        first_page = item_pages.next()
        cont_token = item_pages.continuation_token
        return first_page, cont_token
    else:
        # Now we use the continuation token from the first page to pick up where we left off and
        # access the next page of items
        try:
            items_from_continuation = query_iterable.by_page(continuation_token)
            nth_page_items_with_continuation = list(items_from_continuation.next())
            cont_token = items_from_continuation.continuation_token

            return nth_page_items_with_continuation, cont_token
        except StopIteration:
            return [], None

PART 2:

# Currently have 7 documents in Cosmos DB, partitioned by /id
# This will paginate, but will fail on page 4, where it returns [[], None]:
# no more pages, but also no content instead of the 1 remaining item
def paginated_query_all_items(
    container: ContainerProxy, continuation_token: str | None
) -> tuple[list | None, str | None]:
    query_iterable = container.query_items(
        query="SELECT * FROM r",
        enable_cross_partition_query=True,
        max_item_count=2,  # Note the change in max_item_count
    )

    if continuation_token is None:
        logger.debug("Querying for the first page of items")
        item_pages = query_iterable.by_page()
        first_page = item_pages.next()
        cont_token = item_pages.continuation_token
        return first_page, cont_token
    else:
        # Now we use the continuation token from the first page to pick up where we left off and
        # access the next page of items
        try:
            items_from_continuation = query_iterable.by_page(continuation_token)
            nth_page_items_with_continuation = list(items_from_continuation.next())
            cont_token = items_from_continuation.continuation_token

            return nth_page_items_with_continuation, cont_token
        except StopIteration:
            return [], None

Expected behavior

I expected each page to contain up to max_item_count documents. It would be great if the documentation could be updated with a working example, since I've found only limited or incomplete resources online for paginated requests with the Cosmos DB Python SDK. Thank you.
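
To make the expectation concrete, here is a minimal sketch of the page sizes the report anticipates for a given document count and max_item_count. It is plain arithmetic and does not call the SDK; the helper name expected_page_sizes is hypothetical, and it assumes every page except possibly the last is filled to max_item_count, which is the reporter's expectation rather than documented service behavior.

```python
def expected_page_sizes(total_items: int, max_item_count: int) -> list[int]:
    """Return the page sizes expected if each page is filled up to max_item_count.

    Hypothetical helper for illustration only; it models the expectation
    described in this issue, not what the service guarantees.
    """
    if total_items < 0 or max_item_count < 1:
        raise ValueError("total_items must be >= 0 and max_item_count >= 1")

    # Full pages first, then one shorter page for whatever is left over.
    full_pages, remainder = divmod(total_items, max_item_count)
    sizes = [max_item_count] * full_pages
    if remainder:
        sizes.append(remainder)
    return sizes


# The two scenarios from this report: 7 documents in the container.
print(expected_page_sizes(7, 10))  # [7]
print(expected_page_sizes(7, 2))   # [2, 2, 2, 1]
```

Under this expectation, Part 1 should yield a single page of 7 documents and Part 2 should yield four pages, the last holding 1 document.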

Additional context N/A

github-actions[bot] commented 3 weeks ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @AbhinavTrips @bambriz @pilchie @pjohari-ms @simorenoh.

bambriz commented 4 days ago

Hello @esotuvaka, thank you for bringing this to our attention. I have not been able to reproduce the bug described here. I ran the two scenarios you mentioned and got the expected behaviour when using max_item_count. How are you calling the function in your code? This is what I have:

paginated_queried_items = paginated_query_all_items(container, continuation_token=None)
continuation_token = list(paginated_queried_items)[1]
while continuation_token is not None:
    paginated_queried_items = paginated_query_all_items(container, continuation_token=continuation_token)
    continuation_token = list(paginated_queried_items)[1]

With the 7 items in the container, I get all 7 items back in a single page when max_item_count is 10, and 7 items across 4 pages when it is 2.
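
The driver loop above only tracks the continuation token. The following is a minimal sketch of the same loop that also accumulates the items so the totals can be checked. The names collect_all_items and make_fake_fetcher are hypothetical, and the fake fetcher is an in-memory stand-in using offsets as tokens; it is not the Cosmos DB SDK and says nothing about how real continuation tokens look.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

# A callable shaped like paginated_query_all_items with the container already
# bound: it takes a continuation token (or None) and returns (page, next_token).
FetchPage = Callable[[Optional[str]], Tuple[Optional[Iterable[Any]], Optional[str]]]


def collect_all_items(fetch_page: FetchPage, max_pages: int = 1000) -> list:
    """Call fetch_page repeatedly until it returns no continuation token."""
    items: list = []
    token: Optional[str] = None
    for _ in range(max_pages):
        page, token = fetch_page(token)
        # The page may be a list, an iterator, or None; normalise before extending.
        items.extend(page or [])
        if token is None:
            return items
    raise RuntimeError("stopped after max_pages pages; possible pagination loop")


def make_fake_fetcher(docs: list, page_size: int) -> FetchPage:
    """In-memory stand-in for a paged query (assumption: offsets as tokens)."""

    def fetch_page(token: Optional[str]) -> Tuple[list, Optional[str]]:
        start = int(token) if token is not None else 0
        end = start + page_size
        next_token = str(end) if end < len(docs) else None
        return docs[start:end], next_token

    return fetch_page


docs = [{"id": str(i)} for i in range(7)]
print(len(collect_all_items(make_fake_fetcher(docs, 10))))  # 7
print(len(collect_all_items(make_fake_fetcher(docs, 2))))   # 7
```

Against a real container, the same loop could presumably be driven with functools.partial(paginated_query_all_items, container) as fetch_page, since that leaves the continuation token as the only remaining argument.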