MrThearMan / dynamics-client

Client for making Web API request from a Microsoft Dynamics 365 Database
https://pypi.org/project/dynamics-client/
MIT License
10 stars 6 forks source link

FutureRequest: Pagination on parent records to get more than 5000 records. #16

Closed yasaidev closed 1 year ago

yasaidev commented 1 year ago

I have read the docs thoroughly before making this feature request.

I have read through other open issues, and my issue is not a duplicate.

Description

Purpose

Add pagination with V9 API returned value.

Problem

This library can't pagination with v9.x dataverse api returned value likes below with current code.

response = {
    "@odata.context": "https://XXXXX.crm7.dynamics.com/api/data/v9.0/$metadata#accounts",
    "value": [
        {
            "@odata.etag": "W/\"7816473\"",
            "name": "Sample 1",
            "accountid": "d7d7cee5-a8c1-eb11-bacc-000d3a40e0ac"
        },
        /*skip many items...*/
        {
            "@odata.etag": "W/\"7816473\"",
            "name": "Sample 2",
            "accountid": "d7d7cee5-a8c1-eb11-bacc-000d3a40e0ac"
        }
    ],
    "@odata.nextLink": "https://XXXXX.crm7.dynamics.com/api/data/v9.0/accounts?$skiptoken=%3Ccookie%20pagenumber=%222%22%20pagingcookie=%22%253ccookie%2520page%253d%25221%2522%253e%253caccountid%2520last%253d%2522%257bD7D7CEE5-A8C1-EB11-BACC-000D3A40E0AC%257d%2522%2520first%253d%2522%257b93C71621-BD9F-E711-8122-000D3A2BA2EA%257d%2522%2520%252f%253e%253c%252fcookie%253e%22%20istracking=%22False%22%20/%3E"
}

cite from https://www.cdata.com/jp/blog/dynamics365pagenation

It seems _handle_patination has response["value"], so there is no @odata.nextLink. So this library only get max 5000 records with dataverse v9.x api.

Solution

Add some code that getting @odata.nextlink from response["@odata.nextlink"] when call _handle_pagination.

I already make a pull request ,please check it.

Motivation

Add pagination with V9 API returned value. Currently this library only get max 5000 records with dataverse v9.x api.

Would you like to solve this issue yourself with a pull request?

Yes

MrThearMan commented 1 year ago

Hi!

Thanks for the pull request, but it's generally a better idea to discuss the changes here first to prevent unnecessary refactoring.

Indeed, it seems I never implemented pagination properly! The current pagination handles pagination for nested entities (<entity_name>@odata.nextLink links inside the value-key, see here). Missing this indeed prevents fetching more entities than the pagesize (max 5000).

Wisdom comes with age, and I don't really like how I've implemented pagination for this library. Ideally, pagination should be left for the client, since it's meant to prevent over-fetching. Returning pagination data from the client might be a bigger refactor, but we should at least enable the user to fetch more than 5000 rows if needed.

So, both paginations for the value-entity and nested entities should be possible, at the same time. Both paginations should be recursive, so that we get all possible entities back. Having a boolean flag or a max_pages agument on the client get method would be good to time out, but I'll accept a PR without it (since I'd rather refactor the pagination later anyway). There should be tests for these scenarios.

yasaidev commented 1 year ago

Thank you for quick response, and sorry for my late reply. I love this library makes a lot for help me to build some tools without PowerAutomate.

I will refactor my code and test in a couple days, and notify you then completed. But I have a question.

Is the following implementation policy acceptable?

I think this implementation will works even if the target records have both parent @odata.nextLink and nested @odata.nextLink. Please let me know if there is any misunderstanding

  1. Get all parent records with call get parent @odata.nextLink 's odata query recursively as long as @odata.nextLink is not None.
  2. Get all nested @odata.nextLink with current handler
  3. Then Get all parents records and nested records.
def _handle_pagination(self, data, not_found_ok: bool) -> None:
  # NOW HE WILL RECEIVE WHOLE DATA, SO ERROR CHECK WILL BE HERE
  # Always returns a list, even if only one row is selected
  entities: List[Dict[str, Any]] = data.get("value", [data])
  if not entities:
      if not_found_ok:
          return []

      raise self.handled_error(
          status_code=status.HTTP_404_NOT_FOUND,
          error_message="No records matching the given criteria.",
          error_code="not_found",
          method="get",
      )

    parent_next_link: Optional[str] = data.get("@odata.nextLink")
    if parent_next_link is not None:
        #JUST GET NEXT PARENT CODE
        extra = self.get(not_found_ok=not_found_ok, query=parent_next_link)
        entities += extra

    # [ALREADY IMPLEMENTED CODE FOR NESTED NEXT LINK](https://github.com/MrThearMan/dynamics-client/blob/main/dynamics/client.py#L360)
    # SKIP FOR SIMPLIFY
MrThearMan commented 1 year ago

Glad that you are enjoying the library! 😊

Your implementation stategy seems sound to me. Looking forward to it!

MrThearMan commented 1 year ago

Hi! I ended up implementing a complete refactor of how paginated data is handled. The client get/post/patch methds now return dataclasses (with the pagination info included if pagination hasn't reached the last page), and the pagination can be configured separately for the parent and child entities. I still need to update the documentation, but something like this should fetch all parent entities and the child entity named "foo":

client = DynamicsClient(...)
# Setup query...
response = client.get(pagination_rules={"pages": -1, "children": {"foo": {"pages": -1}}})
print(response.data)

Hopefully you didn't put too much time into your solution yet! Please open a pull request if you'd like to see any changes. Released in 0.8.0.

yasaidev commented 1 year ago

@MrThearMan Sorry for my delay of everything. I have super busy with AC6... Thank you for the paging implementation.

I will read some code, and try it.