dataops-tk / tap-powerbi-metadata

tap-powerbi-metadata
MIT License
5 stars 2 forks source link

Need design pattern for when API requires tight date range that needs to be iterated over #5

Closed jtimeus-slalom closed 3 years ago

jtimeus-slalom commented 3 years ago

Power BI Admin - Get Activity Events endpoint has the limitation "StartDateTime and EndDateTime must be in the same UTC day."

https://docs.microsoft.com/en-us/rest/api/power-bi/admin/getactivityevents

aaronsteers commented 3 years ago

After thinking on this more, it should be doable without changes to the current spec:

@jtimeus-slalom - Lmk if this approach makes sense...

aaronsteers commented 3 years ago

After reviewing the challenge regarding iteratively and smartly looping through multiple calls, I realized that get_url_params was unnecessarily resetting everything to original params, only to then be overriden again by insert_next_page_token, as the code snippet from prepare_request shows:

        params: dict = self.get_url_params(partition)
        params = self.insert_next_page_token(
            next_page=next_page_token, params=params
        )

Instead, I think our code will be simpler and easier to understand if we combine get_url_params with insert_next_page_token by allowing get_url_params to take the optional next_page_token along with the partition arg. Then we can smartly set the needed params one time based on the combination of factors: partition, state, and next_page_token.

Similarly, prepare_request_payload should also take next_page_token in case the payload itself should be updated with information from the paging scheme (as is likely to be the case for GraphQL sources which pass the query in the payload body instead of url params.

I'm working to combine these two methods and I'll post back when that's in place and ready for review.

aaronsteers commented 3 years ago

MR with latest updates here: https://gitlab.com/meltano/singer-sdk/-/merge_requests/9

aaronsteers commented 3 years ago

The next issue I'm running into is that get_next_page_token() doesn't take a partition argument so it can't access state - and since the Power BI REST API will only accept either the start date or the continuation token, we can't scrape the start date from the request after we've started paging through rows.

My proposed solution to this is that we also add a prev_token to the function signature - and since the way token is used is totally up to the developer, we can easily then pass a dict as the token which tracks the last start date which we sent to the API. I think there are other use cases where knowing the previous token value could be super helpful - such as detecting infinite loops and keeping track of other metadata about the needed recursion behavior.

aaronsteers commented 3 years ago

Updated sample for get_next_page_token, now with access to previous_token:

    def get_next_page_token(self, response: requests.Response, previous_token: Optional[Any] = None) -> Optional[Any]:
        """Return token for identifying next page or None if not applicable."""
        next_page_token = response.headers.get("X-Next-Page", None)
        if next_page_token:
            self.logger.info(f"Next page token retrieved: {next_page_token}")
        if next_page_token and next_page_token == previous_token:
            raise RuntimeError(
                f"Loop detected in pagination. Pagination token {next_page_token} is identical to previous run."
            )
        return next_page_token
jtimeus-slalom commented 3 years ago

We did it! image