Closed jtimeus-slalom closed 3 years ago
After thinking on this more, it should be doable without changes to the current spec:
continuationToken
directly as the next_page_token
, you can create a dict signifying the continuationToken
and also the relevant information regarding the desired date looping behavior.get_next_page_token()
to return {"continuationToken": "asdf1234...", "remaining_dates_to_query": ["2020-12-23", "2020-12-24", ...]}
with all dates populated up to a specified end_date
or today's date.
continuationToken
returns None
, you would pop()
another date from the list and inject that date into the params
object as the new value for startDatetime
.get_next_page_token()
would only return None
, which is the trigger for the SDK to stop looping, when continuationToken
returns None
and also the remaining_dates_to_query
list is empty.@jtimeus-slalom - Lmk if this approach makes sense...
After reviewing the challenge regarding iteratively and smartly looping through multiple calls, I realized that get_url_params
was unnecessarily resetting everything to original params, only to then be overriden again by insert_next_page_token
, as the code snippet from prepare_request
shows:
params: dict = self.get_url_params(partition)
params = self.insert_next_page_token(
next_page=next_page_token, params=params
)
Instead, I think our code will be simpler and easier to understand if we combine get_url_params
with insert_next_page_token
by allowing get_url_params
to take the optional next_page_token
along with the partition
arg. Then we can smartly set the needed params one time based on the combination of factors: partition, state, and next_page_token.
Similarly, prepare_request_payload
should also take next_page_token
in case the payload itself should be updated with information from the paging scheme (as is likely to be the case for GraphQL sources which pass the query in the payload body instead of url params.
I'm working to combine these two methods and I'll post back when that's in place and ready for review.
MR with latest updates here: https://gitlab.com/meltano/singer-sdk/-/merge_requests/9
The next issue I'm running into is that get_next_page_token()
doesn't take a partition
argument so it can't access state
- and since the Power BI REST API will only accept either the start date or the continuation token, we can't scrape the start date from the request after we've started paging through rows.
My proposed solution to this is that we also add a prev_token
to the function signature - and since the way token is used is totally up to the developer, we can easily then pass a dict
as the token which tracks the last start date which we sent to the API. I think there are other use cases where knowing the previous token value could be super helpful - such as detecting infinite loops and keeping track of other metadata about the needed recursion behavior.
Updated sample for get_next_page_token
, now with access to previous_token
:
def get_next_page_token(self, response: requests.Response, previous_token: Optional[Any] = None) -> Optional[Any]:
"""Return token for identifying next page or None if not applicable."""
next_page_token = response.headers.get("X-Next-Page", None)
if next_page_token:
self.logger.info(f"Next page token retrieved: {next_page_token}")
if next_page_token and next_page_token == previous_token:
raise RuntimeError(
f"Loop detected in pagination. Pagination token {next_page_token} is identical to previous run."
)
return next_page_token
We did it!
Power BI Admin - Get Activity Events endpoint has the limitation "StartDateTime and EndDateTime must be in the same UTC day."
https://docs.microsoft.com/en-us/rest/api/power-bi/admin/getactivityevents