sutiv closed this issue 3 years ago.
Hi @sutiv, pagination is implemented, but it appears to use the default config, which essentially means no pagination. I'll add a parameter for pagination details.
thanks a lot!
Almost perfect, but isn't there something missing?! Something like next_token = page.get('NextToken').
paginator.paginate() returns a 'NextToken' if a 'StartingToken' was given AND the page isn't the last page. Your solution doesn't return this NextToken, so I can't request the next page.
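For context, here is roughly how that token surfaces with a plain boto3 paginator. This is a sketch, not awswrangler code: fetch_page is a hypothetical helper, and paginator is assumed to come from a boto3 timestream-query client via get_paginator("query"). With MaxItems set in PaginationConfig, the page iterator's build_full_result() includes a NextToken only when the results were truncated. That token should be treated as opaque and passed back as StartingToken.

```python
from typing import Any, List, Optional, Tuple


def fetch_page(paginator: Any, sql: str, max_items: int,
               starting_token: Optional[str] = None) -> Tuple[List[dict], Optional[str]]:
    """Return up to max_items rows plus a resume token (hypothetical helper).

    paginator is assumed to be boto3.client("timestream-query").get_paginator("query").
    """
    config = {"MaxItems": max_items}
    if starting_token is not None:
        # Resume where the previous call stopped; sql must be the same query.
        config["StartingToken"] = starting_token
    page_iterator = paginator.paginate(QueryString=sql, PaginationConfig=config)
    # build_full_result() walks the pages, merges their "Rows", and adds a
    # "NextToken" entry only when MaxItems cut the result set short.
    result = page_iterator.build_full_result()
    return result.get("Rows", []), result.get("NextToken")
```

A caller would pass the returned token back on its next call, e.g. fetch_page(paginator, sql, 100, starting_token=token), and stop once the token comes back as None.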
Actually, the current implementation iterates through all pages and collects the results, so you don't have to:
    for page in paginator.paginate(QueryString=sql, PaginationConfig=pagination_config or {}):
        if not schema:
            schema = _process_schema(page=page)
        for row in page["Rows"]:
            rows.append(_process_row(schema=schema, row=row))
Although, now that I think about it, it would be useful to be able to iterate through pages in case the result set is too big. I'll add this.
There is still a problem with the current implementation. 'NextToken' is still not part of the return value. What's the point of adding support for pagination when you don't return the pagination token?
@jeffngo you don't have to retrieve the next page manually using a token. Pass chunked=True and wrangler will return an iterator of DataFrames, each corresponding to a page in the result set, which you can iterate lazily.
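To make the lazy behaviour concrete, here is a minimal generator that fetches one page per iteration step via the Timestream Query API's query call (QueryString, NextToken and MaxRows in; Rows and NextToken out). It is an illustration under that assumption, not awswrangler's implementation: iter_pages is a hypothetical name, and pages are yielded as raw row lists rather than DataFrames.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_pages(client: Any, sql: str, max_rows: Optional[int] = None) -> Iterator[List[Dict]]:
    """Yield one page of raw Timestream rows per request (hypothetical helper).

    client is assumed to be boto3.client("timestream-query"). No request is
    sent until the caller asks the generator for its first page.
    """
    next_token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"QueryString": sql}
        if next_token is not None:
            kwargs["NextToken"] = next_token
        if max_rows is not None:
            kwargs["MaxRows"] = max_rows
        response = client.query(**kwargs)
        rows = response.get("Rows", [])
        if rows:
            # Skip empty pages so callers only see pages that contain rows.
            yield rows
        next_token = response.get("NextToken")
        if next_token is None:
            break  # no token means the result set is exhausted
```

Because a generator's body only runs when next() is called, creating it sends no request, and stopping iteration early means the remaining pages are never fetched.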
@kukushking Does that mean that awswrangler returns the full result set along with an iterator? If so, this solution will not scale well for a large dataset. AWS Timestream implements token-based pagination so that end users can fetch a smaller subset of the full dataset with each request and use NextToken in the next request to fetch the next page of results. Is there any chance we can return the NextToken as part of the response?
@jeffngo No. If you pass chunked=True, it will not read the full result set at once; it only retrieves the current page until you ask the iterator for the next one.
    import awswrangler as wr

    dfs = wr.timestream.query(sql="...", chunked=True)  # returns an iterator; no results are retrieved yet
    for df in dfs:
        print(df)  # retrieves and returns the DataFrame for the current page only
Just make sure you pass chunked=True to enable this behavior; otherwise it will indeed retrieve the full result set.
@kukushking I see. In my application, we want to return a pagination token to the client so that the client can decide when to go to the next/previous page. Is there a way to pull the next pagination token out of dfs in your example above?
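One way to hand pagination control to the client is to skip the iterator and make a single query call per API request, returning the raw NextToken alongside the rows. The sketch below assumes a boto3 timestream-query client and a request/response shape chosen for illustration (handle_page_request, rows, column_info and next_token are hypothetical names). The Timestream response carries no token for going backwards, so a "previous page" button would need the client to keep the tokens it received earlier, and the Timestream documentation limits how long and how often a NextToken can be reused.

```python
from typing import Any, Dict, Optional


def handle_page_request(client: Any, sql: str, token: Optional[str] = None,
                        max_rows: int = 100) -> Dict[str, Any]:
    """Serve one page per API call (hypothetical handler).

    client is assumed to be boto3.client("timestream-query"); token is the
    next_token the caller got from its previous request, or None for page one.
    """
    kwargs: Dict[str, Any] = {"QueryString": sql, "MaxRows": max_rows}
    if token is not None:
        # Resuming requires the same query string that produced the token.
        kwargs["NextToken"] = token
    response = client.query(**kwargs)
    return {
        # A page may be empty while next_token is still set, so an empty
        # list alone should not be read as the end of the result set.
        "rows": response.get("Rows", []),
        "column_info": response.get("ColumnInfo", []),
        "next_token": response.get("NextToken"),  # None on the last page
    }
```

The client echoes next_token back on its next request; a next_token of None means there are no more pages.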
Queries can take a long time, and AWS API Gateway times out.
Is it possible to use pagination as offered by boto3? https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/timestream-query.html#TimestreamQuery.Client.get_paginator