boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

No documentation on the very useful 'build_full_result' for Paginators. #3001

Open boldandbusted opened 3 years ago

boldandbusted commented 3 years ago

Please fill out the sections below to help us address your issue.

What issue did you see ?

No documentation on 'build_full_result' for Paginators.

Here's a reference on Stack Exchange that demonstrates the use of 'build_full_result':

https://stackoverflow.com/a/69221258/2808798

Steps to reproduce: Check the docs. Experience sadness and disappointment. ;)

Debug logs: N/A

Thank you for your attention! This method seems to be much easier than instrumenting the iteration yourself. Apologies in advance if this is supposed to remain an 'undocumented feature'. Apologies also if this is documented somewhere other than the code. Please link-slap me if it is, and close this Issue. Cheers.

kdaily commented 3 years ago

Hi @boldandbusted,

Thanks for your comment. In the general case, using the paginator as an iterable is the preferred (and documented) method:

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#creating-paginators

import boto3

# Create a client
client = boto3.client('s3', region_name='us-west-2')

# Create a reusable Paginator
paginator = client.get_paginator('list_objects')

# Create a PageIterator from the Paginator
page_iterator = paginator.paginate(Bucket='my-bucket')

for page in page_iterator:
    print(page['Contents'])

There are cases where build_full_result is useful, but not in general. One such case, limiting the number of results via MaxItems, is discussed in this closed issue:

https://github.com/boto/boto3/issues/788#issuecomment-416755090

In terms of documenting this function, I'm not sure how useful it is given the edge case, but I can look into that further. Can you tell me more why you would need to use it?

Thanks!
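A minimal sketch of the MaxItems case mentioned above, assuming an S3 client and the 'list_objects' paginator from the example. The helper name fetch_batch is hypothetical. It relies on build_full_result() adding a 'NextToken' entry when the MaxItems limit cuts the listing short, which is what botocore's paginate.py appears to do; treat that as an assumption to verify against your botocore version.

```python
def fetch_batch(client, bucket, max_items, starting_token=None):
    """Fetch up to max_items objects; return (items, token_for_next_batch).

    fetch_batch is a hypothetical helper, not part of boto3.
    """
    config = {"MaxItems": max_items}
    if starting_token is not None:
        # Resume where a previous truncated call stopped
        config["StartingToken"] = starting_token

    paginator = client.get_paginator("list_objects")
    result = paginator.paginate(
        Bucket=bucket, PaginationConfig=config
    ).build_full_result()

    # 'Contents' may be absent when nothing matched; 'NextToken' is only
    # expected when the MaxItems limit truncated the listing
    return result.get("Contents", []), result.get("NextToken")
```

With a real client (for example boto3.client('s3')), a caller would keep passing the returned token back in as starting_token until it comes back as None.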

boldandbusted commented 3 years ago

Thank you for your reply, @kdaily. So, the example code merely uses print() to show the output, page by page. What is more useful for me is creating a data structure in a single variable assignment from the output of a boto3 request (as shown in the Stack Overflow link I included above). The simplest path seems to be via build_full_result, because then I don't need to write any processing logic around pagination specifics in order to create that data structure. I merely need a bare variable assignment of the output from build_full_result().

I am basically looking to remain extremely lazy, and not have to worry about 'processing' paged results from the API. :) For lack of a better framing, I can say that in my use case I don't want to have to care about pagination: I'm only after the results of a query, not how the query is sent to the API. This is more in line with the 'batteries included' part of our Python show. :)

I hope this helps and makes sense. Happy to add more explanation. Also, open to hearing if this pattern would actually lead to not getting the correct or expected data back from the API. Certainly in that case this usage pattern should be explicitly documented as an anti-pattern. :)
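For illustration, the single-assignment pattern described above might look like the sketch below. It reuses the S3 'list_objects' paginator from the earlier example; the helper name list_all_objects is hypothetical, and the whole listing is held in memory at once, which is the trade-off against iterating page by page.

```python
def list_all_objects(client, bucket):
    """Return every object entry in the bucket as one list.

    list_all_objects is a hypothetical helper, not part of boto3.
    """
    paginator = client.get_paginator("list_objects")

    # build_full_result() walks all pages and merges them into one dict,
    # so the caller never handles individual pages
    full_result = paginator.paginate(Bucket=bucket).build_full_result()

    # 'Contents' may be absent when the bucket is empty
    return full_result.get("Contents", [])
```

With a real client this would be called as list_all_objects(boto3.client('s3'), 'my-bucket').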

Cheers!

AlphaWong commented 5 months ago

I think it is better to show a reference or a keyword for the function build_full_result at https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html

It is up to the developer to choose which one fits their business case, instead of having to get the info from Stack Overflow.

I suggest that we add some lines from https://github.com/boto/boto3/blob/develop/docs/source/guide/paginators.rst to hint the developer to check the source code and let them think about whether it is a good fit or not.

source code

https://github.com/boto/botocore/blob/master/botocore/paginate.py#L477

test case

https://github.com/boto/botocore/blob/master/tests/unit/test_paginate.py#L818

Et7f3 commented 1 month ago

I'm not sure how useful it is given the edge case

I thought it was the opposite: most apps aren't critical enough to need pagination, and in many cases devs are lazy and just want the full results. Even for a partial list I think JMESPath is better suited, because we can specify the start item via slicing (this wasn't outlined in the documentation, so I don't know whether it is an anti-pattern).
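A hedged sketch of the JMESPath route: the page iterator returned by paginate() has a search() method that takes a JMESPath expression. As far as I can tell the expression is evaluated against each page separately, so a slice written inside the expression would apply per page rather than across the whole listing. The sketch below therefore swaps in itertools.islice to pick the start item across all pages; the helper name keys_from is hypothetical.

```python
from itertools import islice


def keys_from(client, bucket, start, stop=None):
    """Return object keys from position start up to stop across all pages.

    keys_from is a hypothetical helper, not part of boto3.
    """
    pages = client.get_paginator("list_objects").paginate(Bucket=bucket)

    # search() applies the expression to each page and yields matches lazily;
    # a page with no 'Contents' yields None, which is skipped here
    keys = (k for k in pages.search("Contents[].Key") if k is not None)

    # islice (not JMESPath) does the cross-page slicing
    return list(islice(keys, start, stop))
```

Because the generator is lazy, pages past the stop position should not need to be requested.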

It is up to the developer to choose which one fits their business case instead of getting the info from Stack overflow

A rule of thumb I see is: if the end user wants pagination, then iterate over paginator.paginate(); otherwise call .paginate().build_full_result().

tarrc commented 1 week ago

@Et7f3 in my use case I also do not need pagination to pull a specific page of results, but rather need to export the entirety of the pool to an external CRM for matching to existing customers. We will match and throw out duplicates, but also check those duplicates for any attribute that has changed in Cognito, and reflect that change in the CRM.

For my specific need .paginate().build_full_result() is the smartest option. Otherwise I am forced to write additional business logic to paginate through the results and add unnecessary calls to both endpoints.
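A minimal sketch of what such an export could look like, assuming a 'cognito-idp' client and its 'list_users' paginator. The helper name export_users and the choice of a username-keyed dict are assumptions for illustration, not the commenter's actual code.

```python
def export_users(client, user_pool_id):
    """Return {username: {attribute_name: value}} for the whole user pool.

    export_users is a hypothetical helper, not part of boto3.
    """
    paginator = client.get_paginator("list_users")

    # One call collects the users from every page
    result = paginator.paginate(UserPoolId=user_pool_id).build_full_result()

    users = {}
    for user in result.get("Users", []):
        # Cognito returns attributes as a list of Name/Value pairs
        attributes = {a["Name"]: a["Value"] for a in user.get("Attributes", [])}
        users[user["Username"]] = attributes
    return users
```

A dict keyed by username makes it straightforward to compare attributes against the records already held in the CRM.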