The pagination of S3 list_objects_v2 skips pages when using CommonPrefixes (i.e. Delimiter) and StartingToken
Use case:
Our API provides a list of S3 "folders" and supports pagination. It is a wrapper over our internal S3 bucket and forwards the information. The first response of the API returns a list of common prefixes and the next token provided by the PageIterator. The second request uses this token to continue the listing.
Expected Behavior
Using the paginator.paginate() method with the Delimiter parameter and not setting StartingToken should return all pages starting from the first one and its next token.
Using it again, this time with a given StartingToken (the first page's next token), should return all pages starting from the second one and its next token.
Current Behavior
When paginator.paginate() is called with StartingToken, it returns the second page with an empty CommonPrefixes list, but the third page with a valid CommonPrefixes list.
Reproduction Steps
You need a bucket with date partitions and files in them.
I traced the issue to PageIterator.__iter__() (.venv/lib/python3.11/site-packages/botocore/paginate.py):
    if first_request:
        # The first request is handled differently. We could
        # possibly have a resume/starting token that tells us where
        # to index into the retrieved page.
        if self._starting_token is not None:
            starting_truncation = self._handle_first_request(
                parsed, primary_result_key, starting_truncation
            )
        first_request = False
        self._record_non_aggregate_key_values(parsed)
The primary_result_key is initialized a few lines before that as self.result_keys[0], and result_keys essentially comes from the JSON schema in venv/lib/python3.11/site-packages/botocore/data/s3/2006-03-01/paginators-1.json
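Why an absent Contents key could lead to an empty CommonPrefixes list can be illustrated with a simplified model of the first-request handling. This is a sketch, not botocore's code: the function name `handle_first_request`, the hard-coded `RESULT_KEYS` order, and the assumption that non-primary result keys are blanked on a resumed first request are all illustrative and should be checked against paginate.py.

```python
# Assumed order of result keys for list_objects_v2; the report states that
# the first (primary) one is Contents.
RESULT_KEYS = ["Contents", "CommonPrefixes"]


def handle_first_request(parsed: dict, starting_truncation: int = 0) -> dict:
    """Simplified model of how a resumed first page might be post-processed.

    Assumption: the primary key is sliced from `starting_truncation`, and every
    other result key is blanked, on the premise that it was already returned
    in full together with the previous page.
    """
    primary_key, *secondary_keys = RESULT_KEYS
    page = dict(parsed)  # shallow copy so the input stays untouched

    primary = page.get(primary_key)
    if isinstance(primary, list):
        page[primary_key] = primary[starting_truncation:]
    else:
        page[primary_key] = None

    for key in secondary_keys:
        value = page.get(key)
        page[key] = [] if isinstance(value, list) else None
    return page


# A delimiter listing in which every object sits under a "folder":
# the response carries CommonPrefixes but no Contents key at all.
resumed_page = {
    "CommonPrefixes": [
        {"Prefix": "date=2024-01-02/"},
        {"Prefix": "date=2024-01-03/"},
    ],
    "IsTruncated": True,
}

print(handle_first_request(resumed_page)["CommonPrefixes"])  # prints []
```

If this model is accurate, only the first page fetched with a StartingToken loses its CommonPrefixes, while later pages are left untouched, which would match the behavior described under Current Behavior.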
Possible Solution
No response
Additional Information/Context
In the paginators-1.json schema referenced above, result_key is Contents, which is missing from the S3 response body (parsed).
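As a possible workaround (not part of the original report, and not a fix for the paginator), the listing can bypass StartingToken and pass S3's own continuation token to list_objects_v2 directly. In the sketch below, the helper name `list_folder_page` and its parameters are hypothetical, and `client` is assumed to be a boto3 S3 client such as the one returned by boto3.client("s3").

```python
from typing import Any, Optional


def list_folder_page(
    client: Any,
    bucket: str,
    prefix: str = "",
    continuation_token: Optional[str] = None,
    max_keys: int = 100,
) -> tuple[list[str], Optional[str]]:
    """Return one page of "folder" prefixes and the token for the next page."""
    kwargs = {
        "Bucket": bucket,
        "Prefix": prefix,
        "Delimiter": "/",
        "MaxKeys": max_keys,
    }
    # Only send ContinuationToken when resuming; the first call omits it.
    if continuation_token is not None:
        kwargs["ContinuationToken"] = continuation_token

    response = client.list_objects_v2(**kwargs)

    folders = [item["Prefix"] for item in response.get("CommonPrefixes", [])]
    # NextContinuationToken is only meaningful while the listing is truncated.
    next_token = (
        response.get("NextContinuationToken")
        if response.get("IsTruncated")
        else None
    )
    return folders, next_token
```

A caller would return `next_token` to the API consumer and pass it back as `continuation_token` on the following request, stopping when it is `None`.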
SDK version used
1.31.17
Environment details (OS name and version, etc.)
MacOS 14.2.1 (23C71)