boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

dynamodb paginators return continuation token with undocumented name, can't start from continuation token #3693

Open mdavis-xyz opened 1 year ago

mdavis-xyz commented 1 year ago

TL;DR: The pagination token returned by DynamoDB paginators doesn't match the documentation, and it cannot be passed back in as a starting point for pagination.

Docs Issue

The docs for the DynamoDB paginators (e.g. Scan) say each page contains NextToken, but the key actually returned is LastEvaluatedKey.

Similar to #3677 and #1664

Steps to reproduce:

import boto3
import datetime as dt

table_name = 'test-token'

client = boto3.client('dynamodb')

try:
    client.create_table(
        AttributeDefinitions=[
            {
                'AttributeName': 'h',
                'AttributeType': 'N'
            },
        ],
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'h',
                'KeyType': 'HASH'
            },
        ],
        BillingMode='PAY_PER_REQUEST',
        Tags=[
            {
                'Key': 'delete after',
                'Value': str(dt.date.today())
            },
            {
                'Key': 'person',
                'Value': 'matt'
            }
        ],
    )
except client.exceptions.ResourceInUseException:
    pass

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(table_name)

print("Waiting for table to exist")
table.wait_until_exists()

page_size=2
num_items = page_size*3+1

print("Writing data")
for i in range(num_items):
    table.put_item(
        Item={
            'h': i
        },
    )

print("Reading data")
paginator = client.get_paginator('scan')
response_iterator = paginator.paginate(
    TableName=table_name,
    PaginationConfig={
        'PageSize': page_size,
    }
)

for (i, page) in enumerate(response_iterator):
    print(page.keys())
    print(f"Page {i} has {len(page.get('Items', []))} items, NextToken={page.get('NextToken')}")

$ python3 main.py
Waiting for table to exist
Writing data
Reading data
dict_keys(['Items', 'Count', 'ScannedCount', 'LastEvaluatedKey', 'ResponseMetadata'])
Page 0 has 2 items, NextToken=None
dict_keys(['Items', 'Count', 'ScannedCount', 'LastEvaluatedKey', 'ResponseMetadata'])
Page 1 has 2 items, NextToken=None
dict_keys(['Items', 'Count', 'ScannedCount', 'LastEvaluatedKey', 'ResponseMetadata'])
Page 2 has 2 items, NextToken=None
dict_keys(['Items', 'Count', 'ScannedCount', 'ResponseMetadata'])
Page 3 has 1 items, NextToken=None

Code Issue

If you take this LastEvaluatedKey (or a key of the same shape, such as {'h': {'N': '1'}} below) and pass it as the paginator's StartingToken, you get an error, because LastEvaluatedKey is a dict, not a string.

import boto3
import datetime as dt
from pprint import pprint

table_name = 'test-token'

client = boto3.client('dynamodb')

try:
    client.create_table(
        AttributeDefinitions=[
            {
                'AttributeName': 'h',
                'AttributeType': 'N'
            },
        ],
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'h',
                'KeyType': 'HASH'
            },
        ],
        BillingMode='PAY_PER_REQUEST',
        Tags=[
            {
                'Key': 'delete after',
                'Value': str(dt.date.today())
            },
            {
                'Key': 'person',
                'Value': 'matt'
            }
        ],
    )
except client.exceptions.ResourceInUseException:
    pass

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(table_name)

print("Waiting for table to exist")
table.wait_until_exists()

page_size=2
num_items = page_size*3+1

print("Writing data")
for i in range(num_items):
    table.put_item(
        Item={
            'h': i
        },
    )

print("Reading data")
paginator = client.get_paginator('scan')
response_iterator = paginator.paginate(
    TableName=table_name,
    PaginationConfig={
        'PageSize': page_size,
        'StartingToken': {'h': {'N': '1'}}
    }
)

for (i, page) in enumerate(response_iterator):
    pprint(page)

$ python3 main.py
Waiting for table to exist
Writing data
Reading data
Traceback (most recent call last):
  File "main.py", line 66, in <module>
    for (i, page) in enumerate(response_iterator):
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/botocore/paginate.py", line 261, in __iter__
    next_token = self._parse_starting_token()[0]
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/botocore/paginate.py", line 533, in _parse_starting_token
    next_token = self._token_decoder.decode(next_token)
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/botocore/paginate.py", line 124, in decode
    json_string = base64.b64decode(token.encode('utf-8')).decode('utf-8')
AttributeError: 'dict' object has no attribute 'encode'
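The last frame shows the paginator base64-decoding StartingToken, which only works on a string. A hypothetical diagnostic helper, assuming the decoded bytes are then parsed as JSON (as botocore's internal token decoder appears to do, given the json_string variable above), can reject both a raw dict and a plain JSON string before any request is made. This is a sketch of the expected format, not a botocore API:

```python
import base64
import json


def decode_starting_token(token):
    """Return the dict encoded in a paginator StartingToken, or raise ValueError.

    Assumption: tokens are base64-encoded JSON objects, mirroring botocore's
    internal token decoder. Diagnostic aid only, not part of boto3/botocore.
    """
    if not isinstance(token, str):
        # A LastEvaluatedKey dict fails here, like the AttributeError above.
        raise ValueError(f"StartingToken must be a str, got {type(token).__name__}")
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # so a json.dumps() string such as '{"h": ...}' is caught here.
        raw = base64.b64decode(token.encode("utf-8"), validate=True)
        decoded = json.loads(raw.decode("utf-8"))
    except ValueError as exc:
        # binascii.Error, UnicodeDecodeError and JSONDecodeError all subclass ValueError.
        raise ValueError(f"not a base64-encoded JSON token: {token!r}") from exc
    if not isinstance(decoded, dict):
        raise ValueError("decoded token is not a JSON object")
    return decoded
```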

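If I wrap the key in json.dumps() so that it is a string, the paginator passes that string through unchanged as ExclusiveStartKey, and parameter validation fails: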

Traceback (most recent call last):
  File "main.py", line 67, in <module>
    for (i, page) in enumerate(response_iterator):
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/botocore/paginate.py", line 269, in __iter__
    response = self._make_request(current_kwargs)
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/botocore/paginate.py", line 357, in _make_request
    return self._method(**current_kwargs)
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/botocore/client.py", line 919, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/botocore/client.py", line 990, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/botocore/validate.py", line 381, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter ExclusiveStartKey, value: {"h": {"N": "1"}}, type: <class 'str'>, valid types: <class 'dict'>
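The value in the error is the raw JSON text, so the string was forwarded as ExclusiveStartKey instead of being decoded. One workaround sketch, assuming the undocumented format that botocore's paginator decodes (base64 of a JSON object keyed by the input parameter name, here ExclusiveStartKey), is to wrap LastEvaluatedKey before passing it as StartingToken. The helper name make_starting_token is hypothetical, the format is a botocore implementation detail that may change, and keys containing binary (B) attributes are not handled:

```python
import base64
import json


def make_starting_token(last_evaluated_key, input_token="ExclusiveStartKey"):
    """Wrap a DynamoDB LastEvaluatedKey into a paginator StartingToken string.

    Assumption: mirrors botocore's internal token format (base64-encoded JSON
    mapping the input parameter name to its value). Not a public API.
    Binary (bytes) attribute values are not JSON-serializable and will fail.
    """
    payload = json.dumps({input_token: last_evaluated_key})
    return base64.b64encode(payload.encode("utf-8")).decode("utf-8")


# Usage against a real table (not executed here):
# token = make_starting_token(page["LastEvaluatedKey"])
# paginator.paginate(TableName=table_name,
#                    PaginationConfig={"PageSize": 2, "StartingToken": token})
```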

Links

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/paginator/Scan.html

tim-finnigan commented 1 year ago

Thanks @mdavis-xyz for reporting this issue. As mentioned in https://github.com/boto/boto3/issues/3677, continuation tokens are named inconsistently across services, but I agree it's confusing to document NextToken when that key is not in the response. Maybe the docs should say something like "(e.g. NextToken, though the name may vary depending on the service)".
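Because the continuation field's name depends on the operation, code that reads it generically has to know the name per operation. The sketch below hard-codes it for three DynamoDB operations (Scan and Query return LastEvaluatedKey; ListTables returns LastEvaluatedTableName); OUTPUT_TOKEN and continuation_token are illustrative names, not part of boto3:

```python
# Response field carrying the continuation marker, per DynamoDB client method.
# Hand-written, illustrative mapping; other operations and services differ.
OUTPUT_TOKEN = {
    "scan": "LastEvaluatedKey",
    "query": "LastEvaluatedKey",
    "list_tables": "LastEvaluatedTableName",
}


def continuation_token(operation, page):
    """Return the continuation marker from a response page, or None on the last page."""
    try:
        field = OUTPUT_TOKEN[operation]
    except KeyError:
        raise ValueError(f"unknown operation: {operation}") from None
    return page.get(field)
```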

Here is the documentation for the scan client method: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/scan.html. As noted there, the LastEvaluatedKey from one response is passed as ExclusiveStartKey in the next request; here is an example use case. I think it is unusual to have a dict as a continuation marker, since those tokens are generally strings.
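The client-level pattern described in the scan documentation can be sketched as a generator that feeds each LastEvaluatedKey back as ExclusiveStartKey until a page comes back without one. It takes the scan callable as a parameter (e.g. client.scan) so it can be exercised with a stand-in; scan_pages is a hypothetical name. Because the key stays a dict, a saved key can also be passed back later as ExclusiveStartKey to resume:

```python
def scan_pages(scan, **kwargs):
    """Yield successive Scan responses, resuming from LastEvaluatedKey.

    `scan` is any callable with the client.scan keyword interface,
    e.g. boto3.client("dynamodb").scan. kwargs is a fresh dict per call,
    so adding ExclusiveStartKey does not affect the caller.
    """
    while True:
        page = scan(**kwargs)
        yield page
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return  # the final page carries no LastEvaluatedKey
        kwargs["ExclusiveStartKey"] = last_key


# Usage against a real table (not executed here):
# client = boto3.client("dynamodb")
# for page in scan_pages(client.scan, TableName="test-token", Limit=2):
#     print(len(page["Items"]), page.get("LastEvaluatedKey"))
```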

I've referenced this issue in https://github.com/boto/boto3/issues/3677 for further review going forward.