boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.
Apache License 2.0

DynamoDB Query Paginator incorrect count #2257

Open seittema opened 3 years ago

seittema commented 3 years ago

Describe the bug
When using a query paginator for DynamoDB, the paginator returns invalid count data if you pass in a StartingToken. We have a process that runs on Lambda, and for very large data sets we return the LastEvaluatedKey so that a subsequent request can continue where the previous one stopped. When you make a pagination call and pass in a StartingToken built from a previous request, the first page's Count and ScannedCount are 0 rather than the values DynamoDB actually returned.

Steps to reproduce
Here is a sample script showing the invalid data being returned. Load a DynamoDB table with at least 4 records that share the same PK but have different sort keys. The first call, with no StartingToken, prints the correct counts of 2. The second call, which passes in the previous LastEvaluatedKey, reports counts of 0 even though 2 items are returned.

import botocore.session
from botocore.paginate import TokenEncoder

session = botocore.session.Session(profile='test-profile')
dynamo_client = session.create_client('dynamodb')
paginator = dynamo_client.get_paginator('query')

def paginator_test(**kwargs):
    # Print the counts for the first page only, then return its LastEvaluatedKey.
    for page in paginator.paginate(**kwargs):
        print(f"ItemCount: {page['Count']} ScannedCount: {page['ScannedCount']} ItemLength: {len(page['Items'])}")
        return page.get('LastEvaluatedKey')

table = 'TestTable'

key_value = '02eb4827-5d9f-41db-86f6-792776ac0f08'
kwargs = {
    'TableName': table,
    'KeyConditionExpression': "PK = :primary_key",
    'ExpressionAttributeValues': {
        ":primary_key": {"S": key_value},
    },
    'PaginationConfig': {'PageSize': 2},
}

# First call: prints 2 for both Count and ScannedCount
pagination_token = paginator_test(**kwargs)
kwargs['PaginationConfig']['StartingToken'] = TokenEncoder().encode({"ExclusiveStartKey": pagination_token})
# Second call: should also print 2 for both counts, but prints 0
paginator_test(**kwargs)

Output:
ItemCount: 2 ScannedCount: 2 ItemLength: 2
ItemCount: 0 ScannedCount: 0 ItemLength: 2

Expected behavior
I expect Count and ScannedCount to accurately reflect what DynamoDB returns. The problem appears to be in the paginator's _handle_first_request function: the parsed response has the correct values before it is passed into this function, which then resets them to 0.
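
Until the counts are reliable, one possible stopgap is to derive the item count from the page itself. The sketch below is an assumption-laden workaround, not part of botocore: `observed_count` is a hypothetical helper, and it relies on Count matching the number of entries in Items whenever the response contains Items. It cannot recover ScannedCount.

```python
def observed_count(page):
    """Hypothetical helper: count items on a page without trusting 'Count'."""
    items = page.get('Items')
    if items is not None:
        # When Items is present, its length is the number of returned items.
        return len(items)
    # No Items in the response (for example a count-only query):
    # fall back to whatever the page reports.
    return page.get('Count', 0)


# A page shaped like the second call's output above: Count was reset to 0.
second_page = {
    'Items': [{'SK': {'S': '3'}}, {'SK': {'S': '4'}}],
    'Count': 0,
    'ScannedCount': 0,
}
print(observed_count(second_page))
```

In the reproduction script, this would mean printing `observed_count(page)` instead of `page['Count']`.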

swetashre commented 3 years ago

@seittema - Thank you for your post. I am able to reproduce the issue. In the _handle_first_request function, we reset every result key to an empty value except the primary result key (in this case, Items): https://github.com/boto/botocore/blob/develop/botocore/paginate.py#L384 That is why Count comes back as 0. Marking this as a bug.
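
To make the explanation above concrete, here is a simplified, standalone sketch of the behavior described. It is not the botocore source: `reset_secondary_keys` is a hypothetical name, and the real code works with JMESPath expressions rather than plain dictionary keys. It only illustrates how blanking every non-primary result key turns Count and ScannedCount into 0 while Items survives.

```python
def reset_secondary_keys(parsed, result_keys, primary_key):
    """Hypothetical stand-in: blank every result key except the primary one."""
    for key in result_keys:
        if key == primary_key:
            continue  # the primary result key keeps its data
        sample = parsed.get(key)
        if isinstance(sample, list):
            parsed[key] = []
        elif isinstance(sample, str):
            parsed[key] = ''
        elif isinstance(sample, (int, float)):
            parsed[key] = 0  # this is what happens to Count and ScannedCount
        else:
            parsed[key] = None
    return parsed


# A response shaped like the first page fetched after a StartingToken.
page = {
    'Items': [{'PK': {'S': 'a'}}, {'PK': {'S': 'a'}}],
    'Count': 2,
    'ScannedCount': 2,
}
reset = reset_secondary_keys(dict(page), ['Items', 'Count', 'ScannedCount'], 'Items')
print(reset['Count'], reset['ScannedCount'], len(reset['Items']))
```

The printed values mirror the second line of the reported output: both counts are 0 while two items are still present.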