aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

Limit parameter does not work in paginateScan and paginateQuery in @aws-sdk/lib-dynamodb #5952

Closed cm-rwakatsuki closed 3 months ago

cm-rwakatsuki commented 3 months ago

Describe the bug

paginateScan and paginateQuery in @aws-sdk/lib-dynamodb accept a Limit parameter, but it has no effect. All data in the table is queried or scanned regardless of the Limit value specified.

SDK version number

@aws-sdk/lib-dynamodb@3.540.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

Node.js 20.x

Reproduction Steps

This is the code that actually causes the problem.

// lib/cdk-sample-stack.sampleFunc.ts
import { paginateScan, DynamoDBDocument } from '@aws-sdk/lib-dynamodb';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

const SAMPLE_TABLE_NAME =
  process.env.SAMPLE_TABLE_NAME || '';

interface DataItem {
  id: string;
  timestamp: number;
}

const ddbDocClient = DynamoDBDocument.from(
  new DynamoDBClient({
    region: 'ap-northeast-1',
    apiVersion: '2012-08-10',
  })
);

export const handler = async (): Promise<void> => {
  const paginator = paginateScan(
    {
      client: ddbDocClient,
    },
    {
      TableName: SAMPLE_TABLE_NAME,
      Limit: 5, // DOES NOT WORK
    }
  );
  const items: DataItem[] = [];

  for await (const page of paginator) {
    console.log(page.Count);
    items.push(...(page.Items as DataItem[]));
  }

  console.log(items);
};

This is the sample data file src/tableData/table1.csv stored in the table.

d001,1700625273
d001,1699658818
d001,1703858878
d001,1681316462
d001,1695108297
d001,1694674832
d001,1680945699
d001,1701799579
d001,1696271173
d001,1685651084
d002,1706301230
d002,1679314750
d002,1701457171
d002,1685919651
d002,1684091128

Create the Lambda function and DynamoDB table that reproduce the issue using AWS CDK.

// lib/cdk-sample-stack.ts
import {
  aws_lambda,
  aws_logs,
  aws_lambda_nodejs,
  aws_dynamodb,
  aws_s3,
  aws_s3_deployment,
  Stack,
  RemovalPolicy,
  CfnOutput,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class CdkSampleStack extends Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    const bucket = new aws_s3.Bucket(this, 'Bucket', {
      removalPolicy: RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });

    new aws_s3_deployment.BucketDeployment(this, 'DeploySampleTableData', {
      sources: [aws_s3_deployment.Source.asset('./src/tableData')],
      destinationBucket: bucket,
    });

    const sampleTable = new aws_dynamodb.Table(this, 'SampleTable', {
      partitionKey: { name: 'id', type: aws_dynamodb.AttributeType.STRING },
      sortKey: { name: 'timestamp', type: aws_dynamodb.AttributeType.NUMBER },
      billingMode: aws_dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: RemovalPolicy.DESTROY,
      importSource: {
        inputFormat: aws_dynamodb.InputFormat.csv({
          delimiter: ',',
          headerList: ['id', 'timestamp'],
        }),
        bucket,
      },
    });

    const sampleFunc = new aws_lambda_nodejs.NodejsFunction(
      this,
      'SampleFunc',
      {
        architecture: aws_lambda.Architecture.ARM_64,
        runtime: aws_lambda.Runtime.NODEJS_20_X,
        logGroup: new aws_logs.LogGroup(this, 'SampleFuncLogGroup', {
          removalPolicy: RemovalPolicy.DESTROY,
        }),
        environment: {
          SAMPLE_TABLE_NAME: sampleTable.tableName,
        },
      }
    );

    sampleTable.grantReadData(sampleFunc);

    new CfnOutput(this, 'SampleFuncName', {
      value: sampleFunc.functionName,
    });
  }
}

Deploy resources with CDK commands.

npx cdk deploy --require-approval never --method=direct

Observed Behavior

When I run the Lambda function built above, all data in the table is retrieved and written to the logs, regardless of the value specified for the Limit parameter of paginateScan.

15

[
  { id: 'd001', timestamp: 1680945699 },
  { id: 'd001', timestamp: 1681316462 },
  { id: 'd001', timestamp: 1685651084 },
  { id: 'd001', timestamp: 1694674832 },
  { id: 'd001', timestamp: 1695108297 },
  { id: 'd001', timestamp: 1696271173 },
  { id: 'd001', timestamp: 1699658818 },
  { id: 'd001', timestamp: 1700625273 },
  { id: 'd001', timestamp: 1701799579 },
  { id: 'd001', timestamp: 1703858878 },
  { id: 'd002', timestamp: 1679314750 },
  { id: 'd002', timestamp: 1684091128 },
  { id: 'd002', timestamp: 1685919651 },
  { id: 'd002', timestamp: 1701457171 },
  { id: 'd002', timestamp: 1706301230 }
]

Expected Behavior

At most the number of items specified by the Limit parameter is returned.

Possible Solution

No response

Additional Information/Context

No response

RanVaknin commented 3 months ago

Hi @cm-rwakatsuki ,

When you use a paginator and want to set the maximum number of results per page, use the paginator config's pageSize rather than Limit in the request parameters. Limit is meant for non-paginated requests.

 const paginator = paginateScan(
    {
      client: ddbDocClient,
+      pageSize: 1, // functions similarly to Limit
    },
    {
      TableName: SAMPLE_TABLE_NAME,
-      Limit: 1, // DOES NOT WORK
    }
  );

With pageSize specified, the results are printed one per page:

Received page with 1 items
[ { id: 'd001', timestamp: 1685651084 } ]
Received page with 1 items
[ { id: 'd002', timestamp: 1684091128 } ]
Received page with 1 items
[ { id: '123', name: 'Test Item' } ]
Received page with 0 items
[]
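
One way to picture why the request-level Limit had no effect is the simplified model below. It is a sketch under assumptions, not the SDK's actual source: `fakeScan`, `modelPaginateScan`, `pageCounts`, and the in-memory `TABLE` are hypothetical names. It assumes the paginator writes pageSize into each request's Limit, which is consistent with the behavior reported in this issue.

```typescript
// Hypothetical in-memory stand-in for a DynamoDB table; not part of the SDK.
type Item = { id: string; timestamp: number };
type ScanInput = { Limit?: number; ExclusiveStartKey?: number };
type ScanOutput = { Items: Item[]; Count: number; LastEvaluatedKey?: number };

const TABLE: Item[] = [
  { id: 'd001', timestamp: 1680945699 },
  { id: 'd001', timestamp: 1681316462 },
  { id: 'd002', timestamp: 1679314750 },
];

// Fake Scan: returns at most `Limit` items starting at `ExclusiveStartKey`.
// Modelled assumption: a continuation key is returned whenever the limit is hit.
async function fakeScan(input: ScanInput): Promise<ScanOutput> {
  const start = input.ExclusiveStartKey ?? 0;
  const end =
    input.Limit === undefined
      ? TABLE.length
      : Math.min(start + input.Limit, TABLE.length);
  const Items = TABLE.slice(start, end);
  const limitHit = input.Limit !== undefined && Items.length === input.Limit;
  return { Items, Count: Items.length, LastEvaluatedKey: limitHit ? end : undefined };
}

// Assumed paginator behavior: the per-request Limit is taken from pageSize,
// so a Limit passed in the request input is overwritten.
async function* modelPaginateScan(
  config: { pageSize?: number },
  input: ScanInput
): AsyncGenerator<ScanOutput> {
  let token: number | undefined = undefined;
  let hasNext = true;
  while (hasNext) {
    const page: ScanOutput = await fakeScan({
      ...input,
      Limit: config.pageSize,
      ExclusiveStartKey: token,
    });
    yield page;
    token = page.LastEvaluatedKey;
    hasNext = token !== undefined;
  }
}

// Collect the item count of every page the model paginator yields.
async function pageCounts(
  config: { pageSize?: number },
  input: ScanInput
): Promise<number[]> {
  const counts: number[] = [];
  for await (const page of modelPaginateScan(config, input)) {
    counts.push(page.Count);
  }
  return counts;
}
```

In this model, `pageCounts({ pageSize: 1 }, {})` yields one item per page followed by a trailing empty page, while `pageCounts({}, { Limit: 2 })` returns every item in a single page because the request's Limit is replaced by the unset pageSize.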

To limit the total number of results returned, stop iterating after a fixed number of pages, so that number of pages * results per page = total results returned:

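import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, paginateScan } from '@aws-sdk/lib-dynamodb';

// Assumed setup: the original snippet did not show `client` or `tableName`.
const client = new DynamoDBClient({});
const tableName = process.env.SAMPLE_TABLE_NAME || '';

async function scanTableWithLimit() {
    const ddbDocClient = DynamoDBDocumentClient.from(client);

    const paginator = paginateScan(
        {
            client: ddbDocClient,
            pageSize: 1, // at most 1 item per page
        },
        {
            TableName: tableName,
        }
    );
    const LIMIT = 2; // maximum number of pages to read
    let count = 0;

    for await (const page of paginator) {
        console.log(`Received page with ${page.Items?.length ?? 0} items`);
        console.log(page.Items);
        // Stop once LIMIT pages have been read; no further requests are sent.
        if (++count >= LIMIT) {
            break;
        }
    }
}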

This results in a total of 2 results returned (2 pages, 1 result per page):

Received page with 1 items
[ { id: 'd001', timestamp: 1685651084 } ]
Received page with 1 items
[ { id: 'd002', timestamp: 1684091128 } ]
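
Counting pages only gives an exact total when every page is full. As a minimal sketch, assuming you would rather cap the number of items directly, a small helper can stop as soon as enough items have been collected. `takeItems` is a hypothetical name and not part of the SDK. It relies only on each page exposing an optional `Items` array, as the pages in the snippets above do.

```typescript
// Hypothetical helper (not part of the SDK): collect at most `limit` items
// from any async iterable of pages that expose an optional `Items` array.
async function takeItems<T>(
  pages: AsyncIterable<{ Items?: T[] }>,
  limit: number
): Promise<T[]> {
  const items: T[] = [];
  if (limit <= 0) {
    return items; // nothing requested, so do not start iterating
  }
  for await (const page of pages) {
    for (const item of page.Items ?? []) {
      items.push(item);
      if (items.length >= limit) {
        // Returning here ends the iteration, so no further pages are requested.
        return items;
      }
    }
  }
  return items; // fewer than `limit` items were available
}

// Possible usage with the paginator from this thread (assumed setup):
//   const items = await takeItems(
//     paginateScan({ client: ddbDocClient, pageSize: 5 }, { TableName: tableName }),
//     5
//   );
```

Setting pageSize close to the item limit keeps the number of over-fetched items small, since the last page is still read in full before it is truncated.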

====

Non-paginated request with Limit:

// ScanCommand is imported from '@aws-sdk/client-dynamodb'; `client` is a DynamoDBClient.
async function scanTableWithLimit() {
    try {
        const response = await client.send(new ScanCommand({
            TableName: tableName,
            Limit: 1,
        }));

        console.log(response.Items);
    } catch (error) {
        console.error(error);
    }
}

This will indeed return only 1 result:

[ { id: { S: 'd001' }, timestamp: { N: '1685651084' } } ]

I hope this clarifies things.

All the best, Ran~