aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(glue-alpha): cannot create 2 partitionIndexes simultaneously #24813

Open clueleaf opened 1 year ago

clueleaf commented 1 year ago

Describe the bug

When passing 2 indexes to partitionIndexes of glue.Table, table creation fails.

Expected Behavior

Glue table and indexes are created.

Current Behavior

Creation of the table's indexes fails with this error:

Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table.

Reproduction Steps

Create a glue table with 2 indexes.

const bucket = new s3.Bucket(stack, 'DataBucket');
const database = new glue.Database(stack, 'MyDatabase', {
  databaseName: 'database',
});

const csvTable = new glue.Table(stack, 'CSVTable', {
  database,
  bucket,
  tableName: 'csv_table',
  columns: [
    { name: 'col1', type: glue.Schema.STRING },
    { name: 'col2', type: glue.Schema.STRING },
    { name: 'col3', type: glue.Schema.STRING },
  ],
  partitionKeys: [
    { name: 'year', type: glue.Schema.SMALL_INT },
    { name: 'month', type: glue.Schema.BIG_INT },
  ],
  partitionIndexes: [
    { indexName: 'index1', keyNames: ['month'] },
    { indexName: 'index2', keyNames: ['month', 'year'] },
  ],
  dataFormat: glue.DataFormat.CSV,
});

It sometimes fails even if only one index is passed to partitionIndexes and the rest are added using table.addPartitionIndex.

const csvTable = new glue.Table(stack, 'CSVTable', {
  database,
  bucket,
  tableName: 'csv_table',
  columns: [
    { name: 'col1', type: glue.Schema.STRING },
    { name: 'col2', type: glue.Schema.STRING },
    { name: 'col3', type: glue.Schema.STRING },
  ],
  partitionKeys: [
    { name: 'year', type: glue.Schema.SMALL_INT },
    { name: 'month', type: glue.Schema.BIG_INT },
  ],
  partitionIndexes: [{ indexName: 'index1', keyNames: ['month'] }],
  dataFormat: glue.DataFormat.CSV,
});

csvTable.addPartitionIndex({ indexName: 'index2', keyNames: ['month', 'year'] })

Possible Solution

I think this is a restriction of the Glue service.
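
To make the suspected restriction concrete, here is a toy model. It is not the Glue API or the CDK construct: FakeGlueTable, createConcurrently and createSequentially are hypothetical names invented for illustration. It only assumes what the error message says, namely that a create request is rejected while another index on the same table is still in CREATING state.

```typescript
// Toy model only: FakeGlueTable is a hypothetical stand-in, not a real AWS class.
type IndexState = 'CREATING' | 'ACTIVE';

class FakeGlueTable {
  private readonly states: Record<string, IndexState> = {};

  // Start creating an index; reject if any other index is still CREATING.
  createIndex(name: string): void {
    for (const other of Object.keys(this.states)) {
      if (this.states[other] === 'CREATING') {
        throw new Error(
          `Index ${other} is in CREATING state. ` +
            'Only 1 index can be created or deleted simultaneously per table.',
        );
      }
    }
    this.states[name] = 'CREATING';
  }

  // Simulate the service finishing the build of an index.
  finish(name: string): void {
    if (this.states[name] !== 'CREATING') {
      throw new Error(`Index ${name} is not being created`);
    }
    this.states[name] = 'ACTIVE';
  }

  activeIndexes(): string[] {
    return Object.keys(this.states).filter((n) => this.states[n] === 'ACTIVE');
  }
}

// Fire all create requests without waiting, like two independent resources.
function createConcurrently(names: string[]): string {
  const table = new FakeGlueTable();
  try {
    for (const name of names) {
      table.createIndex(name);
    }
    return 'ok';
  } catch (e) {
    return (e as Error).message;
  }
}

// Wait for each index to become ACTIVE before requesting the next one.
function createSequentially(names: string[]): string[] {
  const table = new FakeGlueTable();
  for (const name of names) {
    table.createIndex(name);
    table.finish(name);
  }
  return table.activeIndexes();
}

const concurrentResult = createConcurrently(['index1', 'index2']);
const sequentialResult = createSequentially(['index1', 'index2']);
console.log(concurrentResult);
console.log(sequentialResult);
```

In this model the concurrent run is rejected on the second request, while the sequential run ends with both indexes active. If the real service behaves the same way, the index resources would have to be created one after another rather than in parallel.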

Additional Information/Context

No response

CDK CLI Version

2.70.0

Framework Version

No response

Node.js Version

18

OS

macOS Ventura

Language

TypeScript

Language Version

No response

Other information

No response

khushail commented 1 year ago

Hi @clueleaf, thanks for reaching out.

It's stated in the available documentation that you can have a maximum of 3 partition indexes in the table. But it's also stated here - `

We also use +1s to help prioritize our work, and are happy to re-evaluate this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for re-prioritization.

clueleaf commented 1 year ago

@khushail Thank you for your investigation. One weird thing is that even if I use addPartitionIndex to add the second index later on, it fails in the same way. It's hard to tell why it succeeds sometimes but not always.

const bucket = new s3.Bucket(stack, 'DataBucket');
const database = new glue.Database(stack, 'MyDatabase', {
  databaseName: 'database',
});

const csvTable = new glue.Table(stack, 'CSVTable', {
  database,
  bucket,
  tableName: 'csv_table',
  columns: [
    { name: 'col1', type: glue.Schema.STRING },
    { name: 'col2', type: glue.Schema.STRING },
    { name: 'col3', type: glue.Schema.STRING },
  ],
  partitionKeys: [
    { name: 'year', type: glue.Schema.SMALL_INT },
    { name: 'month', type: glue.Schema.BIG_INT },
  ],
  partitionIndexes: [{ indexName: 'index1', keyNames: ['month'] }],
  dataFormat: glue.DataFormat.CSV,
});
csvTable.addPartitionIndex({ indexName: 'index2', keyNames: ['month', 'year'] });

khushail commented 1 year ago

@clueleaf, could you please share the error that you see when it fails? I am not able to repro this error, so it would be a helpful reference while creating a PR.

clueleaf commented 1 year ago

Sure.

**:**:** ** | CREATE_FAILED        | Custom::AWS           | CSVTablepartitionindexindex16247ABF6
Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)

 ❌  MyStack (MyStack) failed: Error: The stack named MyStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)
    at FullCloudFormationDeployment.monitorDeployment (/Users/***/node_modules/aws-cdk/lib/index.js:380:10236)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async deployStack2 (/Users/***/node_modules/aws-cdk/lib/index.js:383:145458)
    at async /Users/***/node_modules/aws-cdk/lib/index.js:383:128776
    at async run (/Users/***/node_modules/aws-cdk/lib/index.js:383:126782)

 ❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named MyStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)
    at deployStacks (/Users/***/node_modules/aws-cdk/lib/index.js:383:129083)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async CdkToolkit.deploy (/Users/***/node_modules/aws-cdk/lib/index.js:383:147507)
    at async exec4 (/Users/***/node_modules/aws-cdk/lib/index.js:438:51799)

Stack Deployments Failed: Error: The stack named MyStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)

khushail commented 1 year ago

Thanks, @clueleaf.

yuntaoL commented 1 year ago

I have the same issue; it worked previously.

prazian commented 2 months ago

IMO, the best fix is for the addPartitionIndex function to return the object it creates instead of returning nothing, so that we could chain dependencies between the two indexes.

Something like this (currently doesn't work because it returns void):

const table = new S3Table(this, 'Something', {
  .
  .
  .
});

const pI1 = table.addPartitionIndex({
  indexName: 'year_month_day',
  keyNames: ['year', 'month', 'day'],
});
const pI2 = table.addPartitionIndex({
  indexName: 'country_site',
  keyNames: ['country', 'site'],
});
pI1.addDependency(pI2); // Doesn't work because pI1 and pI2 are void
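
To illustrate why a returned handle would help, here is a small self-contained sketch. ToyTable, IndexHandle and deploymentWaves are hypothetical names, not part of the CDK; the sketch only assumes that a deployment engine may start every resource whose dependencies are already finished in the same parallel "wave".

```typescript
// Hypothetical handle; in the CDK this role would be played by a construct.
class IndexHandle {
  readonly indexName: string;
  readonly dependsOn: IndexHandle[] = [];

  constructor(indexName: string) {
    this.indexName = indexName;
  }

  // Declare that this index must be created after `other`.
  addDependency(other: IndexHandle): void {
    this.dependsOn.push(other);
  }
}

class ToyTable {
  private readonly handles: IndexHandle[] = [];

  // Unlike the method discussed above, this toy version returns a handle.
  addPartitionIndex(indexName: string): IndexHandle {
    const handle = new IndexHandle(indexName);
    this.handles.push(handle);
    return handle;
  }

  // Group indexes into waves; everything in one wave may run in parallel.
  deploymentWaves(): string[][] {
    const done: string[] = [];
    let remaining = this.handles.slice();
    const waves: string[][] = [];
    while (remaining.length > 0) {
      const ready = remaining.filter((h) =>
        h.dependsOn.every((d) => done.indexOf(d.indexName) >= 0),
      );
      if (ready.length === 0) {
        throw new Error('dependency cycle between indexes');
      }
      waves.push(ready.map((h) => h.indexName));
      ready.forEach((h) => done.push(h.indexName));
      remaining = remaining.filter((h) => ready.indexOf(h) < 0);
    }
    return waves;
  }
}

// Without a dependency, both indexes land in the same wave.
const unchainedTable = new ToyTable();
unchainedTable.addPartitionIndex('year_month_day');
unchainedTable.addPartitionIndex('country_site');
const unchainedWaves = unchainedTable.deploymentWaves();

// With a dependency, the indexes are created one after another.
const chainedTable = new ToyTable();
const first = chainedTable.addPartitionIndex('year_month_day');
const second = chainedTable.addPartitionIndex('country_site');
first.addDependency(second);
const chainedWaves = chainedTable.deploymentWaves();

console.log(JSON.stringify(unchainedWaves));
console.log(JSON.stringify(chainedWaves));
```

In this sketch the unchained table yields a single wave containing both indexes, which is the situation that would hit the one-index-at-a-time limit, while the chained table yields two waves of one index each.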