aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(@aws-cdk/aws-glue-alpha): CDK + partition indices + catalog encryption fails deployment #30364

Open ksco92 opened 4 weeks ago

ksco92 commented 4 weeks ago

Describe the bug

If you configure your catalog encryption settings with a KMS key using this:

https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_glue.CfnDataCatalogEncryptionSettings.html

And then create an S3 table with partition indices using this:

https://docs.aws.amazon.com/cdk/api/v2/docs/@aws-cdk_aws-glue-alpha.S3Table.html

The creation of the partition indices will fail. This is from the CloudWatch log of the custom resource:

2024-05-28T15:05:10.698Z    5ad1162d-2cc6-4c42-8ea7-699c8a81c0fa    INFO    {
    "RequestType": "Create",
    "ServiceToken": "redacted",
    "ResponseURL": "...",
    "StackId": "redacted",
    "RequestId": "5d573d5c-a3a7-4998-818a-2a616d249e3c",
    "LogicalResourceId": "redacted",
    "ResourceType": "Custom::AWS",
    "ResourceProperties": {
        "ServiceToken": "redacted",
        "InstallLatestAwsSdk": "false",
        "Create": {
            "service": "Glue",
            "action": "createPartitionIndex",
            "parameters": {
                "DatabaseName": "redacted",
                "TableName": "redacted",
                "PartitionIndex": {
                    "IndexName": "run_date_index",
                    "Keys": [
                        "run_date"
                    ]
                }
            },
            "physicalResourceId": {
                "id": "run_date_index"
            }
        }
    }
}

And then:

2024-05-28T15:05:25.729Z    5ad1162d-2cc6-4c42-8ea7-699c8a81c0fa    INFO    InternalFailure: UnknownError
    at throwDefaultError (/var/runtime/node_modules/@aws-sdk/node_modules/@smithy/smithy-client/dist-cjs/index.js:838:20)
    at /var/runtime/node_modules/@aws-sdk/node_modules/@smithy/smithy-client/dist-cjs/index.js:847:5
    at de_CommandError (/var/runtime/node_modules/@aws-sdk/client-glue/dist-cjs/index.js:6042:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /var/runtime/node_modules/@aws-sdk/node_modules/@smithy/middleware-serde/dist-cjs/index.js:35:20
    at async /var/runtime/node_modules/@aws-sdk/node_modules/@smithy/core/dist-cjs/index.js:165:18
    at async /var/runtime/node_modules/@aws-sdk/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
    at async /var/runtime/node_modules/@aws-sdk/middleware-logger/dist-cjs/index.js:33:22
    at async D.invoke (/var/task/index.js:1:119684)
    at async Object.qe (/var/task/index.js:1:126630) {
  '$fault': 'client',
  '$metadata': {
    httpStatusCode: 500,
    requestId: '6599dc99-d6ff-4495-9c12-8cbec9fad351',
    extendedRequestId: undefined,
    cfId: undefined,
    attempts: 3,
    totalRetryDelay: 174
  },
  __type: 'InternalFailure'
}

If I remove the KMS key from the catalog settings, this succeeds.

Expected Behavior

If the catalog is encrypted, the custom resource should be given permissions to the catalog KMS key.

Current Behavior

The deployment fails because the role that runs the custom resource's Lambda function has no permission to encrypt/decrypt with the catalog KMS key.
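
To make the missing permission concrete, here is a minimal sketch of the kind of IAM statement the custom resource role would presumably need. The helper name `catalogKeyStatement` and the ARN are hypothetical, and the action list is an assumption based on what Glue generally requires for an encrypted catalog; it has not been confirmed as the fix for this issue.

```typescript
// Simplified shape of a single IAM policy statement (for this sketch only).
interface PolicyStatement {
  Effect: 'Allow' | 'Deny';
  Action: string[];
  Resource: string[];
}

// Hypothetical helper; not part of aws-cdk-lib or @aws-cdk/aws-glue-alpha.
// Assumption: these KMS actions are enough for Glue to read and write
// catalog metadata that is encrypted with the given key.
function catalogKeyStatement(keyArn: string): PolicyStatement {
  if (keyArn.indexOf('arn:') !== 0) {
    throw new Error('Expected a KMS key ARN, got: ' + keyArn);
  }
  return {
    Effect: 'Allow',
    Action: ['kms:Decrypt', 'kms:Encrypt', 'kms:GenerateDataKey'],
    Resource: [keyArn],
  };
}

// Placeholder ARN for illustration only.
const exampleStatement = catalogKeyStatement(
  'arn:aws:kms:us-east-1:111122223333:key/example-key-id',
);
console.log(JSON.stringify(exampleStatement, null, 2));
```

A statement like this would have to be attached to the role of the Lambda function behind the partition index custom resource, which S3Table does not currently expose a way to do.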

Reproduction Steps

In a single stack:

Possible Solution

Allow passing a KMS key to S3Table to indicate that the catalog is encrypted and that the custom resource should be granted access to that key.

Additional Information/Context

No response

CDK CLI Version

2.143.0

Framework Version

No response

Node.js Version

18

OS

Mac

Language

TypeScript

Language Version

No response

Other information

No response

ashishdhingra commented 3 weeks ago

@ksco92 Good morning. Could you please share minimal CDK code that reproduces the issue, including CfnDataCatalogEncryptionSettings? I tried on my end but was unable to reproduce it using the code below:

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as kms from 'aws-cdk-lib/aws-kms';
import { aws_glue as glue } from 'aws-cdk-lib';
import * as glue_alpha from '@aws-cdk/aws-glue-alpha';

export class Issue30364GluekmsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const s3bucket = new s3.Bucket(this, 'test-glue-s3-bucket');
    const kmsKey = new kms.Key(this, 'test-glue-kms-key', {
      alias: 'test-glue-kms-key'
    });

    const cfnDataCatalogEncryptionSettings = new glue.CfnDataCatalogEncryptionSettings(this, 'MyCfnDataCatalogEncryptionSettings', {
      catalogId: this.account,
      dataCatalogEncryptionSettings: {
        connectionPasswordEncryption: {
          kmsKeyId: kmsKey.keyId,
          returnConnectionPasswordEncrypted: true,
        }
      },
    });

    const glueDatabase = new glue_alpha.Database(this, 'test-glue-db', {
      databaseName: 'test-glue-db',
      description: 'Test Glue DB' 
    });

    const glueS3Table = new glue_alpha.S3Table(this, 'test-glue-s3-table', {
      database: glueDatabase,
      columns: [{
        name: 'col1',
        type: glue_alpha.Schema.STRING,
      }],
      partitionKeys: [{
        name: 'year',
        type: glue_alpha.Schema.SMALL_INT,
      }, {
        name: 'month',
        type: glue_alpha.Schema.SMALL_INT,
      }],
      partitionIndexes: [
        {
          keyNames: ['year'],
          indexName: 'yearindex'
        }
      ],
      dataFormat: glue_alpha.DataFormat.JSON,
      bucket: s3bucket,
      enablePartitionFiltering: true,
    });
  }
}

Thanks, Ashish

github-actions[bot] commented 3 weeks ago

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

ksco92 commented 3 weeks ago

This reproduces the error:

https://github.com/ksco92/partitions_bug

In your example you are using connectionPasswordEncryption rather than encryptionAtRest. The former encrypts the passwords in connection objects to external data sources; the latter encrypts the catalog metadata at rest.
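
To illustrate the difference, here is a small sketch with the two settings shapes side by side. The interfaces are local stand-ins that mirror the property names used in this thread (they are not imported from aws-cdk-lib), and `encryptsCatalogMetadata` is a hypothetical helper with placeholder key IDs.

```typescript
// Local stand-in types mirroring the property names discussed in this thread.
interface EncryptionAtRest {
  catalogEncryptionMode: string;
  sseAwsKmsKeyId?: string;
}

interface ConnectionPasswordEncryption {
  kmsKeyId?: string;
  returnConnectionPasswordEncrypted?: boolean;
}

interface DataCatalogEncryptionSettings {
  encryptionAtRest?: EncryptionAtRest;
  connectionPasswordEncryption?: ConnectionPasswordEncryption;
}

// Hypothetical helper: true only when the catalog metadata itself is
// encrypted with a KMS key, which is the setup that triggers this bug.
function encryptsCatalogMetadata(settings: DataCatalogEncryptionSettings): boolean {
  const atRest = settings.encryptionAtRest;
  return atRest !== undefined && atRest.catalogEncryptionMode === 'SSE-KMS';
}

// Only connection passwords are encrypted; the catalog metadata is not.
const connectionOnly: DataCatalogEncryptionSettings = {
  connectionPasswordEncryption: {
    kmsKeyId: 'example-key-id',
    returnConnectionPasswordEncrypted: true,
  },
};

// Catalog metadata is encrypted at rest with a KMS key.
const metadataAtRest: DataCatalogEncryptionSettings = {
  encryptionAtRest: {
    catalogEncryptionMode: 'SSE-KMS',
    sseAwsKmsKeyId: 'example-key-id',
  },
};

console.log(encryptsCatalogMetadata(connectionOnly)); // false
console.log(encryptsCatalogMetadata(metadataAtRest)); // true
```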

ashishdhingra commented 1 week ago

Reproducible using the code below:

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as kms from 'aws-cdk-lib/aws-kms';
import { aws_glue as glue } from 'aws-cdk-lib';
import * as glue_alpha from '@aws-cdk/aws-glue-alpha';

export class Issue30364GluekmsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const s3bucket = new s3.Bucket(this, 'test-glue-s3-bucket');
    const kmsKey = new kms.Key(this, 'test-glue-kms-key', {
      alias: 'test-glue-kms-key',
      enableKeyRotation: true,
      removalPolicy: cdk.RemovalPolicy.DESTROY
    });

    const cfnDataCatalogEncryptionSettings = new glue.CfnDataCatalogEncryptionSettings(this, 'MyCfnDataCatalogEncryptionSettings', {
      catalogId: this.account,
      dataCatalogEncryptionSettings: {
        encryptionAtRest: {
          catalogEncryptionMode: 'SSE-KMS',
          sseAwsKmsKeyId: kmsKey.keyId
        }
      }
    });

    const glueDatabase = new glue_alpha.Database(this, 'test-glue-db', {
      databaseName: 'test-glue-db',
      description: 'Test Glue DB' 
    });

    const glueS3Table = new glue_alpha.S3Table(this, 'test-glue-s3-table', {
      database: glueDatabase,
      columns: [{
        name: 'col1',
        type: glue_alpha.Schema.STRING,
      }],
      partitionKeys: [{
        name: 'year',
        type: glue_alpha.Schema.SMALL_INT,
      }, {
        name: 'month',
        type: glue_alpha.Schema.SMALL_INT,
      }],
      partitionIndexes: [
        {
          keyNames: ['year'],
          indexName: 'yearindex'
        }
      ],
      dataFormat: glue_alpha.DataFormat.JSON,
      bucket: s3bucket,
      enablePartitionFiltering: true,
    });
  }
}

CloudFormation fails with the error "Received response status [FAILED] from custom resource. Message returned: UnknownError (RequestId: e67634ce-1bab-46b8-9ed0-75c76f237c55)" when creating the custom resource that creates the partition indices.