aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.55k stars 3.87k forks source link

(custom-resources): not sorting filter rules on put bucket notification configuration causes error #31413

Closed uncledru closed 2 weeks ago

uncledru commented 2 weeks ago

Describe the bug

When calling addEventNotification on an S3 Bucket

systemBucket.addEventNotification(...)

We get a failure for the S3 API indicating overlapping rules:

An error occurred (InvalidArgument) when calling the PutBucketNotificationConfiguration operation: Configuration is ambiguously defined. Cannot have overlapping suffixes in two rules if the prefixes are overlapping for the same event type   After debugging my specific scenario - it turns out the order of the FilterRules passed as input to the custom resource vs what is returned by the S3.Client.get_bucket_notification_configuration don't always match.

Regression Issue

Last Known Working CDK Version

No response

Expected Behavior

I expect to generate the same ID regardless of the ordering of the FilterRules array returned in the S3 API response and successfully put the notification configuration when these things are out of order.

Current Behavior

An error occurred (InvalidArgument) when calling the PutBucketNotificationConfiguration operation: Configuration is ambiguously defined. Cannot have overlapping suffixes in two rules if the prefixes are overlapping for the same event type

Reproduction Steps

Since I'm short on time I am not going to reproduce an entire CDK app that exhibits this problem since it also involves some existing state. A quick way to test this in isolation is:

import json

stack_id = "arn:aws:cloudformation:us-east-1:xxx:stack/Stack/996657a0-6869-11ee-90da-1202312cd4f9"

def get_id(n):
    n["Id"] = ""
    strToHash = (
        json.dumps(n, sort_keys=True)
        .replace('"Name": "prefix"', '"Name": "Prefix"')
        .replace('"Name": "suffix"', '"Name": "Suffix"')
    )
    # print(strToHash)
    return f"{stack_id}-{hash(strToHash)}"

def test_get_id():
    a = {
        "Events": ["s3:ObjectCreated:Put"],
        "Filter": {
            "Key": {
                "FilterRules": [
                    {"Value": "results.json", "Name": "suffix"},
                    {
                        "Value": "dummyValue",
                        "Name": "prefix",
                    },
                ]
            }
        },
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:xxx:function:Stack-Function-zo9bNZp874wc:latest",
    }

    b = {
        "Id": "arn:aws:cloudformation:us-east-1:xxx:stack/Stack/996657a0-6869-11ee-90da-1202312cd4f9-7145044700262996772",
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:xxx:function:Stack-Function-zo9bNZp874wc:latest",
        "Events": ["s3:ObjectCreated:Put"],
        "Filter": {
            "Key": {
                "FilterRules": [
                    {
                        "Name": "Prefix",
                        "Value": "dummyValue",
                    },
                    {"Name": "Suffix", "Value": "results.json"},
                ]
            }
        },
    }

    return get_id(a) == get_id(b)

print(test_get_id()) # False

Possible Solution

Introduce a function recursive_sort() that will sort all dicts/objects in the response. In the event the response ordering does change this will ensure the hash generated will remain the same by recursively sorting all levels.

import json

stack_id = "arn:aws:cloudformation:us-east-1:xxx:stack/Stack/996657a0-6869-11ee-90da-1202312cd4f9"

def recursive_sort(obj):
    """
    Recursively sort dictionaries by keys and lists by their elements.
    Lists of dictionaries are sorted by a unique key (like "Name" or "Value").
    """
    if isinstance(obj, dict):
        # Sort the dictionary by keys and recursively sort the values
        return {k: recursive_sort(v) for k, v in sorted(obj.items())}
    elif isinstance(obj, list):
        # Sort the list elements; if elements are dicts, sort them by a key or their string representation
        return sorted(
            [recursive_sort(i) for i in obj],
            key=lambda x: json.dumps(x, sort_keys=True),
        )
    else:
        # Return the object if it's neither a list nor a dictionary
        return obj

def get_id(n):
    n["Id"] = ""
    strToHash = (
        json.dumps(recursive_sort(n), sort_keys=True)
        .replace('"Name": "prefix"', '"Name": "Prefix"')
        .replace('"Name": "suffix"', '"Name": "Suffix"')
    )
    # print(strToHash)
    return f"{stack_id}-{hash(strToHash)}"

def test_get_id():
    a = {
        "Events": ["s3:ObjectCreated:Put"],
        "Filter": {
            "Key": {
                "FilterRules": [
                    {"Value": "results.json", "Name": "suffix"},
                    {
                        "Value": "dummyValue",
                        "Name": "prefix",
                    },
                ]
            }
        },
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:xxx:function:Stack-Function-zo9bNZp874wc:latest",
    }

    b = {
        "Id": "arn:aws:cloudformation:us-east-1:xxx:stack/Stack/996657a0-6869-11ee-90da-1202312cd4f9-7145044700262996772",
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:xxx:function:Stack-Function-zo9bNZp874wc:latest",
        "Events": ["s3:ObjectCreated:Put"],
        "Filter": {
            "Key": {
                "FilterRules": [
                    {
                        "Name": "Prefix",
                        "Value": "dummyValue",
                    },
                    {"Name": "Suffix", "Value": "results.json"},
                ]
            }
        },
    }

    return get_id(a) == get_id(b)

print(test_get_id()) # True

Additional Information/Context

I'm not sure why/how the order doesn't match what's returned in the API. We haven't touched this code in quiet some time so something at some layer (s3 api, boto3, etc) changed that broke this.

CDK CLI Version

2.115.0 (build 58027ee)

Framework Version

No response

Node.js Version

v18.19.0

OS

macos

Language

TypeScript, Python

Language Version

TypesScript (5.5.4) | Python (3.12.3)

Other information

No response

ashishdhingra commented 2 weeks ago

@uncledru Good afternoon. Could you please verify if this is duplicate of https://github.com/aws/aws-cdk/issues/31303?

Thanks, Ashish

uncledru commented 2 weeks ago

@uncledru Good afternoon. Could you please verify if this is duplicate of https://github.com/aws/aws-cdk/issues/31303?

Thanks, Ashish

Yes it is. I searched before I created this issue. Not sure how I missed that. Sorry!

ashishdhingra commented 2 weeks ago

@uncledru Good afternoon. Could you please verify if this is duplicate of #31303? Thanks, Ashish

Yes it is. I searched before I created this issue. Not sure how I missed that. Sorry!

@uncledru No problem. Closing this one in favor of https://github.com/aws/aws-cdk/issues/31303 since the other one is already prioritized and assigned.

github-actions[bot] commented 2 weeks ago

Comments on closed issues and PRs are hard for our team to see. If you need help, please open a new issue that references this one.