aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.68k stars 3.93k forks source link

AwsCustomResource: Invalid RoleSessionName when PhysicalResourceId is an Arn #23260

Open pergardebrink opened 1 year ago

pergardebrink commented 1 year ago

Describe the bug

I am trying to use AwsCustomResource to make an SDK call with AwsSdkCall for DataSync LocationS3 (I cannot use the CfnLocationS3 as I have to assume another role due to DataSync, which is another issue). I will try to explain the issue in a short way:

Steps to reproduce:

Expected Behavior

Resource should have be deleted

Current Behavior

The Lambda fails with the following error (from CloudWatch, my accountid redacted):

2022-12-07T09:24:45.424Z    f0e10d09-fb44-4517-8e75-fa88ff160fc0    INFO    Error [CredentialsError]: Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1
    at Request.extractError (/tmp/node_modules/aws-sdk/lib/protocol/query.js:50:29)
    at Request.callListeners (/tmp/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/tmp/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/tmp/node_modules/aws-sdk/lib/request.js:686:14)
    at Request.transition (/tmp/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/tmp/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /tmp/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/tmp/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/tmp/node_modules/aws-sdk/lib/request.js:688:12)
    at Request.callListeners (/tmp/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
  code: 'CredentialsError',
  time: 2022-12-07T09:24:45.165Z,
  requestId: '3db8bce7-c03c-42ee-b84b-71e987bfcf3d',
  statusCode: 400,
  retryable: false,
  retryDelay: 67.36482039412164,
  originalError: {
    message: 'Could not load credentials from ChainableTemporaryCredentials',
    code: 'CredentialsError',
    time: 2022-12-07T09:24:45.165Z,
    requestId: '3db8bce7-c03c-42ee-b84b-71e987bfcf3d',
    statusCode: 400,
    retryable: false,
    retryDelay: 67.36482039412164,
    originalError: {
      message: "1 validation error detected: Value '1670405084883-arn:aws:datasync:us-east-1:123456789012:location/l' at 'roleSessionName' failed to satisfy constraint: Member must satisfy regular expression pattern: [\\w+=,.@-]*",
      code: 'ValidationError',
      time: 2022-12-07T09:24:45.164Z,
      requestId: '3db8bce7-c03c-42ee-b84b-71e987bfcf3d',
      statusCode: 400,
      retryable: false,
      retryDelay: 67.36482039412164
    }
  }
}

Reproduction Steps

Sample C# code for the custom call

            {
                ResourceType = "Custom::LocationS3",
                OnCreate = new AwsSdkCall
                {
                    AssumedRoleArn = props.BucketAccessRole.RoleArn,
                    Service = "DataSync",
                    Action = "createLocationS3",
                    Parameters = new Dictionary<string, object>
                    {
                        { "S3BucketArn", props.S3BucketArn },
                        { "S3Config", new Dictionary<string, object> {
                            { "BucketAccessRoleArn", props.BucketAccessRole.RoleArn },
                          }
                        },
                        { "Subdirectory", props.Subdirectory },
                    },
                    PhysicalResourceId = PhysicalResourceId.FromResponse("LocationArn")
                },
                OnDelete = new AwsSdkCall
                {
                    AssumedRoleArn = props.BucketAccessRole.RoleArn,
                    Service = "DataSync",
                    Action = "deleteLocation",
                    Parameters = new Dictionary<string, object>
                    {
                        { "LocationArn", new PhysicalResourceIdReference() },
                    },
                },
                Policy = AwsCustomResourcePolicy.FromSdkCalls(new SdkCallsPolicyOptions
                {
                    Resources = AwsCustomResourcePolicy.ANY_RESOURCE
                })
            });

Possible Solution

RoleSessionName should be filtrered for names that are not allowed in the session name here: https://github.com/aws/aws-cdk/blob/df163ec72c61b521ce6f8d7555872c1342a70745/packages/%40aws-cdk/custom-resources/lib/aws-custom-resource/runtime/index.ts#L169

Additional Information/Context

I am not sure this is the correct way to use PhysicalResourceId in this scenario. I need to get the LocationArn that was returned from the OnCreate call, but don't know how to do it other than with PhysicalResourceId.FromLocation("LocationArn") ?

CDK CLI Version

2.53.0

Framework Version

No response

Node.js Version

16.3.2

OS

Windows 11

Language

.NET

Language Version

.NET 6.0

Other information

No response

pergardebrink commented 1 year ago

Returning ARNs from SDK calls is not that uncommon, so this should probably apply to many more instances than DataSync, but since it only happens when the AssumedRoleArn is passed to the AwsSdkCall it might not be a frequent use-case.

Since this only affects the RoleSessionName, it shouldn't be a breaking change to remove/replace any non-valid characters. Maybe a simple solution like this could be enough (I can provide a pull request if the solution looks ok?).

const timestamp = (new Date()).getTime();
const identifier = physicalResourceId.replace(/[^\w+=,.@-]/g, '-');
const params = {
    RoleArn: call.assumedRoleArn,
    RoleSessionName: `${timestamp}-${identifier}`.substring(0, 64),
};
peterwoodworth commented 1 year ago

I think there are a few misunderstandings surrounding custom resources here

Change the AwsCustomResource so that it will trigger a OnDelete on next deploy

I'm not sure what you mean by this. onDelete should only be running if your custom resource is being deleted

I am not sure this is the correct way to use PhysicalResourceId in this scenario. I need to get the LocationArn that was returned from the OnCreate call

This isn't what PhysicalResourceId is supposed to be used for. It's an identifier, not a way to retrieve application data. See the CloudFormation documentation here for the best description of the purpose of PhysicalResourceId

If you need to make use of the data returned from your custom resource, you should make use of the getResponseField() or getResponseFieldReference() methods. See an example here

I suggest further reading the CloudFormation documentation on custom resources for a more complete understanding 🙂

pergardebrink commented 1 year ago

I think there are a few misunderstandings surrounding custom resources here

Yes, I do realize I get confused sometimes with this for sure!

I'm not sure what you mean by this. onDelete should only be running if your custom resource is being deleted

I realize this sounds confusing and actually my step is not properly described anyways, will update the steps. I should instead have said to remove the AwsCutomResource from the Stack. What I meant was that it should trigger the OnDelete handler (I changed the CDK Id in my case, which is not the same as "changing" in the CloudFormation sense).

This isn't what PhysicalResourceId is supposed to be used for. It's an identifier, not a way to retrieve application data. See the CloudFormation documentation here for the best description of the purpose of PhysicalResourceId If you need to make use of the data returned from your custom resource, you should make use of the getResponseField() or getResponseFieldReference() methods. See an example here I suggest further reading the CloudFormation documentation on custom resources for a more complete understanding 🙂

Yes, but I think I understood it's an identifier that CloudFormation uses to know if the operation (from OnUpdate) indicates if it needs to replace the resource (i.e. delete the old) or not. I wanted to to return the underlying actual AWS resource arn as the PhysicalResourceId with the PhysicalResourceId.FromResponse that was created (LocationArn is the identifier for the DataSync LocationS3).

I need to have this LocationArn in the OnDelete handler later when the resource is to be deleted and thought that this was the way to pass the identifier? If this is not the way to pass the identifier between the OnCreate and OnDelete handlers, then I do not understand how to correctly do this otherwise?

pergardebrink commented 1 year ago

And either way, this bug would be the same regardless if I understand the way to do this, as anyone could for any other reason put a PhysicalResourceId that contains characters that is not allowed in the RoleSessionName (only [\w+=,.@-] is allowed there).

barticus commented 1 year ago

I'm also seeing this when using PhysicalResourceId.of("value"). I am trying to call "enableSecurityHub" in a child account of an Organization.

This call produces no output, so a physical resource ID must be provided.

I've set this to

PhysicalResourceId.of(`enable-securityhub-${accountRef}-${region}`)

where accountRef is a name, like logArchiveAccount and region is an aws region, like ap-southeast-2.

I can see this reflected in my template,

\"physicalResourceId\":{\"id\":\"enable-securityhub-logArchiveAccount-ap-southeast-2\"}

However on running, it fails:

Received response status [FAILED] from custom resource. Message returned: 1 validation error detected: Value '1698971270736-2023/11/03/[$LATEST]1e198b3052ef4f629f9ee7f1c42750' at 'roleSessionName' failed to satisfy constraint: Member must satisfy regular expression pattern: [\w+=,.@-]*

I am confused where the "2023/11/03/[$LATEST]1e198b3052ef4f629f9ee7f1c42750" is coming from, as this is not a physical resource I have assigned.

This doesn't happen with any of the other custom resources I have, but none of those have the assumedRoleArn parameter set.

barticus commented 1 year ago

Ah, well mystery solved for where that resource ID is coming from, I had missed the first error where the call was failing in the CREATE trigger, and so when DELETE is called to cleanup, it has this generated ID.

pergardebrink commented 11 months ago

Hit this again in another scenario, and I am still meaning that this is a bug and not a misunderstanding (maybe I was unclear explaining the case).

This example here shows exactly what I want to do: https://github.com/aws-samples/aws-iam-permissions-guardrails/blob/7e96a3ccb2858f28cd4b4fd5dbb50cba96be66ba/iam_permissions_guardrails/constructs/service_control_policies/scp_policy_resource.py#L46C60-L46C60 when it passes the arn from the oncreate down to onupdate and ondelete (as it's the physicalresourceid it represent.

The only difference to what I am trying to do is that the AwsSdk call have the AssumedRoleArn property set in my case. This causes the bug when the PhysicalResourceId is an Arn it contains invalid characters for the RoleSessionName.

I can create a PR to fix this according to my suggestion here: https://github.com/aws/aws-cdk/issues/23260#issuecomment-1340687084

philljie commented 9 months ago

+1 . Ran into the same issue when trying to delete an cross-account OAM Link

     const createLinkCall: AwsSdkCall = {
      service: 'OAM',
      action: 'createLink',
      parameters: {
        LabelTemplate: '$AccountEmailNoDomain',
        SinkIdentifier: props.sinkArn,
        ResourceTypes: [
          'AWS::CloudWatch::Metric',
          'AWS::Logs::LogGroup',
          'AWS::XRay::Trace',
          'AWS::ApplicationInsights::Application',
        ],
      },
      physicalResourceId: PhysicalResourceId.fromResponse('Arn'),
      region: props.region,
      assumedRoleArn: props.assumedRoleArn,
    };

    const deleteLinkCall: AwsSdkCall = {
      service: 'OAM',
      action: 'deleteLink',
      parameters: {
        Identifier: new PhysicalResourceIdReference(),
      },
      region: props.region,
      assumedRoleArn: props.assumedRoleArn,
    };

It is definitely a bug as @pergardebrink pointed out. The roleSessionName should not make any assumption in the physicialResourceId format. It is very typical that the PhysicalResourceId is an ARN.