aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.58k stars 3.88k forks source link

aws-stepfunctions-tasks: EmrContainersStartJobRun could not use Role from SF input #21319

Open SerjChaoz opened 2 years ago

SerjChaoz commented 2 years ago

Describe the feature

EmrContainersStartJobRunProps support readonly executionRole?: iam.IRole; I was wondering if EmrContainersStartJobRunProps can accept optional executionRoleArn and use it if executionRole not provided. The reason for this request is a limitation in Role.fromRoleArn()

In the Role documentation for fromRoleArn() https://docs.aws.amazon.com/cdk/api/v1/docs/@aws-cdk_aws-iam.Role.html#static-fromwbrrolewbrnamescope-id-rolename

If the imported Role ARN is a Token (such as a CfnParameter.valueAsString or a Fn.importValue()) 
and the referenced role has a path (like arn:...:role/AdminRoles/Alice), 
the roleName property will not resolve to the correct value. Instead it will resolve to the first path component. 
We unfortunately cannot express the correct calculation of the full path name as a CloudFormation expression. 
In this scenario the Role ARN should be supplied without the path in order to resolve the correct role resource.

I tried different options but all of them failed during build/deployment. I was able to make it work using CustomState. Without this feature it is not possible to use EmrContainersStartJobRun with roles extracted from StepFunction input.

Use Case

This is needed because "cdk deploy" throws an error when Role must be picked from SF input:

executionRole: Role.fromRoleArn(this, "EmrExecutionRole", JsonPath.stringAt("$.executionRoleArn")),

Code snippet and error message provided under "Other Information"

Proposed Solution

In order to handle Arn from JSON Input I suggest to add extra property to EmrContainersStartJobRunProps:

export interface EmrContainersStartJobRunProps extends sfn.TaskStateBaseProps {
    ...
    /**
     * The execution role for the job run.
     *
     * If `virtualClusterId` is from a JSON input path, an execution role must be provided.
     * If an execution role is provided, follow the documentation to update the role trust policy.
     * @see https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-trust-policy.html
     *
     * @default - Automatically generated only when the provided `virtualClusterId` is not an encoded JSON path
     */
    readonly executionRole?: iam.IRole;
    /**
     * The execution role arn for the job run.
     *
     * Used when both `virtualClusterId` and `executionRoleArn` needs to be taken from a JSON input path.
     * Conflicts with `executionRole` when both provided.
     */
    readonly executionRoleArn?: String;
    ...

Other Information

Code:

export class SparkSubmitJobDriverClass implements SparkSubmitJobDriver {
    entryPoint: TaskInput;
    entryPointArguments: TaskInput;
    sparkSubmitParameters?: string;

    constructor(entryPoint: TaskInput, entryPointArguments: TaskInput, sparkSubmitParameters: string) {
        this.entryPoint = entryPoint;
        this.entryPointArguments = entryPointArguments;
        this.sparkSubmitParameters = sparkSubmitParameters;
    }
}

export class JobDriverClass implements JobDriver {
    sparkSubmitJobDriver: SparkSubmitJobDriver;

    constructor(sparkDriver: SparkSubmitJobDriver) {
        this.sparkSubmitJobDriver = sparkDriver;
    }
}

export class EmrMonitoring implements Monitoring {
    logGroup: ILogGroup;
    logStreamNamePrefix: string;
    persistentAppUI: boolean;

    constructor(logGroup: ILogGroup, logStreamNamePrefix: string, persistentAppUI: boolean) {
        this.logGroup = logGroup;
        this.logStreamNamePrefix = logStreamNamePrefix;
        this.persistentAppUI = persistentAppUI;
    }
}

const startJobRunSync = new EmrContainersStartJobRun(this, "StartJobRunSync", {
    virtualCluster:  VirtualClusterInput.fromVirtualClusterId(JsonPath.stringAt(`$.virtualClusterId`)),
    jobName: JsonPath.stringAt(`$.name`),
    // JsonPath returns a $Token which should be picked up from execution input
    executionRole: Role.fromRoleArn(this, "EmrExecutionRole", JsonPath.stringAt("$.executionRoleArn")),
    // hardcoded role works
    // executionRole: Role.fromRoleArn(this, "EmrExecutionRole", "arn:aws:iam::0123:role/SomeExistingRole"),
    releaseLabel: new ReleaseLabel(JsonPath.stringAt(`$.releaseLabel`)),
    jobDriver: new JobDriverClass(new SparkSubmitJobDriverClass(
        TaskInput.fromJsonPathAt(`$.entryPoint`),
        TaskInput.fromObject(JsonPath.listAt(`$.entryPointArgs`)),
        JsonPath.stringAt("$.sparkSubmitParameters"))),
    monitoring: new EmrMonitoring(
        LogGroup.fromLogGroupName(this, "StartJobRunLogGroup", JsonPath.stringAt(`$.logGroupName`)),
        JsonPath.stringAt(`$.logStreamNamePrefix`),
        true)
});

This throws an error during "cdk deploy":

Stack failed: Error [ValidationError]: Template error: Fn::Select  cannot select nonexistent value at index 4

Acknowledgements

CDK version used

2.14.0

Environment details (OS name and version, etc.)

macOS Big Sur 11.6.6

msambol commented 1 year ago

@peterwoodworth Can you assign this to me?

peterwoodworth commented 1 year ago

Got it 👍🏻 Thanks for contributing!