aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(aws-scheduler-targets-alpha): Add SageMakerStartPipelineExecution Target #27457

Closed filletofish closed 8 months ago

filletofish commented 1 year ago

Describe the feature

Work to support L2 constructs for AWS Scheduler is in progress (https://github.com/aws/aws-cdk/issues/23394). See the approved RFC. The RFC planned to add 12 templated targets, but only Lambda Invoke is currently implemented (https://github.com/aws/aws-cdk/pull/26575).

This issue tracks the implementation of the SageMakerStartPipelineExecution target, which starts an Amazon SageMaker pipeline.

Use Case

Customers would like to use the templated target SageMakerStartPipelineExecution to start an Amazon SageMaker pipeline on a schedule. The L2 target construct should grant AWS Scheduler the permissions required to start the pipeline.

Proposed Solution

The proposed solution needs to be adapted to follow the recent LambdaInvoke example (https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/aws-scheduler-targets-alpha/lib/lambda-invoke.ts).

The solution should also include unit and integration tests.

Class SageMakerStartPipelineExecution should:

  1. Grant the Scheduler execution role permission to start the SageMaker pipeline via addTargetActionToRole.
  2. Override bindBaseTargetConfig to return sageMakerPipelineParameters as part of ScheduleTargetConfig.
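
As a rough illustration of what these two responsibilities produce, here is a minimal sketch in plain Python, without CDK. It builds the IAM policy statement and the CloudFormation target block as dictionaries. The function names and the example ARNs are hypothetical and are not part of the CDK API; only the `sagemaker:StartPipelineExecution` action and the `SageMakerPipelineParameters` / `PipelineParameterList` property names come from the AWS services themselves.

```python
import json
from typing import Dict

# IAM action that EventBridge Scheduler must be allowed to call.
START_PIPELINE_ACTION = "sagemaker:StartPipelineExecution"


def execution_role_statement(pipeline_arn: str) -> Dict[str, object]:
    """Step 1 (hypothetical helper): statement for the scheduler execution role."""
    return {
        "Effect": "Allow",
        "Action": START_PIPELINE_ACTION,
        "Resource": pipeline_arn,
    }


def target_config(pipeline_arn: str, role_arn: str,
                  parameters: Dict[str, str]) -> Dict[str, object]:
    """Step 2 (hypothetical helper): target block carrying the pipeline parameters."""
    return {
        "Arn": pipeline_arn,
        "RoleArn": role_arn,
        "SageMakerPipelineParameters": {
            "PipelineParameterList": [
                {"Name": name, "Value": value}
                for name, value in parameters.items()
            ]
        },
    }


# Example values only; these ARNs do not refer to real resources.
PIPELINE_ARN = "arn:aws:sagemaker:us-east-1:123456789012:pipeline/example"
ROLE_ARN = "arn:aws:iam::123456789012:role/example-scheduler-role"

statement = execution_role_statement(PIPELINE_ARN)
config = target_config(PIPELINE_ARN, ROLE_ARN, {"parameter-name": "parameter-value"})
print(json.dumps({"statement": statement, "target": config}, indent=2))
```

In the actual construct, the first dictionary would correspond to what addTargetActionToRole attaches to the role, and the second to the part of ScheduleTargetConfig that bindBaseTargetConfig returns.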

Other Information

No response

Acknowledgements

CDK version used

2.99.1

Environment details (OS name and version, etc.)

MacOS

pahud commented 1 year ago

Thank you for all those feature requests and PRs!

github-actions[bot] commented 8 months ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

lorenzwalthert commented 2 months ago

I am trying to use the SageMakerStartPipelineExecution target to create an EventBridge Rule that triggers a pipeline, as described in the README of the Python package aws-cdk.aws-scheduler-targets-alpha:

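# Imports other than aws_sagemaker are not in the README excerpt; module names assume the alpha packages.
from aws_cdk import Duration
from aws_cdk.aws_scheduler_alpha import Schedule, ScheduleExpression
import aws_cdk.aws_scheduler_targets_alpha as targets
import aws_cdk.aws_sagemaker as sagemaker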

# pipeline: sagemaker.IPipeline

Schedule(self, "Schedule",
    schedule=ScheduleExpression.rate(Duration.minutes(60)),
    target=targets.SageMakerStartPipelineExecution(pipeline,
        pipeline_parameter_list=[targets.SageMakerPipelineParameter(
            name="parameter-name",
            value="parameter-value"
        )]
    )
)

However, I fail to understand how to create the pipeline object, which must be of class sagemaker.IPipeline according to the comment above. I looked in the docs and figured out how to construct a pipeline in CDK with sagemaker.CfnPipeline, but that gives me an error since it's the wrong class. The documentation of IPipeline is not informative enough for me to learn how to construct an IPipeline object. Looking through GitHub PRs and issues, I found that the PR to support this feature had initially proposed to use CfnPipeline, but then you decided to use sagemaker.IPipeline (in my understanding, a placeholder) for now.

Can you clarify whether the example code from the README is supposed to work with the current release of CDK? If yes, how can I create an instance of IPipeline? If not, it would be great if the official CDK distribution could support creation of the required construct. Otherwise, it seems that SageMakerStartPipelineExecution is of little value, since it can't actually be used to schedule a pipeline execution. I also could not figure out how to use any combination of alpha and non-alpha CDK Python packages to achieve my goal (scheduling a pipeline execution while supplying parameters).

Side note: The sagemaker Python SDK supports attaching EventBridge schedules to a SageMaker Pipeline, but it has several limitations. It can't take any parameters, so only pipeline executions with default parameters are possible. It's not compatible with Local Mode. The schedule does not make its way into the pipeline definition; instead, the rules are created ad hoc (before upserting the pipeline), so if I create the pipeline definition with the sagemaker Python SDK and then use this definition in CDK / CFN, I lose the schedule. Hence, there seems to be no way to schedule pipeline executions with non-default parameters with either CDK or the sagemaker SDK.

I'd appreciate your help (maybe this needs a new issue?). As suggested in the comment visibility warning, I am taking the liberty of tagging the relevant people: @kaizencc, @pahud, @filletofish.