aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(stepfunctions-tasks): relax inputDataConfig: Channel[] to any[] #22582

Open pfmeng opened 2 years ago

pfmeng commented 2 years ago

Describe the feature

We have a use case where a JSON payload containing SageMaker training job definitions is passed through API Gateway -> Step Functions -> SageMaker. The architecture can serve different types of models with a varying number of inputs. The JSON payload is not available at package build time, only much later when API Gateway is triggered. With "inputDataConfig: Channel[]", s3DataSource is a required field of each Channel object, so it's impossible to assemble the list of Channel objects, since the JSON is not available and can't be accessed. Relaxing the type to any[] would solve the problem, as we could just pass the whole list through without needing to access each item. Could you please help with this? Thanks
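
To make the payload shape concrete, here is a rough sketch of two request bodies with different numbers of channels. The `input` key matches the `$.input` path used later in this thread, and the channel fields follow the SageMaker CreateTrainingJob `Channel` shape; the type names, helper, bucket, and channel names are hypothetical and only for illustration.

```typescript
// Illustrative only: RawChannel, TrainingRequest, channel() and the S3 URIs
// are assumptions made for this sketch, not part of the CDK API.
interface RawChannel {
  ChannelName: string;
  DataSource: {
    S3DataSource: { S3DataType: string; S3Uri: string };
  };
}

interface TrainingRequest {
  // The number of channels varies per model and is unknown at synth time.
  input: RawChannel[];
}

// Small helper to build one channel entry in the raw SageMaker API shape.
function channel(name: string, uri: string): RawChannel {
  return {
    ChannelName: name,
    DataSource: { S3DataSource: { S3DataType: "S3Prefix", S3Uri: uri } },
  };
}

// A model that trains on a single input channel.
const singleInputModel: TrainingRequest = {
  input: [channel("train", "s3://example-bucket/model-a/train/")],
};

// A different model that needs three input channels.
const multiInputModel: TrainingRequest = {
  input: [
    channel("train", "s3://example-bucket/model-b/train/"),
    channel("validation", "s3://example-bucket/model-b/validation/"),
    channel("vocabulary", "s3://example-bucket/model-b/vocab/"),
  ],
};

console.log(singleInputModel.input.length, multiInputModel.input.length);
```

Both bodies arrive only at request time, so the stack cannot enumerate the channels when it is synthesized.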

Use Case

Our model hosting infrastructure should support all types of training models in our team, each with a variable number of inputs.

Proposed Solution

Change inputDataConfig: Channel[] to inputDataConfig: any[] https://github.com/aws/aws-cdk/blob/74318c7d22bfc00de9e005f68a0a6aaa58c7db39/packages/%40aws-cdk/aws-stepfunctions-tasks/lib/sagemaker/create-training-job.ts#L123

Other Information

No response

Acknowledgements

CDK version used

CDKv2, CDKBuild-4.x

Environment details (OS name and version, etc.)

Linux

madeline-k commented 1 year ago

Hi @pfmeng, I don't fully understand your use case. Can you please provide a minimal code example defining a stack including the constructs you are talking about?

At first glance, I don't think we will be able to change this type to any. It might be possible to make it an optional property, if this construct can be configured without it. I'll need to look into this construct a bit more to know for sure.

github-actions[bot] commented 1 year ago

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

pfmeng commented 1 year ago

Putting it another way: this line of code requires that a list of Channel objects be provided to inputDataConfig in order to define a SageMakerCreateTrainingJob. However, we can't define a list of Channel objects in the CDK construct package, because the input is not available at build time. Also, the input (containing the list of channels as a JSON string) can have a variable number of channels. If the type check is relaxed by changing inputDataConfig: Channel[] to inputDataConfig: any[], the problem is solved. If it's still unclear, please ping me again. Thanks!

kaizencc commented 1 year ago

Hi @pfmeng, I think I understand your problem and use case. The input you wish to provide to inputDataConfig comes from a different step in the stepfunctions state machine, so it's not ready as a Channel[]. I think the way we've solved this in the past is to type it as a sfn.TaskInput. Then you would supply the "list of channel in json string" as sfn.TaskInput.fromJsonPathAt() or one of the other APIs.

Does that sound like it would solve what you're asking? Whether or not we can safely add that in as not a breaking change is a different thing; I need to do a little research on how we've fixed things like this in the past.

pfmeng commented 1 year ago

It does sound like the solution. I'll try it and please also share any code samples if available. Thanks!

straygar commented 1 year ago

@pfmeng I've recently had to do the same! (handle dynamic input channels). Do you have any code snippets you can share?

pfmeng commented 1 year ago

@kaizencc So I tried inputDataConfig = sfn.TaskInput.fromJsonPathAt('$.input'), which didn't work because inputDataConfig must be of type Channel[]. s3DataSource needs to be explicitly defined since it's a required field of Channel. However, that's not possible: at build time, only the input field in the JSON payload is known to exist, and underneath it is a list of Channel objects of unknown length. Can you give some code examples please? Thanks!
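
For reference, one possible workaround that nobody in this thread has confirmed is to bypass the typed construct and write the task in raw Amazon States Language, where a key ending in `.$` is resolved from the state input at run time. The sketch below only builds that state definition as a plain object; the helper name, the `$.jobName` path, and all placeholder values are hypothetical. In a CDK app, an object like this could presumably be passed as `stateJson` to `sfn.CustomState` from `aws-cdk-lib/aws-stepfunctions`, which does not generate IAM permissions for you, so the state machine role would need SageMaker access granted separately.

```typescript
// Hypothetical helper: returns the raw ASL for a SageMaker training task whose
// input channels are read from the execution input instead of being typed as
// Channel[] at synth time.
interface TaskStateJson {
  Type: string;
  Resource: string;
  Parameters: Record<string, unknown>;
}

function buildTrainingJobState(channelsPath: string): TaskStateJson {
  return {
    Type: "Task",
    // Step Functions service integration for SageMaker training jobs.
    Resource: "arn:aws:states:::sagemaker:createTrainingJob.sync",
    Parameters: {
      // Assumed location of the job name in the execution input.
      "TrainingJobName.$": "$.jobName",
      AlgorithmSpecification: {
        TrainingImage: "<training-image-uri>", // placeholder
        TrainingInputMode: "File",
      },
      // The whole channel list is passed through untouched; its length and
      // contents are only known when the execution starts.
      "InputDataConfig.$": channelsPath,
      OutputDataConfig: { S3OutputPath: "s3://example-bucket/output/" }, // placeholder
      ResourceConfig: {
        InstanceCount: 1,
        InstanceType: "ml.m5.xlarge",
        VolumeSizeInGB: 10,
      },
      RoleArn: "<sagemaker-execution-role-arn>", // placeholder
      StoppingCondition: { MaxRuntimeInSeconds: 3600 },
    },
  };
}

const trainingState = buildTrainingJobState("$.input");
console.log(JSON.stringify(trainingState, null, 2));
```

The trade-off is that the channel list must already be in the raw SageMaker API shape (`ChannelName`, `DataSource.S3DataSource`, and so on), since none of the construct's type checking or property renaming applies.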