getindata / kedro-sagemaker

Kedro Plugin to support running pipelines on AWS SageMaker.
https://kedro-sagemaker.readthedocs.io
Apache License 2.0

Deploying a model after training with kedro-sagemaker #15

Closed Riezebos closed 1 year ago

Riezebos commented 1 year ago

Hi,

Thanks for this cool project! I would like to integrate Kedro with Sagemaker. If I understand things correctly, using kedro-sagemaker I can run a Kedro Pipeline in Sagemaker Pipelines. This would result in a trained model that I can deploy as a Sagemaker Endpoint for inference.

When the deployed model receives a request, the data still needs to be transformed (e.g. scaling, one-hot encoding, ...). Is there a way to run the Kedro Pipeline as part of the Sagemaker Endpoint using transformations that are fitted during training?

marrrcin commented 1 year ago

Depending on the tech stack that you use, you can package your pre-processing steps inside of a custom model, e.g. with:

and have only one artifact to serve.

Alternatively, you can use native SageMaker capabilities with the libraries supported by SageMaker (https://aws.amazon.com/blogs/machine-learning/preprocess-input-data-before-making-predictions-using-amazon-sagemaker-inference-pipelines-and-scikit-learn/).
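A minimal sketch of the single-artifact idea (the column names, transformers, and estimator below are made up for illustration and are not from the thread): fitting the pre-processing steps and the model together in one scikit-learn `Pipeline` means the endpoint only has to load and serve one object, and scaling/one-hot encoding fitted during training is applied automatically at inference time.

```python
# Hypothetical example: bundle fitted pre-processing and the model into a
# single scikit-learn Pipeline so only one artifact needs to be served.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training data (made up).
train = pd.DataFrame(
    {"age": [25, 32, 47, 51], "city": ["NY", "SF", "NY", "LA"], "y": [0, 1, 0, 1]}
)
X, y = train[["age", "city"]], train["y"]

# Pre-processing that must also run at inference time: scaling + one-hot.
preprocess = ColumnTransformer(
    [
        ("scale", StandardScaler(), ["age"]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ]
)

# One object holds both the fitted transformations and the estimator.
model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)

# At serving time, raw request data goes straight in; the pipeline applies
# the transformations fitted during training before predicting.
preds = model.predict(pd.DataFrame({"age": [30], "city": ["SF"]}))
```

This fitted pipeline is exactly the kind of artifact the linked blog post's scikit-learn approach expects you to deploy.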

If your Kedro pipeline contains nodes that are pure Python functions, it's easy to re-use them at inference time.
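To illustrate the point about pure functions (the node name and logic here are invented, not from any real pipeline): a Kedro node that takes plain inputs and returns plain outputs can be imported unchanged into an inference handler, so the training-time and serving-time transformation code never diverges.

```python
# Hypothetical example: a pure-function Kedro node reused at inference time.
import pandas as pd


def clip_outliers(df: pd.DataFrame, lower: float = 0.0, upper: float = 100.0) -> pd.DataFrame:
    """Pure function: no I/O, no global state, so it is safe to reuse anywhere."""
    return df.clip(lower=lower, upper=upper)


# During training this would be wired into the pipeline, e.g.:
#   node(clip_outliers, inputs="raw_data", outputs="clean_data")
#
# At inference time, the same function is simply called on the request payload:
request_df = pd.DataFrame({"value": [-5.0, 42.0, 250.0]})
clean = clip_outliers(request_df)
```

Nodes that do their own I/O or depend on runtime context are harder to reuse this way, which is one reason to keep transformation nodes pure.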

Riezebos commented 1 year ago

Thanks for the quick response, those seem like good options! Do you know whether there are any plugins/integrations between Kedro -> SageMaker Model Registry -> BentoML?

It seems straightforward to store the processing steps + model in S3 using BentoML inside a Kedro node, but it would be nice to use the SageMaker Model Registry for governance.

marrrcin commented 1 year ago

I'm not aware of any, though I haven't searched.