aws / aws-step-functions-data-science-sdk-python

Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS
Apache License 2.0

Project Maintenance #196

Closed by Ce11an 1 year ago

Ce11an commented 1 year ago

Is this project no longer being maintained? I've enjoyed using it in the past, but it is now lacking many of the new features in AWS.

On a side note, does anyone know of alternative libraries that are more up to date?

Thanks 🙏🏼

wong-a commented 1 year ago

Unfortunately, we haven't invested much time in this project recently. Which features do you use and what is missing for you?

The closest library is the AWS CDK, which allows creation of state machines in Python, TypeScript, Go, .NET, and Java: https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_stepfunctions/README.html

It doesn't have the same integration with Jupyter notebooks or prebuilt pipelines though. But if you just want to use Python to author the workflow, it's worth checking out.

Ce11an commented 1 year ago

Thanks for the information! There are several SageMaker steps that are not in the SDK. We have orchestrated our ML pipelines with this SDK in the past but may switch to the SageMaker SDK as there is more support now.

On a side note, do you know the best way to compare the cost of using SageMaker Pipelines with the cost of using Step Functions? My team prefers to write steps/pipelines in Python as opposed to JSON or the UI.

wong-a commented 1 year ago

AWS Step Functions now supports calling any AWS API (including SageMaker) using AWS SDK integrations. I think you could use these today with the base Task step by providing the Parameters for the service integration you want to use, which is all the service integration step classes in this SDK do. Here's an example for SageMaker CreateTrainingJob from this SDK's source. Yours could be much simpler depending on the parameters the API needs. https://github.com/aws/aws-step-functions-data-science-sdk-python/blob/main/src/stepfunctions/steps/sagemaker.py#L45-L153
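As a rough sketch of what that looks like, here is the Amazon States Language definition for a single Task state that calls CreateTrainingJob through an AWS SDK integration, built as a plain Python dict so it runs without any library installed. The function name, job name, image URI, role ARN, bucket, and instance settings are all hypothetical placeholders, and the `Parameters` shown are only a small subset of what the API accepts. The same `Resource` and `Parameters` values are what you would hand to the base Task step. Note that, as far as I know, an AWS SDK integration returns as soon as the API call responds, so it does not wait for the training job to finish.

```python
import json


def sagemaker_training_task(job_name, image_uri, role_arn, output_s3_uri):
    """Build a Task state that calls SageMaker CreateTrainingJob via an AWS SDK integration."""
    return {
        "Type": "Task",
        # AWS SDK integration ARN pattern: arn:aws:states:::aws-sdk:<service>:<apiAction>
        "Resource": "arn:aws:states:::aws-sdk:sagemaker:createTrainingJob",
        # Keys mirror the CreateTrainingJob request fields (PascalCase).
        "Parameters": {
            "TrainingJobName": job_name,
            "AlgorithmSpecification": {
                "TrainingImage": image_uri,
                "TrainingInputMode": "File",
            },
            "RoleArn": role_arn,
            "OutputDataConfig": {"S3OutputPath": output_s3_uri},
            # Illustrative instance settings, not a recommendation.
            "ResourceConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.xlarge",
                "VolumeSizeInGB": 30,
            },
            "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        },
        "End": True,
    }


# All argument values below are hypothetical placeholders.
definition = {
    "StartAt": "Train",
    "States": {
        "Train": sagemaker_training_task(
            job_name="example-training-job",
            image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
            role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
            output_s3_uri="s3://example-bucket/output/",
        )
    },
}

definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The printed JSON can be pasted into the Step Functions console or passed as the state machine definition when creating one programmatically.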

Here are the pricing pages for Step Functions and SageMaker: https://aws.amazon.com/sagemaker/pricing/ https://aws.amazon.com/step-functions/pricing/

With Step Functions, you only pay for state transitions when you execute a workflow, plus the cost of the SageMaker APIs you call. AFAIK, SageMaker Pipelines themselves do not have any cost, but you still pay for the resources each step uses, and there is a cost for viewing them in SageMaker Studio.
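To get a feel for the Step Functions side of that comparison, the arithmetic is just transitions multiplied by the per-transition price. The sketch below assumes a Standard workflow billed per 1,000 state transitions; the function name is hypothetical, and the price passed in the usage lines is a placeholder, so take the real figure for your region from the pricing page. It ignores any free tier and leaves out the SageMaker resource costs, which you would pay with either option.

```python
def estimate_transition_cost(transitions_per_execution, executions, price_per_1000_transitions):
    """Estimate Step Functions state transition charges for a batch of executions.

    Covers transition charges only; SageMaker training, processing, and
    hosting costs are billed separately and are not included here.
    """
    total_transitions = transitions_per_execution * executions
    return total_transitions / 1000 * price_per_1000_transitions


# Placeholder inputs: a 10-state pipeline run 1,000 times in a month.
# The price is an illustrative value, not a quoted rate.
monthly_cost = estimate_transition_cost(
    transitions_per_execution=10,
    executions=1000,
    price_per_1000_transitions=0.025,
)
print(f"Estimated transition cost: ${monthly_cost:.2f}")
```

For most ML pipelines the number of transitions is small, so the training and processing instances are likely to dominate the bill rather than the orchestration.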

Ce11an commented 1 year ago

That's great to know, thanks. Do you know if you will have capacity to work on this library in the future? Or are features for this library going to be more community driven? Happy to close this ticket if you are 😄

wong-a commented 1 year ago

Unfortunately, I don't have a concrete plan to share about new features, but we are still maintaining this package. That being said, community contributions are more than welcome :)