aws / aws-step-functions-data-science-sdk-python

Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS
Apache License 2.0
285 stars 87 forks

refactor: Update AWS Glue Databrew StartJobRun step to use integration pattern input #176

ca-nguyen closed this 2 years ago

ca-nguyen commented 2 years ago

### Description

Update the AWS Glue DataBrew service integration to use the `integration_pattern` input instead of the `wait_for_completion` flag.

Fixes: N/A

### Why is the change necessary?

This change is necessary for consistency with the new service integration implementation pattern introduced in the commit "Add support for Nested Step Functions", which uses the `integration_pattern` argument in the step constructor to build the resource.

Support for the AWS Glue DataBrew service integration was added in this commit but has not been released yet. A later commit ("Add support for Nested Step Functions") introduced a new implementation pattern that constructs the step from the `IntegrationPattern` enum instead of the `wait_for_completion` flag. (See that PR for more detail on the rationale behind the implementation.)

### Solution

Replace the `wait_for_completion` flag with the `integration_pattern` argument in the `GlueDataBrewStartJobRunStep` constructor.
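Because the step previously took a boolean, migrating a call is a direct substitution of one input for the other. A minimal sketch, assuming a local stand-in `IntegrationPattern` enum with only the two members named in this PR (it is not imported from the SDK) and a hypothetical helper `pattern_from_flag`:

```python
from enum import Enum, auto


class IntegrationPattern(Enum):
    """Stand-in for the SDK enum: only the two members used in this PR."""
    WaitForCompletion = auto()
    CallAndContinue = auto()


def pattern_from_flag(wait_for_completion: bool) -> IntegrationPattern:
    """Hypothetical helper: translate the old boolean flag into the new enum input."""
    if wait_for_completion:
        # Old flag=True: the task waits for the job run to finish.
        return IntegrationPattern.WaitForCompletion
    # Old flag=False: the task only starts the job run and moves on.
    return IntegrationPattern.CallAndContinue


# Before (flag): GlueDataBrewStartJobRunStep(..., wait_for_completion=True, ...)
# After (enum):  GlueDataBrewStartJobRunStep(..., integration_pattern=IntegrationPattern.WaitForCompletion, ...)
print(pattern_from_flag(True))   # IntegrationPattern.WaitForCompletion
print(pattern_from_flag(False))  # IntegrationPattern.CallAndContinue
```

In real code, the SDK's own `IntegrationPattern` would be passed directly rather than derived from a boolean.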

The `IntegrationPattern` is used to build the `Resource` ARN as follows:

| IntegrationPattern | Resource | Doc |
| --- | --- | --- |
| WaitForCompletion | `arn:aws:states:::databrew:startJobRun.sync` | Run a Job |
| CallAndContinue | `arn:aws:states:::databrew:startJobRun` | Request Response |

See Service Integration Patterns for more details
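The pattern-to-resource mapping can be sketched as a small function. This is an illustration under stated assumptions, not the SDK's implementation: `IntegrationPattern` is a local stand-in whose values are the ARN suffixes, `build_databrew_resource` is a hypothetical helper, the `aws` partition is hard-coded, and the base string follows the Step Functions documentation for the DataBrew integration (`arn:aws:states:::databrew:startJobRun`).

```python
from enum import Enum


class IntegrationPattern(Enum):
    """Local stand-in: each value is the suffix appended to the resource ARN."""
    WaitForCompletion = "sync"
    CallAndContinue = ""


def build_databrew_resource(integration_pattern: IntegrationPattern) -> str:
    """Hypothetical helper: build the Task state's Resource ARN for databrew:startJobRun."""
    base = "arn:aws:states:::databrew:startJobRun"  # "aws" partition assumed
    # CallAndContinue (Request Response) uses the bare ARN.
    if integration_pattern is IntegrationPattern.CallAndContinue:
        return base
    # WaitForCompletion (Run a Job) appends ".sync".
    return f"{base}.{integration_pattern.value}"


for pattern in IntegrationPattern:
    print(pattern.name, build_databrew_resource(pattern))
```

Keying the suffix off the enum value keeps the constructor free of per-flag branching, which is the consistency argument made above.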

Normally, replacing a constructor argument would be a breaking change, but since support for the AWS Glue DataBrew service integration has not been released yet, it is acceptable to do so. After the next release, such changes will be considered backward incompatible.

### Testing

Tested by running `workflow.create()` and `workflow.execute()` for each of the following cases:

   - Call And Continue: 

Create a workflow for each of the following steps and execute it:

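```python
from stepfunctions.steps.integration_resources import IntegrationPattern
from stepfunctions.steps.service import GlueDataBrewStartJobRunStep

step_call_and_continue = GlueDataBrewStartJobRunStep(
    'Start Glue DataBrew Job Run - CallAndContinue',
    integration_pattern=IntegrationPattern.CallAndContinue,
    parameters={"Name": "<databrew-job-name>"}  # placeholder job name
)

# No integration_pattern passed: the constructor's default is used
step_default = GlueDataBrewStartJobRunStep(
    'Start Glue DataBrew Job Run - Default',
    parameters={"Name": "<databrew-job-name>"}  # placeholder job name
)
```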

Validate that the workflows end after starting the job, without waiting for the job to complete.

   - Wait For Completion

Create a workflow with the following step and execute it:

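```python
from stepfunctions.steps.integration_resources import IntegrationPattern
from stepfunctions.steps.service import GlueDataBrewStartJobRunStep

step_wait_for_completion = GlueDataBrewStartJobRunStep(
    'Start Glue DataBrew Job Run - WaitForCompletion',
    integration_pattern=IntegrationPattern.WaitForCompletion,
    parameters={"Name": "<databrew-job-name>"}  # placeholder job name
)
```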

Validate that the workflow only ends after the job completes.
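The different end behavior can be seen in the Amazon States Language that each one-step workflow reduces to. A sketch assuming hand-written definitions (the SDK generates its own from the step objects; the helper `one_step_definition` and the job name placeholder are hypothetical): the two definitions differ only in `Resource`, and with the `.sync` resource Step Functions keeps the Task state running until the job run finishes.

```python
import json


def one_step_definition(state_id: str, resource: str, job_name: str) -> dict:
    """Hypothetical helper: a minimal state machine with a single terminal Task state."""
    return {
        "StartAt": state_id,
        "States": {
            state_id: {
                "Type": "Task",
                "Resource": resource,
                "Parameters": {"Name": job_name},
                "End": True,
            }
        },
    }


JOB_NAME = "<databrew-job-name>"  # placeholder for an existing DataBrew job

call_and_continue = one_step_definition(
    "Start Glue DataBrew Job Run - CallAndContinue",
    "arn:aws:states:::databrew:startJobRun",  # Request Response: ends once the run is started
    JOB_NAME,
)
wait_for_completion = one_step_definition(
    "Start Glue DataBrew Job Run - WaitForCompletion",
    "arn:aws:states:::databrew:startJobRun.sync",  # Run a Job: waits for the run to finish
    JOB_NAME,
)

print(json.dumps(wait_for_completion, indent=2))
```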



----

### Pull Request Checklist

Please check all boxes (including N/A items)

#### Testing

- [X] Unit tests added
- [X] Integration test added - **N/A**
- [X] Manual testing - why was it necessary? could it be automated? **No integration tests were added because testing requires external resources.**

#### Documentation

- [X] __docs__: All relevant [docs](https://github.com/aws/aws-step-functions-data-science-sdk-python/tree/main/doc) updated
- [X] __docstrings__: All public APIs documented

### Title and description

- [X] __Change type__: Title is prefixed with change type: and follows [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/)
- [X] __References__: Indicate issues fixed via: `Fixes #xxx` - **N/A**

----

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license.

StepFunctions-Bot commented 2 years ago

AWS CodeBuild CI Report
