aws / aws-step-functions-data-science-sdk-python

Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS
Apache License 2.0

feat: Support placeholders for input_path and output_path for all States (except Fail) and items_path for MapState #158

Open ca-nguyen opened 3 years ago

ca-nguyen commented 3 years ago

Description

With this change, it will be possible to use:

  1. Placeholders for input_path and output_path for all States (except Fail)
  2. Placeholders for items_path for Map State
  3. Context Object Data for Map states

Fixes #101

Why is the change necessary?

This makes it possible to define input_path and output_path values dynamically for all States (except the Fail State). It also adds support for placeholders in items_path and for Context Object data in Map States.

Solution

Support Placeholders for input_path, output_path and items_path

During workflow definition serialization, replace the placeholder with its JSON path when the parsed argument is one of the three path arguments (input_path, output_path, items_path).
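
The replacement step described above can be sketched as follows. This is a minimal, self-contained illustration, not the SDK's implementation: the `Placeholder` class, `PATH_FIELDS`, and `serialize_fields` are hypothetical names introduced here, and the `$['key']` rendering is assumed to match the path style shown in the generated definition further down.

```python
# Hypothetical stand-ins for illustration; these are not the SDK's classes.
PATH_FIELDS = {"input_path", "output_path", "items_path"}


class Placeholder:
    """Minimal placeholder that remembers the keys used to reach a value."""

    def __init__(self, root="$", keys=()):
        self.root = root
        self.keys = tuple(keys)

    def __getitem__(self, key):
        # Indexing returns a new placeholder one level deeper.
        return Placeholder(self.root, self.keys + (key,))

    def to_jsonpath(self):
        # Render the root followed by one bracketed segment per key.
        return self.root + "".join(f"[{key!r}]" for key in self.keys)


def serialize_fields(fields):
    """Return a copy of `fields` with placeholders in path fields rendered as JSON paths."""
    rendered = {}
    for name, value in fields.items():
        if name in PATH_FIELDS and isinstance(value, Placeholder):
            rendered[name] = value.to_jsonpath()
        else:
            # Non-path fields and plain strings pass through unchanged.
            rendered[name] = value
    return rendered


step_input = Placeholder()
serialized = serialize_fields({
    "input_path": step_input["orders"],
    "items_path": step_input["orders"]["items"],
    "comment": "unchanged",
})
```

Restricting the substitution to the three path fields keeps every other argument on its existing serialization path, which is the behavior the description above implies.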

Support Context Object Data for Map State

Add new Placeholder objects, MapItemValue and MapItemIndex, each with a JSON string template that is used during workflow definition serialization.

| Placeholder | Gets replaced by | JSON string template |
|---|---|---|
| MapItemValue | Value of the array item that is being processed in the current iteration | $$.Map.Item.Value{} |
| MapItemIndex | Index number of the array item that is being processed in the current iteration | $$.Map.Item.Index |

Example

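# MapItemValue and MapItemIndex are the placeholders added by this PR
# (import path from stepfunctions.inputs is assumed).
from stepfunctions.inputs import MapItemIndex, MapItemValue
from stepfunctions.steps import Chain, Map, Pass
from stepfunctions.workflow import Workflow

map_item_value = MapItemValue(schema={
    'name': str,
    'age': str
})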

map_item_index = MapItemIndex()

map_state = Map(
    'MapState01',
    parameters={
        "MapIndex": map_item_index,
        "Name": map_item_value['name'],
        "Age": map_item_value['age']
    }
)
iterator_state = Pass(
    'TrainIterator'
)

map_state.attach_iterator(iterator_state)
workflow_definition = Chain([map_state])

workflow = Workflow(
    name="MapItemExample",
    definition=workflow_definition,
    role=workflow_execution_role  # IAM role ARN for the workflow, defined elsewhere
)

Workflow definition will be:

{
    "StartAt": "MapState01",
    "States": {
        "MapState01": {
            "Parameters": {
                "MapIndex.$": "$$.Map.Item.Index",
                "Name.$": "$$.Map.Item.Value['name']",
                "Age.$": "$$.Map.Item.Value['age']"
            },
            "Type": "Map",
            "End": true,
            "Iterator": {
                "StartAt": "TrainIterator",
                "States": {
                    "TrainIterator": {
                        "Type": "Pass",
                        "End": true
                    }
                }
            }
        }
    }
}
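
The `Parameters` entries in the definition above can be reproduced with a small stand-alone sketch of the template mechanism. The class names `MapItemValueSketch` and `MapItemIndexSketch` and the helper `render_parameters` are hypothetical and only illustrate how a template such as `$$.Map.Item.Value{}` could be filled in; they are not the SDK's code.

```python
class MapItemValueSketch:
    """Hypothetical placeholder for the current Map item value."""

    TEMPLATE = "$$.Map.Item.Value{}"

    def __init__(self, keys=()):
        self.keys = tuple(keys)

    def __getitem__(self, key):
        # Each subscript extends the path by one segment.
        return MapItemValueSketch(self.keys + (key,))

    def to_jsonpath(self):
        # Fill the template's {} slot with the bracketed key path.
        suffix = "".join(f"[{key!r}]" for key in self.keys)
        return self.TEMPLATE.format(suffix)


class MapItemIndexSketch:
    """Hypothetical placeholder for the current Map item index."""

    TEMPLATE = "$$.Map.Item.Index"

    def to_jsonpath(self):
        # The index template has no slot to fill.
        return self.TEMPLATE


def render_parameters(parameters):
    """Render placeholders as JSON paths and mark their keys with the '.$' suffix."""
    rendered = {}
    for name, value in parameters.items():
        if hasattr(value, "to_jsonpath"):
            rendered[name + ".$"] = value.to_jsonpath()
        else:
            rendered[name] = value
    return rendered


item_value = MapItemValueSketch()
rendered = render_parameters({
    "MapIndex": MapItemIndexSketch(),
    "Name": item_value["name"],
    "Age": item_value["age"],
})
```

Under these assumptions, `rendered` contains the same three key and path pairs as the `Parameters` block of the definition.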

Testing


Pull Request Checklist

Please check all boxes (including N/A items)

Testing

Documentation

Title and description


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license.

wong-a commented 3 years ago

Clarified what "support placeholders for Map state" means in the initial issue comments: https://github.com/aws/aws-step-functions-data-science-sdk-python/issues/101#issuecomment-917040833. So far, this PR only addresses the 3rd use case, which isn't actually what the requester was trying to do.

ca-nguyen commented 3 years ago

> I think we should also update the docs related to placeholders. It might make sense to repurpose the example in your commit body to illustrate its usage.

Agreed - will update the placeholder docs

> Does it make sense to add an integration test that exercises placeholders? As we expand the use cases and scenarios we support, I think it would be useful to have some basic integration tests, since unit tests are more brittle.

Yes it does - will include one for Map state

wong-a commented 3 years ago

@ca-nguyen This change doesn't just affect Map state. There are three things I mentioned here: https://github.com/aws/aws-step-functions-data-science-sdk-python/issues/101#issuecomment-917040833

InputPath and OutputPath are allowed in all state types except Fail. Please update the PR title, description, and tests accordingly.

ca-nguyen commented 3 years ago

> InputPath and OutputPath are allowed in all state types except Fail. Please update the PR title, description, and tests accordingly.

Updated the PR title and description
