aws / aws-step-functions-data-science-sdk-python

Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS
Apache License 2.0

Feature Request: Support placeholders for Map state #101

Open mochacafe opened 4 years ago

mochacafe commented 4 years ago

Currently I am not able to pass execution input to a Map state. If I try to pass execution input to a Map state, it throws 'Object of type 'ExecutionInput' is not JSON serializable'. Is it possible to add placeholder support to the Map state?

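from stepfunctions import steps

# execution_input (an ExecutionInput placeholder) and input_data_check_lambda_task
# are defined elsewhere in the workflow.
input_data_check_map_step = steps.states.Map(
    state_id='input_data_check_map',
    iterator=input_data_check_lambda_task,
    input_path=execution_input,                     # placeholder: not supported yet
    items_path=execution_input['input_data_dirs'],  # placeholder: not supported yet
    parameters={'input_dir': '$$.Map.Item.Value', 'fcst_date': '$.fcst_date'},
    max_concurrency=5,
    comment='Run input data checks in dynamic parallel state',
    result_path='$.InputDataCheckRes')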
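The error text matches what Python's `json` module raises when it meets an object it has no encoder for. A minimal stand-alone reproduction, assuming the workflow definition is eventually passed to `json.dumps`; `ExecutionInputStandIn` is a hypothetical class standing in for the SDK's `ExecutionInput`, not the real one:

```python
import json


class ExecutionInputStandIn:
    """Hypothetical stand-in for a placeholder object (not the SDK's ExecutionInput)."""

    def __init__(self, jsonpath="$$.Execution.Input"):
        self.jsonpath = jsonpath


# A Map state definition carrying a placeholder object where ASL expects a string.
state = {"Type": "Map", "InputPath": ExecutionInputStandIn()}

try:
    json.dumps(state)
except TypeError as err:
    # Python 3.7+: "Object of type ExecutionInputStandIn is not JSON serializable"
    message = str(err)

# Converting placeholder objects to their JSONPath strings before encoding avoids the error.
# (This lambda assumes every non-serializable object in the dict has a .jsonpath attribute.)
encoded = json.dumps(state, default=lambda obj: obj.jsonpath)
print(encoded)  # {"Type": "Map", "InputPath": "$$.Execution.Input"}
```

The `default=` hook is only one illustrative way to turn placeholder objects into strings; the SDK's actual fix may look different.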
yoodan93 commented 4 years ago

Step Functions recently added support for execution input in InputPath and ItemsPath for Map state! https://docs.aws.amazon.com/step-functions/latest/dg/input-output-contextobject.html

The Python SDK does not support this yet, but we plan to keep the Python SDK up to date with the current Amazon States Language specification.

wong-a commented 3 years ago

"Support placeholders for Map state" is a broad topic - let's clarify what this means and enumerate the specific use cases and feature gaps.

In @mochacafe's code example there are two places where placeholders are used but not supported:

1. InputPath should accept Placeholders (and also OutputPath and ResultPath)

input_path=execution_input

The generated ASL should be the following.

"InputPath": "$$.Execution.Input"

Note that, unlike Placeholders in Parameters (the only place they are supported today), the original key should not be suffixed with ".$". All the "Path" ASL fields are plain JSONPath strings.

This allows you to specify InputPath without writing a literal JSONPath string by hand. Today you can use the context object (or any JSONPath) in input_path, but you need to pass a JSONPath string (e.g. input_path='$$.Execution.Input'), just as result_path is set in the example. The Placeholder class is just a Pythonic way to construct the dynamic references without writing JSONPath strings by hand (plus some optional static validation if schemas are used).

This applies to all states that support the InputPath field, not just Map states. OutputPath and ResultPath can have the same functionality too. One caveat is that ResultPath shouldn't accept ExecutionInput, since the context object is non-writable.
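A rough sketch of the rendering rule for the Path fields described above, assuming a hypothetical `Placeholder` class with a `to_jsonpath()` method (the names `Placeholder`, `execution_input` and `render_path_field` here are illustrative, not the SDK's API). The key stays unchanged, the value becomes a plain JSONPath string, and ResultPath rejects context-object references:

```python
class Placeholder:
    """Hypothetical minimal placeholder: a JSONPath root plus a chain of keys."""

    def __init__(self, root, keys=()):
        self.root = root
        self.keys = tuple(keys)

    def __getitem__(self, key):
        # Each subscript returns a new placeholder one level deeper.
        return Placeholder(self.root, self.keys + (key,))

    def to_jsonpath(self):
        return self.root + "".join(f"['{key}']" for key in self.keys)


def execution_input():
    # Hypothetical factory for a reference into the (read-only) context object.
    return Placeholder("$$.Execution.Input")


def render_path_field(field, value):
    """Return the ASL pair for InputPath/OutputPath/ResultPath.

    Unlike Parameters, the key is never suffixed with '.$'.
    """
    if isinstance(value, Placeholder):
        if field == "ResultPath" and value.root.startswith("$$"):
            raise ValueError("ResultPath cannot point into the read-only context object")
        value = value.to_jsonpath()
    # Plain strings (e.g. '$.InputDataCheckRes') pass through unchanged.
    return {field: value}


print(render_path_field("InputPath", execution_input()))
# {'InputPath': '$$.Execution.Input'}
```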

2. ItemsPath should accept Placeholders

Similar to above, make it easier to construct the JSONPath using Placeholders. The workaround today is to just write the string:

items_path='$$.Execution.Input.input_data_dirs'

Only Map state has this field.
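For the issue's example, the ASL a Map state might end up with can be sketched as a plain dict. `execution_input_path` is a hypothetical helper that builds the path in bracket notation, which addresses the same field as the dot-notation workaround above; the Pass-state iterator is a stand-in for the Lambda task:

```python
def execution_input_path(*keys):
    """Hypothetical helper: JSONPath into the execution input, in bracket notation."""
    return "$$.Execution.Input" + "".join(f"['{key}']" for key in keys)


# Fields taken from the issue's example; the iterator is a trivial stand-in.
map_state = {
    "Type": "Map",
    "ItemsPath": execution_input_path("input_data_dirs"),  # key has no '.$' suffix
    "MaxConcurrency": 5,
    "Iterator": {
        "StartAt": "input_data_check",
        "States": {"input_data_check": {"Type": "Pass", "End": True}},
    },
    "ResultPath": "$.InputDataCheckRes",
    "End": True,
}

print(map_state["ItemsPath"])  # $$.Execution.Input['input_data_dirs']
```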

3. Add Placeholders for Map Index and Map Value

This one is not actually called out in @mochacafe's example but would be nice to have too.

For Map states, the Context Object has special fields to reference the iteration's Index and Value. See: https://docs.aws.amazon.com/step-functions/latest/dg/input-output-contextobject.html#contextobject-map

Today you can use them in a Map state's Parameters by writing the strings like so:

parameters={
  'field_in_iterator_value': "$$.Map.Item.Value['myField']",
  'index': '$$.Map.Item.Index',
  'static_value': [1, 2, 3],
  ...
},

This generates the following ASL:

"Parameters": {
  "field_in_iterator_value.$": "$$.Map.Item.Value['myField']",
  "index.$": "$$.Map.Item.Index",
  "static_value": [1, 2, 3]
}

Supporting placeholders means you could use a Placeholder class to build the dynamic reference without writing JSONPath by hand. Borrowing the proposed class names from https://github.com/aws/aws-step-functions-data-science-sdk-python/pull/158:

parameters={
  'field_in_iterator_value': MapItemValue()['myField'],
  'index': MapItemIndex(),
  'static_value': [1, 2, 3],
  ...
},
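A minimal sketch of how such placeholders could be turned into the `Parameters` ASL shown earlier. Only the class names `MapItemValue` and `MapItemIndex` come from the proposal; their bodies, the `ContextPath` base class and `to_asl_parameters` are hypothetical and are not the implementation in that PR:

```python
class ContextPath:
    """Hypothetical placeholder for a fixed context-object root plus optional keys."""

    root = ""

    def __init__(self, keys=()):
        self.keys = tuple(keys)

    def __getitem__(self, key):
        # Return a placeholder of the same subclass, one key deeper.
        return type(self)(self.keys + (key,))

    def to_jsonpath(self):
        return self.root + "".join(f"['{key}']" for key in self.keys)


class MapItemValue(ContextPath):
    root = "$$.Map.Item.Value"


class MapItemIndex(ContextPath):
    root = "$$.Map.Item.Index"


def to_asl_parameters(params):
    """Suffix a key with '.$' only when its value is a placeholder; recurse into dicts."""
    asl = {}
    for key, value in params.items():
        if isinstance(value, ContextPath):
            asl[f"{key}.$"] = value.to_jsonpath()
        elif isinstance(value, dict):
            asl[key] = to_asl_parameters(value)
        else:
            asl[key] = value  # static values are copied as-is
    return asl


parameters = {
    "field_in_iterator_value": MapItemValue()["myField"],
    "index": MapItemIndex(),
    "static_value": [1, 2, 3],
}
print(to_asl_parameters(parameters))
```

Suffixing the key only when the value is a placeholder is what keeps `static_value` a literal in the output.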