aws / aws-step-functions-data-science-sdk-python

Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS
Apache License 2.0
288 stars 88 forks

Enhancement: Update the estimator hyperparameters with the placeholder hyperparameters passed to TrainingStep #163

Open ca-nguyen opened 3 years ago

ca-nguyen commented 3 years ago

Currently, it is not possible to update the estimator hyperparameters with the hyperparameters passed to TrainingStep if a Placeholder is used as input. The hyperparameters can only be merged if those passed to TrainingStep are a dict.

Proposing to add a utility that translates a Placeholder into a usable JSONPath dict (with $). The translated placeholder_dict could then be passed as the hyperparameters input to TrainingStep, which would make it possible to merge the constructor and estimator hyperparameters.
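As a rough illustration of the merge this would enable, here is a minimal sketch in plain Python that does not use the SDK. The helper name merge_hyperparameters is hypothetical, the JSONPath strings are only an assumed rendering of a translated placeholder, and the rule that step-level values win on a key conflict is an assumption rather than confirmed SDK behavior.

```python
def merge_hyperparameters(estimator_hp, step_hp):
    """Hypothetical helper: combine two hyperparameter dicts.

    Assumption: on a key conflict, the value passed to the step wins.
    """
    merged = dict(estimator_hp)  # copy so the estimator's dict is not mutated
    merged.update(step_hp)       # step-level values override on conflict
    return merged


# Hyperparameters set on the estimator (literal values).
estimator_hp = {"A": "a", "B": "b", "C": "c"}

# Assumed JSONPath form of a translated placeholder; strings are illustrative.
placeholder_dict = {
    "B": "$$.Execution.Input['TrainingParameters']['B']",
    "D": "$$.Execution.Input['TrainingParameters']['D']",
}

merged = merge_hyperparameters(estimator_hp, placeholder_dict)
```

Under these assumptions, merged keeps 'A' and 'C' from the estimator, takes 'B' from the placeholder, and gains 'D'.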

wong-a commented 3 years ago

The description here is a bit confusing. It's not unique to TrainingStep. The same would apply anywhere there is a merging of properties. Another instance is with parameters in ProcessingStep. There will be others in the future.

Added some clarification on the previous issue, with an example showing how this is possible today: https://github.com/aws/aws-step-functions-data-science-sdk-python/issues/152#issuecomment-917256084

If the estimator sets hyperparameters={'A': a, 'B': b, 'C': c} and ExecutionInput sets hyperparameters={'B': bb, 'D': dd}, then to merge the two you need to do the following:

training_step = TrainingStep(...,
  hyperparameters={
    'B': execution_input['TrainingParameters']['B'],
    'D': execution_input['TrainingParameters']['D']
  }
)

The utility would expand all properties in the schema of execution_input['TrainingParameters'] (if there is one) into a dict, so that the properties do not have to be written out by hand.
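A minimal sketch of what such an expansion could look like, in plain Python and independent of the SDK. The function name expand_schema, its root parameter, and the exact JSONPath string format are assumptions for illustration only. It handles just the top-level keys of a flat schema.

```python
def expand_schema(schema, root):
    """Hypothetical utility: map each top-level schema key to a JSONPath string.

    Assumption: paths are rendered as <root>['<key>'].
    """
    return {key: f"{root}['{key}']" for key in schema}


# Assumed schema for the 'TrainingParameters' entry of the execution input.
training_schema = {"B": str, "D": str}

hyperparameters = expand_schema(
    training_schema,
    root="$$.Execution.Input['TrainingParameters']",
)
```

The resulting dict has one entry per schema property, so it could be merged with the estimator's hyperparameters like any other dict.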