Describe the feature you'd like
Currently, when using Processors such as SKLearnProcessor, there is no way to specify where a local code= file should be stored in S3 when used in conjunction with a ProcessingStep. This can lead to clutter in S3 buckets. The current behaviour places the code in the default_bucket of the SageMaker session, like so:

s3://{default_bucket}/auto_generated_hash/input/code/preprocess.py
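For illustration, here is a minimal sketch of that usage (assuming the SageMaker Python SDK; the role lookup and the preprocess.py script are placeholders):

```python
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep

role = get_execution_role()  # placeholder: any valid execution role

# No parameter here (or on the step) controls where the local script is uploaded.
sklearn_processor = SKLearnProcessor(
    framework_version="1.0-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# The step uploads preprocess.py to the session's default bucket under an
# auto-generated prefix, e.g.
# s3://{default_bucket}/{auto_generated_hash}/input/code/preprocess.py
step_process = ProcessingStep(
    name="Preprocess",
    processor=sklearn_processor,
    code="preprocess.py",
)
```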
A better user experience would be to allow the user to define exactly where the code should be uploaded. This would let users group all the files for a given run together. For example:

s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/code/preprocess.py
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/data/train.csv
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/model/model.pkl
This should already be possible with the FrameworkProcessor by utilising the code_location= parameter, but that parameter seems to be ignored by the ProcessingStep.
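For reference, a sketch of what I would have expected to work (the bucket and prefix below are placeholders):

```python
from sagemaker import get_execution_role
from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.steps import ProcessingStep

role = get_execution_role()  # placeholder: any valid execution role

# code_location should control the S3 prefix used for the packaged code.
framework_processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="1.0-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    code_location="s3://my-bucket/my-project/code",  # placeholder bucket/prefix
)

# Expected: preprocess.py uploaded under the code_location prefix.
# Observed: when wrapped in a ProcessingStep, the code still lands in the
# default bucket under an auto-generated prefix.
step_process = ProcessingStep(
    name="Preprocess",
    processor=framework_processor,
    code="preprocess.py",
)
```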