Open admivsn opened 1 month ago
This is well-thought out! My thoughts are: output_kms_key
being applied to the input code doesn't quite make sense from a naming perspective.
And in the estimator case, it's only when the buckets are the same (output and code) that it's applied.
kms_key = self.output_kms_key if code_bucket == output_bucket else None
Probably because it's generally safe to assume if the bucket is the same and the output has some encryption, the same encryption can be applied to the code, whereas not applying it to the code might not work for uploading due to bucket policies. Maybe this logic was added as a fix to get around some common user errors caused by bucket encryption policies.
Actually it might make sense for the estimator to encrypt the code with a kms_key
that is not the output_kms_key
.
Describe the bug The
kms_key
used to encrypt either the user code file or local inputs when uploading to S3 should default tooutput_kms_key
.This would align the behaviour of with
sagemaker.estimator.Estimator
whereoutput_kms_key
is used to encrypt the tar'd user training code when uploading to S3.Also, since
output_kms_key
is resolved from the config it means thatkms_key
can inherit this default from the config.To reproduce A clear, step-by-step set of instructions to reproduce the bug. The provided code need to be complete and runnable, if additional data is needed, please include them in the issue.
Expected behavior The
kms_key
should default tooutput_kms_key
. This can be implemented in either:sagemaker.processing.Processor._normalize_args
sagemaker.processing.Processor._normalize_inputs
Screenshots or logs If applicable, add screenshots or logs to help explain your problem.
System information A description of your system. Please provide:
Additional context Add any other context about the problem here.