aws-samples / smallmatter-package

Sample wrappers to streamline data-science tasks using Python toolkits (such as pandas, matplotlib, etc.) on AWS.
MIT No Attribution

kms issue when the job bucket uses sse #13

Open rareal opened 3 years ago

rareal commented 3 years ago

The FrameworkProcessor fails twice if a kms_key is not added to the class:

  1. In S3Uploader.upload_string_as_file_body; this can be fixed by simply forwarding the kms_key parameter from run.
  2. In _upload_payload, which fails on the estimator's code upload. The estimator class will only use the kms_key (from output_kms_key) if the code bucket is the same as the output bucket -> https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/estimator.py#L2291. So the fix here is to give the estimator an output_bucket with the s3_prefix, and to pass the kms_key from run to output_kms_key.
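For reference, the upstream behaviour described in point 2 can be sketched roughly like this (illustrative Python only, not the actual SDK code; the helper name and bucket names are hypothetical):

```python
from urllib.parse import urlparse

def choose_code_kms_key(code_s3_uri, output_s3_uri, output_kms_key):
    """Mimic the estimator's logic: the uploaded code is encrypted with
    output_kms_key only when the code bucket matches the output bucket.
    Otherwise the upload carries no KMS key, so a job bucket that
    enforces SSE-KMS rejects it."""
    code_bucket = urlparse(code_s3_uri).netloc
    output_bucket = urlparse(output_s3_uri).netloc
    return output_kms_key if code_bucket == output_bucket else None

# Same bucket: the key is applied.
assert choose_code_kms_key(
    "s3://job-bucket/code", "s3://job-bucket/output", "my-key") == "my-key"
# Different bucket: no key, and an SSE-enforcing bucket policy fails the upload.
assert choose_code_kms_key(
    "s3://sdk-default-bucket/code", "s3://job-bucket/output", "my-key") is None
```

This is why pointing the estimator's output at the same bucket/prefix as the code makes the kms_key take effect.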

Happy to submit a pull request if needed. Thanks for the package, good stuff.

verdimrc commented 3 years ago

Hi @rareal, thank you for spotting this bug.

You're more than welcome to create a PR, preferably to this branch: https://github.com/verdimrc/sagemaker-python-sdk/tree/pr-framework-processor. This branch itself is being tracked by PR #2251 from the upstream sagemaker-python-sdk, and the code is updated much more than what's in this repo.

Do let me know.

rareal commented 3 years ago

Looks like the code in the other branch is considerably different, then. I see one issue with the function get_run_args: it is defined alongside __init__ but needs to run _pack_and_upload_code, which would need the kms_key; however, at that point it cannot take the kms_key from run.

Would the solution be to pass the kms_key when instantiating FrameworkProcessor instead of on run?
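The ordering problem behind that suggestion can be sketched as follows (purely illustrative; the class is heavily simplified and the attribute/method bodies are hypothetical, not the real FrameworkProcessor internals):

```python
class FrameworkProcessorSketch:
    """Illustrative only: shows why kms_key fits more naturally on
    __init__ than on run(). get_run_args() can be called before run(),
    so a key supplied only to run() would not yet exist when
    get_run_args() triggers the code upload."""

    def __init__(self, kms_key=None):
        self.kms_key = kms_key  # available to every later call

    def _pack_and_upload_code(self):
        # In the real SDK this would forward the key to S3Uploader.
        return {"kms_key": self.kms_key}

    def get_run_args(self):
        # Needs the key even though run() has not been called yet.
        return self._pack_and_upload_code()


processor = FrameworkProcessorSketch(kms_key="my-key")
assert processor.get_run_args() == {"kms_key": "my-key"}
```

With the key on the constructor, both run() and get_run_args() see the same value without any parameter threading.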