aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

passing configuration for spark processing job #3732

Closed ajaiswalgit closed 1 year ago

ajaiswalgit commented 1 year ago

Describe the feature you'd like A way to control the S3 path at which the Spark processing `configuration.json` file (built from kwargs) is written. Many organizations do not grant permission to write files at the top level of a bucket.

How would this feature be used? Please describe. We need to pass a few Spark configurations in `configuration.json` to override default behavior. Configuration passed through the `configuration` kwarg creates a `configuration.json` file at the top level of the default bucket, but the user does not have write access at that level.
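For context, the configuration passed to the SDK follows the EMR-style classification format, which the processor serializes to a `configuration.json` object in S3. A minimal sketch of such a configuration (the property values here are illustrative, not from this issue):

```python
# EMR-style classification list that the Spark processor serializes to
# configuration.json and uploads to S3 before the job starts.
spark_configuration = [
    {
        "Classification": "spark-defaults",
        # Example property; substitute whatever defaults need overriding.
        "Properties": {"spark.executor.memory": "4g"},
    }
]
```

This list would typically be passed as the `configuration` argument to the processor's `run(...)` call; the upload location of the resulting file is what this issue asks to make configurable.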

Describe alternatives you've considered If we could pass a default S3 prefix along with the default S3 bucket, the job would not fail.

Additional context The user should be able to pass a `user_defined_s3_folder` so that `configuration.json` is created at `S3bucket/user_defined_s3_folder`, where they do have write permission.

```python
s3_uri = (
    f"s3://{self.sagemaker_session.default_bucket()}/{user_defined_s3_folder}/{self._current_job_name}/"
    f"input/{self._conf_container_input_name}/{self._conf_file_name}"
)
```

jmahlik commented 1 year ago

Looks like this might be related to https://github.com/aws/sagemaker-python-sdk/issues/3200 and may already be fixed.

martinRenou commented 1 year ago

Closing as fixed by #3486