aws / sagemaker-spark-container

The SageMaker Spark Container is a Docker image used to run data processing workloads with the Spark framework on Amazon SageMaker.
Apache License 2.0

Propagate env variable AWS_REGION to yarn #74

Closed can-sun closed 2 years ago

can-sun commented 2 years ago

Issue #, if available:

The SDK client threw an exception because the region could not be resolved by the region provider. The processing container is forced to run on YARN, and AWS_REGION is set in the container environment. However, to make this variable visible to YARN, the Spark configuration needs to be set explicitly. After this change, a new job was run to verify the fix and it succeeded.

References:
- https://spark.apache.org/docs/latest/configuration.html#environment-variables
- https://stackoverflow.com/questions/47534642/setting-environment-variables-in-spark-cluster-mode
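
As the Spark documentation above describes, environment variables reach the YARN application master and executors through the `spark.yarn.appMasterEnv.[Name]` and `spark.executorEnv.[Name]` configuration keys rather than the driver's local environment. A minimal PySpark illustration of that mechanism (the application name and region value are placeholders, not taken from this repository):

```python
from pyspark.sql import SparkSession

# Propagate AWS_REGION to the YARN application master and executors via
# Spark configuration instead of relying on the driver's local environment.
spark = (
    SparkSession.builder
    .appName("region-propagation-example")
    .config("spark.yarn.appMasterEnv.AWS_REGION", "us-west-2")  # application master env
    .config("spark.executorEnv.AWS_REGION", "us-west-2")        # executor env
    .getOrCreate()
)
```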

Description of changes: Before submitting the Spark application, we set the value of AWS_REGION for YARN during the bootstrap phase.
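
A minimal sketch of what such a bootstrap-phase change could look like; the configuration file path, the helper name, and the way the region is read are assumptions for illustration, not the container's actual implementation:

```python
import os

# Assumed location of the Spark defaults file inside the container.
SPARK_DEFAULTS_CONF = "/usr/lib/spark/conf/spark-defaults.conf"


def propagate_aws_region_to_yarn(conf_path: str = SPARK_DEFAULTS_CONF) -> None:
    """Hypothetical helper: append AWS_REGION to the YARN application master
    and executor environments before the Spark application is submitted."""
    region = os.environ.get("AWS_REGION")
    if not region:
        return  # nothing to propagate
    with open(conf_path, "a") as conf:
        conf.write(f"spark.yarn.appMasterEnv.AWS_REGION {region}\n")
        conf.write(f"spark.executorEnv.AWS_REGION {region}\n")
```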

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.