aws / sagemaker-spark-container

The SageMaker Spark Container is a Docker image used to run data processing workloads with the Spark framework on Amazon SageMaker.
Apache License 2.0

Fix history server logging and set regional configs #13

Closed: mmeidl closed this 4 years ago

mmeidl commented 4 years ago

Issue #, if available:

Description of changes:

This PR fixes some problems I encountered while running the history server integ test case in PDT. We weren't getting all the logs from the history server process, so we now use subprocess.run(), which pipes all logs to stdout in the parent process. With the extra logs, we found the same S3 endpoint errors we had encountered before, so the alternate history server entrypoint needs to handle regional config bootstrapping too. I've made a quick fix in the local Py SDK to set the AWS_REGION env var before starting the history server, but to make this change permanent we will need to update the open PR for the Spark Processor in the Py SDK.
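For illustration, the two ideas in this description (launching the history server with subprocess.run() so its output reaches the parent's stdout, and setting AWS_REGION beforehand) can be sketched as below. This is a minimal sketch, not the code in this PR: the helper names `build_env` and `run_history_server` are hypothetical, and the command is a stand-in rather than the container's real history server launch command.

```python
import os
import subprocess
import sys


def build_env(region, base_env=None):
    """Return a copy of the environment with AWS_REGION set (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_REGION"] = region
    return env


def run_history_server(command, region):
    """Run `command` with AWS_REGION set and return its exit code (hypothetical helper).

    No stdout/stderr arguments are passed to subprocess.run(), so the child
    inherits the parent's streams and its logs show up in the parent's output.
    """
    completed = subprocess.run(command, env=build_env(region), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Stand-in command: a child Python process that echoes the region it sees.
    child = [sys.executable, "-c", "import os; print(os.environ['AWS_REGION'])"]
    sys.exit(run_history_server(child, "cn-north-1"))
```

Leaving stdout and stderr unset keeps the child's logs interleaved with the parent's, which is convenient when the container's log stream is the only place the output is collected.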

From what I understand, we've had the same problems in CN region CodeBuild tests. This change should solve the history server endpoint issue in CN and Gov regions.
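To show why the region matters for the endpoint, here is a rough sketch of deriving a regional S3 endpoint from a region name. It is not the container's actual bootstrapping logic; the function name `s3_endpoint_for_region` is hypothetical, and the only facts assumed are the standard regional hostname pattern and the `amazonaws.com.cn` suffix used by the China regions.

```python
def s3_endpoint_for_region(region):
    """Return the regional S3 hostname for `region` (hypothetical helper).

    China regions (names starting with "cn-") use the amazonaws.com.cn suffix;
    other regions, including GovCloud, use amazonaws.com.
    """
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"s3.{region}.{suffix}"


if __name__ == "__main__":
    for name in ("us-west-2", "cn-north-1", "us-gov-west-1"):
        print(name, s3_endpoint_for_region(name))
```

A process that does not know its region cannot pick the right hostname, which is consistent with the endpoint errors described above when AWS_REGION was unset.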

Testing: So far this has been tested by building the Spark container image locally, pushing it to the PDT repo, and running the PDT CodeBuild-Test against my temporary branch test-history-server.

I will start the CodeBuild-Test run in the PDX Alpha account.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mmeidl commented 4 years ago

Alpha CodeBuild run succeeded: SparkContainerCodeBuildProject-Test:9259b7d1-cd54-408c-bf8b-e115e7161893