aws / sagemaker-spark-container

The SageMaker Spark Container is a Docker image used to run data processing workloads with the Spark framework on Amazon SageMaker.
Apache License 2.0

Fix: multiple fixes included (pin deps, config change); see description for more details #55

Closed: guoqiao1992 closed this 3 years ago

guoqiao1992 commented 3 years ago

Issue #, if available:

  1. We still use requirement.txt for dependency installation.
  2. Some transient failures occur intermittently.
  3. There is no integration test with a single instance.
  4. There is a dependency on python37-sagemaker_pyspark.

Description of changes:

  1. Change spark.rpc.askTimeout to 300 seconds (default: 120 seconds).
  2. Change dfs.client.block.write.replace-datanode-on-failure.policy to ALWAYS.
  3. Add an integration test with 1 instance.
  4. Remove the dependency on python37-sagemaker_pyspark.
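The two configuration changes above can be pictured as property fragments. This is a minimal sketch, assuming the Spark setting lands in spark-defaults.conf and the HDFS setting in a Hadoop hdfs-site.xml; the helper functions render_spark_defaults and render_hadoop_site are hypothetical and are not the container's actual code.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Spark settings from this PR. "300s" is an explicit 300-second timeout
# (Spark's built-in default for spark.rpc.askTimeout is 120s).
SPARK_DEFAULTS: Dict[str, str] = {
    "spark.rpc.askTimeout": "300s",
}

# HDFS client setting from this PR: always replace a failed datanode
# in the write pipeline instead of using the DEFAULT heuristic.
HDFS_SITE: Dict[str, str] = {
    "dfs.client.block.write.replace-datanode-on-failure.policy": "ALWAYS",
}


def render_spark_defaults(props: Dict[str, str]) -> str:
    """Hypothetical helper: emit spark-defaults.conf lines ('key value')."""
    return "".join(f"{key} {value}\n" for key, value in sorted(props.items()))


def render_hadoop_site(props: Dict[str, str]) -> str:
    """Hypothetical helper: emit a Hadoop *-site.xml <configuration> document."""
    root = ET.Element("configuration")
    for key, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_spark_defaults(SPARK_DEFAULTS), end="")
    print(render_hadoop_site(HDFS_SITE))
```

Spark reads spark-defaults.conf as whitespace-separated key/value pairs, while Hadoop site files use the property/name/value XML layout, which is why the two settings are rendered differently here.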

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

apacker commented 3 years ago

Can you also include the cryptography bump to 3.3.2 in this PR?

See: https://github.com/aws/sagemaker-spark-container/pull/52