aws / sagemaker-spark-container

The SageMaker Spark Container is a Docker image used to run data processing workloads with the Spark framework on Amazon SageMaker.
Apache License 2.0

Fix: pin deps, change Spark and HDFS config, add single-instance integ test (see description for details) #54

Closed: YouNeverKnow10 closed this 3 years ago

YouNeverKnow10 commented 3 years ago

Issue #, if available:

  1. Dependencies are still installed from requirements.txt.
  2. Transient failures occur intermittently.
  3. There is no integ test that runs on a single instance.

Description of changes:

  1. Pin dependencies using a Pipfile.
  2. Increase spark.rpc.askTimeout to 300 (default: 120).
  3. Set dfs.client.block.write.replace-datanode-on-failure.policy to ALWAYS.
  4. Add an integ test that runs on 1 instance.
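
For reviewers who want to see what the two config changes amount to, here is a minimal sketch that renders them as a spark-defaults.conf line and an hdfs-site.xml style fragment. The keys and values come from the list above. The helper names (`render_spark_defaults`, `render_hadoop_site`) and the choice of target files are assumptions for illustration, not necessarily how this container applies its configuration.

```python
import xml.etree.ElementTree as ET

# Keys and values are taken from the PR description.
SPARK_OVERRIDES = {"spark.rpc.askTimeout": "300"}
HDFS_OVERRIDES = {
    "dfs.client.block.write.replace-datanode-on-failure.policy": "ALWAYS",
}


def render_spark_defaults(conf):
    """Render key/value pairs as whitespace-separated spark-defaults.conf lines.

    Hypothetical helper, for illustration only.
    """
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


def render_hadoop_site(conf):
    """Render key/value pairs as a Hadoop <configuration> XML fragment.

    Hypothetical helper, for illustration only.
    """
    root = ET.Element("configuration")
    for key, value in sorted(conf.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_spark_defaults(SPARK_OVERRIDES))
    print(render_hadoop_site(HDFS_OVERRIDES))
```

Keeping the overrides in plain dictionaries makes it easy to assert on the rendered output in a test, which is one way to guard these values against accidental reverts.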

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.