Closed (cSchubes closed this issue 4 years ago)
Hi @cSchubes ,
It seems that local mode is failing to execute Docker compose to start the containers for training. Are you able to run docker and docker compose yourself in the instance?
Thanks for using SageMaker.
hi @cSchubes, is this still an issue for you? another thing to check would be that the training script mnist.py is in the same directory as your test script.
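As a rough way to check the point above, the sketch below looks for the training script next to a given file. The helper name entry_point_beside is hypothetical, and it assumes the entry point is passed by bare filename (mnist.py) and so has to sit beside the script that launches training.

```python
from pathlib import Path


def entry_point_beside(caller_file, entry_point="mnist.py"):
    """Return (expected_path, exists) for entry_point in caller_file's directory.

    Hypothetical helper: it only checks the filesystem and does not
    touch SageMaker at all.
    """
    candidate = Path(caller_file).resolve().parent / entry_point
    return candidate, candidate.is_file()


# Intended use from inside the test script (assumed layout):
#     path, found = entry_point_beside(__file__)
#     if not found:
#         print("training script not found at", path)
```

If the check reports the script as missing, passing an absolute path as the entry point, or running the test script from its own directory, are the usual things to try.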
we moved away from the API due to time constraints on the project and instead are running our own training code on EC2 instances. However, I am interested in this going forward - I can post an update here when I get the chance to try out your suggestions.
@cSchubes thanks for the response. if you do get a chance to revisit trying out SageMaker, you may also be interested in Script Mode (for details, see https://sagemaker.readthedocs.io/en/stable/using_tf.html) - it should allow you to run your training script that you're using on EC2 with minimal modification in SageMaker.
I'm also having the same problem, "[Errno 2] No such file or directory: 'docker': 'docker'", when I use LocalSession and train_instance_type = 'local'. Why is the local mode documentation so poor?
sorry for the delayed response here - usually [Errno 2] No such file or directory: 'docker': 'docker' indicates that docker is not installed
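A quick preflight check along these lines can confirm whether the executables are visible before starting a local training job. This is a minimal sketch: the function name is hypothetical, and the assumption that local mode needs both docker and docker-compose on PATH comes from the comments in this thread (newer Docker installs may provide Compose as a "docker compose" subcommand instead).

```python
import shutil


def missing_executables(names=("docker", "docker-compose")):
    """Return the executable names that cannot be found on PATH.

    The default names are an assumption about what local mode invokes.
    """
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        # A missing executable is what surfaces as "[Errno 2] No such file or directory"
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("docker and docker-compose are both on PATH")
```

An empty result only means the executables were found; it does not show that the Docker daemon is running or that the current user is allowed to talk to it.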
@laurenyu It seems docker is not installed on SageMaker Studio by default. As a result, I encountered the same error when building a BYOC. What is the best practice here that you'd recommend? Thanks in advance.
@annaluo676 unfortunately, the best I can recommend at this time is to build the image elsewhere, e.g. locally, in a SageMaker Notebook Instance, or on an EC2 instance. There's been some planning around fixing this experience, but I don't yet have a timeline to share.
System Information
Describe the problem
We are attempting to follow the process outlined in this example to get us started with local SageMaker. We have everything running on a t2.micro EC2 instance, and have put everything from the notebook linked above into a script. However, the script is failing with a file not found error. The relevant code is below:
The call to upload_data is working (verified by checking S3). However, the SageMaker training code is not able to find this information.
Minimal repro / logs
Log: