Open joejztang opened 2 years ago
Hi @joejztang
From the errors you are describing, it looks like you are trying to use data parallel in local mode. Is this what you are trying to do?
@eitansela Hi, I am not trying to use data parallel in local mode. I am trying to run this example locally: https://github.com/aws-samples/amazon-sagemaker-local-mode/tree/main/tensorflow_script_mode_local_training_and_serving. After solving the S3 issue, it gives me this error: [Errno 2] No such file or directory: '/opt/ml/input/config/resourceconfig.json'
Got it. Which version of the SageMaker SDK do you have installed, and which operating system are you on?
@eitansela Sorry for the late reply. I will share some info below.
SageMaker SDK version: 2.110.0. OS: macOS.
Additional info:
conda create --name localmode python=3.9
Running Python version 3.9.13.
Using scikit_learn_script_mode_local_training_and_serving as an example: to comply with my company's setup, I have to pass in sagemaker_session=LocalSession(boto_session=boto3.Session(region_name='us-west-2', profile_name='<awesomeprofile>')) here: https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/scikit_learn_script_mode_local_training_and_serving/scikit_learn_script_mode_local_training_and_serving.py#L66. Personally I don't think this would break anything, but if it does, it is worth mentioning.
Thanks for the reply. Please tag @joejztang if you find anything.
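For readers following along, the session override described above could be wrapped roughly like this. It is only a sketch: the helper name build_local_session is made up for illustration, the imports are deferred so the snippet loads even where boto3 or the SageMaker SDK is not installed, and the profile name stays a placeholder as in the comment above.

```python
def build_local_session(region_name, profile_name):
    """Return a SageMaker LocalSession bound to a specific AWS profile and region.

    Hypothetical helper (not part of the sample repo). It only repackages the
    expression quoted in the comment above.
    """
    # Deferred imports: both packages must be installed before this is called.
    import boto3
    from sagemaker.local import LocalSession

    boto_session = boto3.Session(region_name=region_name, profile_name=profile_name)
    return LocalSession(boto_session=boto_session)


# Example call, using the placeholder profile name from the comment above:
# session = build_local_session("us-west-2", "<awesomeprofile>")
# The result would then be passed as sagemaker_session=session to the estimator.
```

Whether this override has any bearing on the resourceconfig.json error is not established in this thread.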
Hi @joejztang, are you running it on an Intel-based or Arm-based Mac?
@eitansela intel based.
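For anyone else reporting a similar problem, the details asked for in this thread (SDK version, operating system, CPU architecture) can be gathered in one go. A minimal sketch using only the standard library; the function name and the dictionary keys are my own choices, not anything defined by the SageMaker SDK.

```python
import platform
from importlib import metadata


def collect_environment_info(package="sagemaker"):
    """Collect the environment details that are typically requested in a bug report."""
    try:
        sdk_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        sdk_version = "not installed"
    return {
        "sdk_version": sdk_version,            # installed version of the package
        "python": platform.python_version(),   # e.g. 3.9.13
        "os": platform.system(),               # "Darwin" on macOS
        "os_release": platform.release(),
        "machine": platform.machine(),         # CPU architecture as seen by Python
    }


if __name__ == "__main__":
    for key, value in collect_environment_info().items():
        print(f"{key}: {value}")
```

On a Mac, the machine field normally reads x86_64 on Intel and arm64 on Apple silicon, although a Python interpreter running under Rosetta will still report x86_64.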
Can you please attach the full logs?
Hey dear AWS, I have run a couple of the examples in this repo, but none of them has worked so far. I was able to solve the authentication issues for S3, but when it comes to creating the container, either to train or to predict, there are always issues.
Among the most common ones I saw:
[Errno 2] No such file or directory: '/opt/ml/input/config/resourceconfig.json'
A question on my side: is there a way to bypass these issues, or to resolve the other potential issues, so that the examples run well locally? Any comments or solutions are welcome. Thanks in advance!
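One way to narrow down the resourceconfig.json error quoted above is to check which of the expected config files are actually present in the directory the container sees as /opt/ml/input/config. The sketch below is a diagnostic, not a fix. It assumes you can either open a shell inside the training container or inspect the host directory that local mode mounts there. The three file names follow the documented SageMaker training container layout; the function name is hypothetical.

```python
from pathlib import Path

# Files a SageMaker training container expects under /opt/ml/input/config.
EXPECTED_CONFIG_FILES = (
    "hyperparameters.json",
    "inputdataconfig.json",
    "resourceconfig.json",
)


def missing_config_files(config_dir="/opt/ml/input/config"):
    """Return the expected config file names that are absent from config_dir."""
    config_path = Path(config_dir)
    return [
        name
        for name in EXPECTED_CONFIG_FILES
        if not (config_path / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_config_files()
    if missing:
        print("Missing config files:", ", ".join(missing))
    else:
        print("All expected config files are present.")
```

If every file is reported missing, the directory itself is probably not reaching the container, which would point at the volume mount rather than at the training script. That is only a hypothesis to test, and the full logs requested above would help confirm it.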