aws-samples / amazon-sagemaker-local-mode

Amazon SageMaker Local Mode Examples
MIT No Attribution

models compatibility for local mode #20

Open joejztang opened 2 years ago

joejztang commented 2 years ago

Hi AWS team, I have run a couple of the examples in this repo, but none of them have worked so far. I was able to solve the authentication issues for S3, but when it comes to creating the container, either to train or to predict, there are always issues.

The most common ones I have seen are:

  1. [Errno 2] No such file or directory: '/opt/ml/input/config/resourceconfig.json'.
  2. A message indicating that data parallel is not supported in local mode (sorry, I don't remember the details).

A question on my side: is there a way to get past these issues, and to resolve any other potential issues, when running locally? Any comments or solutions are welcome. Thanks in advance!

eitansela commented 2 years ago

Hi @joejztang

From the errors you are describing, it looks like you are trying to use data parallel in local mode. Is that what you are trying to do?

joejztang commented 2 years ago

@eitansela Hi, I am not trying to use data parallel in local mode. I am trying to run this example locally: https://github.com/aws-samples/amazon-sagemaker-local-mode/tree/main/tensorflow_script_mode_local_training_and_serving. After solving the S3 issue, it gives me the error [Errno 2] No such file or directory: '/opt/ml/input/config/resourceconfig.json'
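One way to narrow down an error like this is to check, from inside the container, which of the configuration files SageMaker normally places under /opt/ml/input/config are actually present. The snippet below is only a diagnostic sketch: the function name check_config_dir is hypothetical, and it assumes the check can run inside the container (for example at the top of the entry-point script) before the error is raised, which may not hold for every framework image.

```python
from pathlib import Path

# Files SageMaker normally writes into the training container's config directory.
EXPECTED_FILES = (
    "hyperparameters.json",
    "inputdataconfig.json",
    "resourceconfig.json",
)


def check_config_dir(config_dir="/opt/ml/input/config"):
    """Return a mapping of expected config file name -> whether it exists."""
    base = Path(config_dir)
    return {name: (base / name).is_file() for name in EXPECTED_FILES}


if __name__ == "__main__":
    # Print one line per expected file so the result can be pasted into the issue.
    for name, present in check_config_dir().items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```

If every file is reported as missing, the config directory itself was probably not mounted into the container, which is a different problem from a single absent file.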

eitansela commented 2 years ago

Got it. Which version of the SageMaker SDK do you have installed, and which operating system are you using?

joejztang commented 2 years ago

@eitansela Sorry for the late reply. I will share some info below.

SageMaker SDK version: 2.110.0. OS: macOS.

Additional info:

  1. I installed the environment through Miniconda: conda create --name localmode python=3.9. It is running Python version 3.9.13.
  2. Taking scikit_learn_script_mode_local_training_and_serving as an example: to run it in a way that complies with my company's setup, I have to pass sagemaker_session=LocalSession(boto_session=boto3.Session(region_name='us-west-2', profile_name='<awesomeprofile>')) at https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/scikit_learn_script_mode_local_training_and_serving/scikit_learn_script_mode_local_training_and_serving.py#L66. Personally, I don't think this would break anything, but if it does, it is worth mentioning.
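For reference, the session setup described in item 2 can be sketched as below. This is a minimal sketch, not the repo's code: make_local_session is a hypothetical helper name, the profile name is the placeholder from the comment above, and the local_code setting is an assumption based on common local mode usage. The SDK imports are done inside the function so the file can be loaded even where boto3 and sagemaker are not installed.

```python
def make_local_session(profile_name, region_name="us-west-2"):
    """Build a SageMaker LocalSession bound to a named AWS profile.

    Hypothetical helper; requires the boto3 and sagemaker packages at call time.
    """
    # Imported lazily so importing this module does not require the SDKs.
    import boto3
    from sagemaker.local import LocalSession

    boto_session = boto3.Session(profile_name=profile_name, region_name=region_name)
    session = LocalSession(boto_session=boto_session)
    # Assumption: run the entry-point script from local source rather than S3.
    session.config = {"local": {"local_code": True}}
    return session


# Usage sketch (not executed here): pass the session to the estimator, e.g.
#   SKLearn(..., instance_type="local",
#           sagemaker_session=make_local_session("<awesomeprofile>"))
```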

Thanks for the reply. Please tag @joejztang if you find anything.

eitansela commented 2 years ago

Hi @joejztang, are you running it on an Intel-based or Arm-based Mac?

joejztang commented 2 years ago

@eitansela Intel-based.
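The environment details asked for in this thread (SDK version, Python version, operating system, CPU architecture) can be collected in one go with a short script. This is a sketch using only the standard library; describe_environment is a hypothetical name, and the package list is an assumption about what is relevant here. On a Mac, platform.machine() typically reports x86_64 for Intel and arm64 for Apple silicon, although a Python interpreter running under Rosetta may also report x86_64.

```python
import platform
from importlib import metadata


def describe_environment(packages=("sagemaker", "boto3")):
    """Collect interpreter, OS, CPU architecture, and package versions."""
    info = {
        "python": platform.python_version(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing the whole report.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```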

eitansela commented 2 years ago

Can you please attach the full logs?