aws-samples / amazon-sagemaker-local-mode

Amazon SageMaker Local Mode Examples
MIT No Attribution

SKLearnProcessor unable to recognize preprocessing file in SageMaker local mode #30

Open Kedharkb opened 1 year ago

Kedharkb commented 1 year ago

Hello, I am experiencing an issue while using SageMaker to build a machine learning model. In SageMaker local mode I get the error "python3: can't open file '/opt/ml/processing/code/processing_script.py': [Errno 2] No such file or directory", even though the file is present in the correct location, and the same code works fine in normal (non-local) mode. I am running my project in a VSCode dev container, and inside the dev container I am trying out SageMaker local mode. Please let me know what could possibly be wrong with the following code.

Error

INFO:sagemaker.local.image:docker command: docker-compose -f /tmp/tmpx5_z74cs/docker-compose.yaml up --build --abort-on-container-exit
Creating 5ktwpjv1ug-algo-1-5bua0 ... 
Creating 5ktwpjv1ug-algo-1-5bua0 ... done
Attaching to 5ktwpjv1ug-algo-1-5bua0
5ktwpjv1ug-algo-1-5bua0 | python3: can't open file '/opt/ml/processing/input/code/processing_script.py': [Errno 2] No such file or directory
5ktwpjv1ug-algo-1-5bua0 exited with code 2
2
Aborting on container exit...
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/local/image.py", line 166, in process
    _stream_output(process)
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/local/image.py", line 916, in _stream_output
    raise RuntimeError("Process exited with code: %s" % exit_code)
RuntimeError: Process exited with code: 2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main2.py", line 20, in <module>
    processor.run(
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/workflow/pipeline_context.py", line 272, in wrapper
    return run_func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/processing.py", line 615, in run
    self.latest_job = ProcessingJob.start_new(
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/processing.py", line 849, in start_new
    processor.sagemaker_session.process(**process_args)
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/session.py", line 1024, in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/session.py", line 4813, in _intercept_create_request
    return create(request)
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/session.py", line 1022, in submit
    self.sagemaker_client.create_processing_job(**request)
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/local/local_session.py", line 128, in create_processing_job
    processing_job.start(
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/local/entities.py", line 140, in start
    self.container.process(
  File "/usr/local/lib/python3.8/dist-packages/sagemaker/local/image.py", line 171, in process
    raise RuntimeError(msg) from e
RuntimeError: Failed to run: ['docker-compose', '-f', '/tmp/tmpx5_z74cs/docker-compose.yaml', 'up', '--build', '--abort-on-container-exit']

Code

from sagemaker.local import LocalSession
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

sagemaker_session = LocalSession()
sagemaker_session.config = {"local": {"local_code": True}}

role = "arn:aws:iam::111111111111:role/service-role/AmazonSageMaker-ExecutionRole-20200101T000001"

processor = SKLearnProcessor(
    framework_version="0.20.0", instance_count=1, instance_type="local", role=role
)

print("Starting processing job.")
print(
    "Note: if launching for the first time in local mode, container image download"
    " might take a few minutes to complete."
)
processor.run(
    code="processing_script.py",
    inputs=[
        ProcessingInput(
            source="./input_data/",
            destination="/opt/ml/processing/input_data/",
        )
    ],
    outputs=[
        ProcessingOutput(
            output_name="word_count_data",
            source="/opt/ml/processing/processed_data/",
        )
    ],
    arguments=["job-type", "word-count"],
)

preprocessing_job_description = processor.jobs[-1].describe()
output_config = preprocessing_job_description["ProcessingOutputConfig"]

print(output_config)

for output in output_config["Outputs"]:
    if output["OutputName"] == "word_count_data":
        word_count_data_file = output["S3Output"]["S3Uri"]

print("Output file is located on: {}".format(word_count_data_file))

eitansela commented 1 year ago

What happens if you try to run the example in the same container, but from the command line: python3 example.py? I want to rule out a VSCode configuration issue.

vibhabellutagi19 commented 1 year ago

I have the same issue. How did you resolve it?

vibhabellutagi19 commented 1 year ago

What happens if you try to run the example in the same container, but from the command line: python3 example.py? I want to rule out a VSCode configuration issue.

@eitansela I get the same error. I'm not working in VSCode, and I'm trying this code on a Mac.

I see it is not able to recognise the path - python3: can't open file '/opt/ml/processing/input/code/processing_script.py': [Errno 2] No such file or directory

Can you please help me?

eitansela commented 12 months ago

@VibhavariBellutagi19 can you run it not from VSCode, but with python scikit_learn_bring_your_own_container_local_processing.py?

Kedharkb commented 12 months ago

@eitansela I was able to run it successfully outside of the dev container, just by invoking the script locally.
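A likely explanation, not confirmed in this thread: local mode writes its docker-compose.yaml under /tmp (see the traceback above) and bind-mounts those directories into the processing container. If the dev container shares the host's Docker socket, the daemon resolves those paths on the host filesystem, where the dev container's /tmp does not exist, so the mounted code directory comes up empty and the script cannot be found. A rough, illustrative check from Python, assuming the docker CLI and the alpine image are available (the probe file and mount point are just for illustration):

# Sanity check: can the Docker daemon see a directory created by this Python process?
# If the listing is empty, the daemon is resolving bind-mount paths on a different
# filesystem (e.g. the host) than the one this process writes to.
import pathlib
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    (pathlib.Path(tmp) / "probe.txt").write_text("hello")
    out = subprocess.run(
        ["docker", "run", "--rm", "-v", "{}:/probe".format(tmp), "alpine", "ls", "/probe"],
        capture_output=True,
        text=True,
    )
    print("Visible to the Docker daemon:", out.stdout.strip() or "<nothing>")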