aws-samples / amazon-sagemaker-ml-pipeline-deploy-with-terraform

MIT No Attribution

State Machine fails on Algorithm error #12

Open ajaypani opened 6 months ago

Using the main branch, I set up the infrastructure with Terraform, then built the Docker image and pushed it to ECR. When I run the ml-pipeline-terraform-demo-state-machine, it fails at the Create Training Job step with: "FailureReason": "AlgorithmError: , exit code: 1",

I suspected it had something to do with permissions on the Python files in src. Even after changing the Dockerfile and rebuilding the Docker image, I am still unable to call the Python files, and I could not ls the contents of /opt/program.
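
One way to test the permissions theory would be to run a small check inside the container, for example from an interactive shell started on the image. The following is only a minimal sketch: it reports which regular files in a directory lack the owner-execute bit. The function name `find_non_executable` and the `PROGRAM_DIR` environment variable are made up for illustration and are not part of this repository; the default path is the /opt/program directory mentioned above.

```python
import os
import stat
from pathlib import Path


def find_non_executable(directory):
    """Return sorted names of regular files in `directory` without the owner-execute bit."""
    missing = []
    for entry in Path(directory).iterdir():
        if entry.is_file():
            mode = entry.stat().st_mode
            # S_IXUSR is the owner-execute permission bit.
            if not mode & stat.S_IXUSR:
                missing.append(entry.name)
    return sorted(missing)


if __name__ == "__main__":
    # PROGRAM_DIR is a hypothetical override; /opt/program is the path from the report.
    program_dir = os.environ.get("PROGRAM_DIR", "/opt/program")
    if not os.path.isdir(program_dir):
        print(f"{program_dir} does not exist or is not a directory")
    else:
        for name in find_non_executable(program_dir):
            print(f"not executable: {name}")
```

If the directory itself is reported as missing, the problem is more likely in how the files are copied into the image than in their permissions.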