aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

SM_CHANNEL_TRAIN error when building workflow #3402

Open entest-hai opened 2 years ago

entest-hai commented 2 years ago

Link to the notebook: here is the link to the notebook

Describe the bug: An error occurs when executing the state machine in Step Functions.

To reproduce: Follow the exact steps in the notebook above.

Logs:

Here is the log:

"FailureReason": "AlgorithmError: framework error: \nTraceback (most recent call last):\n File \"/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_trainer.py\", line 84, in train\n entrypoint()\n File \"/miniconda3/lib/python3.7/site-packages/sagemaker_xgboost_container/training.py\", line 94, in main\n train(framework.training_env())\n File \"/miniconda3/lib/python3.7/site-packages/sagemaker_xgboost_container/training.py\", line 90, in train\n run_algorithm_mode()\n File \"/miniconda3/lib/python3.7/site-packages/sagemaker_xgboost_container/training.py\", line 56, in run_algorithm_mode\n train_path = os.environ[sm_env_constants.SM_CHANNEL_TRAIN]\n File \"/miniconda3/lib/python3.7/os.py\", line 681, in __getitem__\n raise KeyError(key) from None\nKeyError: 'SM_CHANNEL_TRAIN'\n\n'SM_CHANNEL_TRAIN', exit code: 1",

jackwooley commented 1 year ago

Were you ever able to make progress on this issue? I'm following the same tutorial and am getting the same error in the TrainStep.
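
One way to read the traceback in this issue: the XGBoost container looks up `SM_CHANNEL_TRAIN` directly in `os.environ`, so the `KeyError` suggests the training job started without an input channel named `train`. The sketch below only illustrates that lookup and is not the container's code. It assumes each input channel is exposed as `SM_CHANNEL_<NAME>` in upper case, pointing under `/opt/ml/input/data`; the helper names `channel_env_vars` and `train_path_from` are hypothetical.

```python
def channel_env_vars(channel_names, base_dir="/opt/ml/input/data"):
    """Build SM_CHANNEL_* variables for the given input channel names.

    Assumption: one variable per channel, named SM_CHANNEL_<NAME IN UPPER CASE>.
    """
    return {f"SM_CHANNEL_{name.upper()}": f"{base_dir}/{name}" for name in channel_names}


def train_path_from(env):
    """Same kind of plain lookup as the failing line in the traceback."""
    return env["SM_CHANNEL_TRAIN"]


if __name__ == "__main__":
    # Hypothetical channel names: the first set has no channel called "train".
    for channels in (["training", "validation"], ["train", "validation"]):
        env = channel_env_vars(channels)
        try:
            print(channels, "->", train_path_from(env))
        except KeyError as err:
            print(channels, "-> KeyError:", err)
```

If this reading is right, the thing to check is the set of channel names passed as input data to the training step: a key such as `training` would not produce `SM_CHANNEL_TRAIN`, while a key named `train` would.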