aws-samples / amazon-sagemaker-local-mode

Amazon SageMaker Local Mode Examples
MIT No Attribution

KeyError 's3distributiontype' #2

Closed tlienart closed 3 years ago

tlienart commented 3 years ago

Hello,

I was trying to run the LightGBM example from a SageMaker notebook instance, swapping only the algorithm for XGBoost. The code is otherwise the same as the example, including the local data location. I keep getting the error shown in the stack trace below:

algo-1-ut57f_1  | INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
algo-1-ut57f_1  | INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
algo-1-ut57f_1  | ERROR:sagemaker-containers:Reporting training FAILURE
algo-1-ut57f_1  | ERROR:sagemaker-containers:framework error: 
algo-1-ut57f_1  | Traceback (most recent call last):
algo-1-ut57f_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_trainer.py", line 84, in train
algo-1-ut57f_1  |     entrypoint()
algo-1-ut57f_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_xgboost_container/training.py", line 94, in main
algo-1-ut57f_1  |     train(framework.training_env())
algo-1-ut57f_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_xgboost_container/training.py", line 90, in train
algo-1-ut57f_1  |     run_algorithm_mode()
algo-1-ut57f_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_xgboost_container/training.py", line 68, in run_algorithm_mode
algo-1-ut57f_1  |     checkpoint_config=checkpoint_config
algo-1-ut57f_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 115, in sagemaker_train
algo-1-ut57f_1  |     validated_data_config = channels.validate(data_config)
algo-1-ut57f_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_algorithm_toolkit/channel_validation.py", line 106, in validate
algo-1-ut57f_1  |     channel_obj.validate(value)
algo-1-ut57f_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_algorithm_toolkit/channel_validation.py", line 52, in validate
algo-1-ut57f_1  |     if (value[CONTENT_TYPE], value[TRAINING_INPUT_MODE], value[S3_DIST_TYPE]) not in self.supported:
algo-1-ut57f_1  | KeyError: 'S3DistributionType'
algo-1-ut57f_1  | 
algo-1-ut57f_1  | 'S3DistributionType'
tmpokc09nq0_algo-1-ut57f_1 exited with code 1

This seems to be caused by a bad input type for the data. Two questions:

  1. Are the samples here written for SageMaker SDK 1 or 2? (I'm using 2.16.)
  2. Have you encountered this issue before, and would you know how to approach it?

I've looked around and found a few other reports of this error but no fix.

Thanks!

eitansela commented 3 years ago

Hello @tlienart

All the examples here are with SageMaker SDK V2.

I didn't quite understand what you were trying to do, but if you are trying to train the SageMaker built-in XGBoost algorithm locally, it will not work, and this is the error you will get.
SageMaker built-in algorithms can only be trained on SageMaker managed instances.
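
For reference, the failing line in the traceback is a plain dictionary lookup, so the error can be reproduced without any SageMaker code. The sketch below is only an illustration: the `local_channel` dictionary and the `channel_signature` helper are hypothetical, and the `ContentType` and `TrainingInputMode` key names are assumed (only `S3DistributionType` is confirmed by the error message). It assumes the channel description supplied in a local run simply has no S3 distribution entry.

```python
# Key names used by the lookup in the traceback. Only "S3DistributionType"
# is confirmed by the KeyError message; the other two are assumed.
CONTENT_TYPE = "ContentType"
TRAINING_INPUT_MODE = "TrainingInputMode"
S3_DIST_TYPE = "S3DistributionType"

# Hypothetical channel description for a local run: there is no S3 data
# source, so there is no S3DistributionType entry.
local_channel = {CONTENT_TYPE: "text/csv", TRAINING_INPUT_MODE: "File"}


def channel_signature(value):
    """Build the same 3-tuple the validator builds before its membership test."""
    return (value[CONTENT_TYPE], value[TRAINING_INPUT_MODE], value[S3_DIST_TYPE])


try:
    channel_signature(local_channel)
    error = None
except KeyError as exc:
    # The lookup fails before the tuple is ever compared with the supported set.
    error = exc

print(repr(error))
```

So the validation never gets as far as checking whether the content type is supported; it fails while building the tuple.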

SageMaker local mode is for Script Mode and for Bring Your Own Container training.
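
As a rough sketch of the Script Mode route with XGBoost, assuming SageMaker SDK v2 with Docker available on the machine: the entry point `train.py`, the role ARN, the framework version, the hyperparameter, and the data path below are all placeholders for illustration, not values taken from this repository.

```python
def build_local_xgboost_estimator(entry_point="train.py"):
    """Return an XGBoost Script Mode estimator configured for local mode.

    All argument values below are placeholders for illustration.
    """
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from sagemaker.xgboost import XGBoost

    return XGBoost(
        entry_point=entry_point,      # hypothetical user-written training script
        framework_version="1.0-1",    # assumed container version
        role="arn:aws:iam::111111111111:role/placeholder-role",  # placeholder ARN
        instance_count=1,
        instance_type="local",        # run the container on this machine
        hyperparameters={"num_round": 10},  # hypothetical; forwarded to the script
    )


# Usage (needs the SageMaker SDK v2 and a running Docker daemon):
# estimator = build_local_xgboost_estimator()
# estimator.fit({"train": "file://./data/train"})  # hypothetical local data path
```

The difference from the built-in algorithm is that the training logic lives in your own script, which the XGBoost framework container runs.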

tlienart commented 3 years ago

Thanks, Eitan.