aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0
10.14k stars, 6.78k forks

Import error in advanced_functionality/autogluon-tabular/AutoGluon_Tabular_SageMaker.ipynb #1208

Status: Open (opened by tkazusa 4 years ago)

tkazusa commented 4 years ago

Description of the error

When running the following cell in AutoGluon_Tabular_SageMaker.ipynb:

%%time

instance_type = 'ml.m5.2xlarge'
#instance_type = 'local'

ecr_image = f'{ecr_uri_prefix}/{training_algorithm_name}:latest'

estimator = Estimator(image_name=ecr_image,
                      role=role,
                      train_instance_count=1,
                      train_instance_type=instance_type,
                      hyperparameters=hyperparameters)

estimator.fit(train_s3_path)

I got the following error:

UnexpectedStatusException: Error for Training job autogluon-sagemaker-training-2020-05-10-07-12-25-701: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
Command "/usr/local/bin/python3.6 train.py --label y"

The log from the training container is below.

Traceback (most recent call last):
  File "train.py", line 20, in <module>
    import autogluon as ag
  File "package/autogluon/__init__.py", line 10, in <module>
    from . import scheduler, searcher, utils
  File "package/autogluon/scheduler/__init__.py", line 6, in <module>
    from .fifo import *
  File "package/autogluon/scheduler/fifo.py", line 17, in <module>
    from ..searcher import BaseSearcher
  File "package/autogluon/searcher/__init__.py", line 2, in <module>
    from .skopt_searcher import *
  File "package/autogluon/searcher/skopt_searcher.py", line 6, in <module>
    from skopt import Optimizer
  File "package/skopt/__init__.py", line 55, in <module>
    from .searchcv import BayesSearchCV
  File "package/skopt/searchcv.py", line 16, in <module>
    from sklearn.utils.fixes import MaskedArray
ImportError: cannot import name 'MaskedArray'
2020-05-11 01:47:03,374 sagemaker-containers ERROR    ExecuteUserScriptError:
Command "/usr/local/bin/python3.6 train.py --label y"
Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 20, in <module>
    subprocess.check_call(shlex.split(' '.join(sys.argv[1:])))
  File "/usr/local/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['train']' returned non-zero exit status 1.
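
For anyone hitting the same traceback: the failure originates at `from sklearn.utils.fixes import MaskedArray` inside `skopt/searchcv.py`, so the name is not available in the scikit-learn copy installed in the container. That would be consistent with the container picking up a scikit-learn release newer than the one this skopt version expects, though this is an assumption and is not confirmed in this thread. Below is a minimal sketch for checking this in the same environment. The helper `probe_import` is a hypothetical name introduced here, not part of any library, and it only approximates `from ... import ...` by checking for the attribute on the imported module.

```python
import importlib


def probe_import(module_name, attr):
    """Roughly report whether `from module_name import attr` would succeed.

    Hypothetical helper for illustration. Returns one of:
    "ok", "module-missing", or "attr-missing".
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself (or one of its own imports) could not be loaded.
        return "module-missing"
    # The module loaded; check whether it exposes the requested name.
    return "ok" if hasattr(module, attr) else "attr-missing"


if __name__ == "__main__":
    # The import that fails in the traceback (skopt/searchcv.py, line 16).
    print("sklearn.utils.fixes.MaskedArray:",
          probe_import("sklearn.utils.fixes", "MaskedArray"))
```

If this prints `attr-missing`, the installed scikit-learn does not provide `MaskedArray` at that location, and the scikit-learn and scikit-optimize versions in the training image are worth comparing.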

To reproduce

  1. Create a SageMaker notebook instance.
    • Create a new role that can access any S3 bucket.
    • Set the Git repository option to clone the amazon-sagemaker-examples repository to this notebook instance only.
  2. Run the notebook.

austinmw commented 4 years ago

This should be fixed in the latest version.