aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

tests/integ failed #35

Closed anfeng closed 6 years ago

anfeng commented 6 years ago

I am trying to follow the README. The unit tests pass, but the integration tests fail with the errors below. Any suggestions?

`$ tox tests/integ
GLOB sdist-make: /Users/andyfeng/dev/sagemaker-python-sdk/setup.py
py27 inst-nodeps: /Users/andyfeng/dev/sagemaker-python-sdk/.tox/dist/sagemaker-1.0.1.zip
py27 installed: apipkg==1.4,attrs==17.4.0,backports.weakref==1.0.post1,bleach==1.5.0,boto3==1.5.7,botocore==1.8.21,contextlib2==0.5.5,coverage==4.4.2,docutils==0.14,enum34==1.1.6,execnet==1.5.0,funcsigs==1.0.2,futures==3.2.0,html5lib==0.9999999,jmespath==0.9.3,Markdown==2.6.10,mock==2.0.0,numpy==1.13.3,pbr==3.1.1,pluggy==0.6.0,protobuf==3.5.1,py==1.5.2,pytest==3.3.1,pytest-cov==2.5.1,pytest-forked==0.2,pytest-xdist==1.21.0,python-dateutil==2.6.1,s3transfer==0.1.12,sagemaker==1.0.1,scipy==1.0.0,six==1.11.0,teamcity-messages==1.21,tensorflow==1.4.1,tensorflow-tensorboard==0.4.0rc3,Werkzeug==0.14
py27 runtests: PYTHONHASHSEED='3746448766'
py27 runtests: commands[0] | pytest tests/integ
================================================================ test session starts =================================================================
platform darwin -- Python 2.7.14, pytest-3.3.1, py-1.5.2, pluggy-0.6.0 -- /Users/andyfeng/dev/sagemaker-python-sdk/.tox/py27/bin/python2.7
cachedir: .cache
rootdir: /Users/andyfeng/dev/sagemaker-python-sdk, inifile: setup.cfg
plugins: teamcity-messages-1.21, xdist-1.21.0, forked-0.2, cov-2.5.1
collected 7 items

tests/integ/test_kmeans.py::test_kmeans FAILED [ 14%]
tests/integ/test_linear_learner.py::test_linear_learner FAILED [ 28%]
tests/integ/test_mxnet_train.py::test_attach_deploy ERROR [ 42%]
tests/integ/test_mxnet_train.py::test_deploy_model ERROR [ 57%]
tests/integ/test_pca.py::test_pca FAILED [ 71%]
tests/integ/test_tf.py::test_tf FAILED [ 85%]
tests/integ/test_tf_cifar.py::test_cifar FAILED [100%]

=================================================================================== ERRORS ===================================================================================
____ ERROR at setup of test_attach_deploy ____

sagemaker_session = <sagemaker.session.Session object at 0x10f20b890>

@pytest.fixture(scope='module')
def mxnet_training_job(sagemaker_session):
    with timeout(minutes=15):
        script_path = os.path.join(DATA_DIR, 'mxnet_mnist', 'mnist.py')
        data_path = os.path.join(DATA_DIR, 'mxnet_mnist')

        mx = MXNet(entry_point=script_path, role='SageMakerRole',
                   train_instance_count=1, train_instance_type='ml.c4.xlarge',
                   sagemaker_session=sagemaker_session)

        train_input = mx.sagemaker_session.upload_data(path=os.path.join(data_path, 'train'),
                                                       key_prefix='integ-test-data/mxnet_mnist/train')
        test_input = mx.sagemaker_session.upload_data(path=os.path.join(data_path, 'test'),
                                                      key_prefix='integ-test-data/mxnet_mnist/test')
        mx.fit({'train': train_input, 'test': test_input})

tests/integ/test_mxnet_train.py:47:


.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:517: in fit
    super(Framework, self).fit(inputs, wait, logs, self._current_job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
    self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
    self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
    self._check_job_status(job_name, description)

self = <sagemaker.session.Session object at 0x10f20b890>, job = 'sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859'
desc = {'AlgorithmSpecification': {'TrainingImage': '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-mxnet-py2-cpu:1.0...sagemaker_job_name': '"sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859"', 'sagemaker_program': '"mnist.py"', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
    raise a ValueError.

    Args:
        job (str): The name of the job to check.
        desc (dict[str, str]): The result of ``describe_training_job()``.

    Raises:
        ValueError: If the training job fails.
    """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'
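The failure reason above says SageMaker could not assume `SageMakerRole`, the role name the integration tests pass as `role='SageMakerRole'`. One likely cause (an assumption; the log does not confirm it) is that the role is missing from the account or that its trust policy does not list the SageMaker service principal `sagemaker.amazonaws.com`. Below is a minimal sketch of checking a trust policy document, such as the `AssumeRolePolicyDocument` that IAM returns for a role. The helper name `trusts_sagemaker` and both sample policies are hypothetical.

```python
SAGEMAKER_PRINCIPAL = "sagemaker.amazonaws.com"


def _as_list(value):
    """IAM policy fields may hold one value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trusts_sagemaker(trust_policy):
    """Return True if an Allow statement lets SageMaker call sts:AssumeRole.

    Hypothetical helper. Simplified on purpose: it ignores Condition blocks
    and wildcard actions such as "sts:*".
    """
    for statement in _as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue  # e.g. Principal "*" or a missing Principal
        services = _as_list(principal.get("Service"))
        actions = _as_list(statement.get("Action"))
        if SAGEMAKER_PRINCIPAL in services and "sts:AssumeRole" in actions:
            return True
    return False


# Illustrative documents, not taken from the account in the log.
good_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
ec2_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": ["ec2.amazonaws.com"]},
        "Action": "sts:AssumeRole",
    }],
}

print(trusts_sagemaker(good_policy))      # True
print(trusts_sagemaker(ec2_only_policy))  # False
```

If the real role's trust policy looks like the second document, or the role does not exist at all, every training job in this run would fail the same way regardless of the algorithm under test.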

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError
--------------------------------------------------------------------------- Captured stdout setup ----------------------------------------------------------------------------
..........................
--------------------------------------------------------------------------- Captured stderr setup ----------------------------------------------------------------------------
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:sagemaker:Creating training-job with name: sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
----------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------
credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
session.py 237 INFO Creating training-job with name: sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859
connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
____ ERROR at setup of test_deploy_model ____

sagemaker_session = <sagemaker.session.Session object at 0x10f20b890>

@pytest.fixture(scope='module')
def mxnet_training_job(sagemaker_session):
    with timeout(minutes=15):
        script_path = os.path.join(DATA_DIR, 'mxnet_mnist', 'mnist.py')
        data_path = os.path.join(DATA_DIR, 'mxnet_mnist')

        mx = MXNet(entry_point=script_path, role='SageMakerRole',
                   train_instance_count=1, train_instance_type='ml.c4.xlarge',
                   sagemaker_session=sagemaker_session)

        train_input = mx.sagemaker_session.upload_data(path=os.path.join(data_path, 'train'),
                                                       key_prefix='integ-test-data/mxnet_mnist/train')
        test_input = mx.sagemaker_session.upload_data(path=os.path.join(data_path, 'test'),
                                                      key_prefix='integ-test-data/mxnet_mnist/test')
        mx.fit({'train': train_input, 'test': test_input})

tests/integ/test_mxnet_train.py:47:


.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:517: in fit
    super(Framework, self).fit(inputs, wait, logs, self._current_job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
    self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
    self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
    self._check_job_status(job_name, description)

self = <sagemaker.session.Session object at 0x10f20b890>, job = 'sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859'
desc = {'AlgorithmSpecification': {'TrainingImage': '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-mxnet-py2-cpu:1.0...sagemaker_job_name': '"sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859"', 'sagemaker_program': '"mnist.py"', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
    raise a ValueError.

    Args:
        job (str): The name of the job to check.
        desc (dict[str, str]): The result of ``describe_training_job()``.

    Raises:
        ValueError: If the training job fails.
    """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError
================================================================================== FAILURES ==================================================================================
____ test_kmeans ____

def test_kmeans():

    with timeout(minutes=15):
        sagemaker_session = sagemaker.Session(boto_session=boto3.Session(region_name=REGION))
        data_path = os.path.join(DATA_DIR, 'one_p_mnist', 'mnist.pkl.gz')
        pickle_args = {} if sys.version_info.major == 2 else {'encoding': 'latin1'}

        # Load the data into memory as numpy arrays
        with gzip.open(data_path, 'rb') as f:
            train_set, _, _ = pickle.load(f, **pickle_args)

        kmeans = KMeans(role='SageMakerRole', train_instance_count=1,
                        train_instance_type='ml.c4.xlarge',
                        k=10, sagemaker_session=sagemaker_session, base_job_name='test-kmeans')

        kmeans.init_method = 'random'
        kmeans.max_iterators = 1
        kmeans.tol = 1
        kmeans.num_trials = 1
        kmeans.local_init_method = 'kmeans++'
        kmeans.half_life_time_size = 1
        kmeans.epochs = 1
        kmeans.center_factor = 1
        kmeans.fit(kmeans.record_set(train_set[0][:100]))

tests/integ/test_kmeans.py:51:


.tox/py27/lib/python2.7/site-packages/sagemaker/amazon/amazon_estimator.py:96: in fit
    super(AmazonAlgorithmEstimatorBase, self).fit(data, **kwargs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
    self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
    self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
    self._check_job_status(job_name, description)

self = <sagemaker.session.Session object at 0x10f2c4e50>, job = 'test-kmeans-2018-01-01-03-32-56-860'
desc = {'AlgorithmSpecification': {'TrainingImage': '174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:1', 'TrainingInputMo... 'HyperParameters': {'epochs': '1', 'extra_center_factor': '1', 'feature_dim': '784', 'force_dense': 'True', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
    raise a ValueError.

    Args:
        job (str): The name of the job to check.
        desc (dict[str, str]): The result of ``describe_training_job()``.

    Raises:
        ValueError: If the training job fails.
    """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training test-kmeans-2018-01-01-03-32-56-860: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'
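Every failure in this run goes through the same `_check_job_status` path: the training job is created, fails on the service side, and the SDK only relays the `FailureReason` from the job description as a `ValueError`. The sketch below is a standalone copy of that check (a free function rather than the `Session` method) run against a hypothetical, trimmed-down description dict; a real `describe_training_job()` response carries many more keys.

```python
def check_job_status(job, desc):
    """Standalone copy of the check in the traceback: raise unless the job completed."""
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))


# Hypothetical description dict containing only the two keys the check reads.
failed_desc = {
    'TrainingJobStatus': 'Failed',
    'FailureReason': "ClientError: SageMaker was unable to assume the role "
                     "'arn:aws:iam::379899735384:role/SageMakerRole'",
}

try:
    check_job_status('test-kmeans-2018-01-01-03-32-56-860', failed_desc)
except ValueError as err:
    message = str(err)
    print(message)
```

Because the message is built only from the job name, status, and `FailureReason`, the K-Means, Linear Learner, and MXNet failures share one cause: the role, not the individual tests.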

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError
---------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------
....................
---------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:sagemaker:Created S3 bucket: sagemaker-us-west-2-379899735384
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:sagemaker:Creating training-job with name: test-kmeans-2018-01-01-03-32-56-860
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
----------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------
credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
session.py 163 INFO Created S3 bucket: sagemaker-us-west-2-379899735384
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
session.py 237 INFO Creating training-job with name: test-kmeans-2018-01-01-03-32-56-860
connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
____ test_linear_learner ____

def test_linear_learner():
    with timeout(minutes=15):
        sagemaker_session = sagemaker.Session(boto_session=boto3.Session(region_name=REGION))
        data_path = os.path.join(DATA_DIR, 'one_p_mnist', 'mnist.pkl.gz')
        pickle_args = {} if sys.version_info.major == 2 else {'encoding': 'latin1'}

        # Load the data into memory as numpy arrays
        with gzip.open(data_path, 'rb') as f:
            train_set, _, _ = pickle.load(f, **pickle_args)

        train_set[1][:100] = 1
        train_set[1][100:200] = 0
        train_set = train_set[0], train_set[1].astype(np.dtype('float32'))

        ll = LinearLearner('SageMakerRole', 1, 'ml.c4.2xlarge', base_job_name='test-linear-learner',
                           sagemaker_session=sagemaker_session)
        ll.binary_classifier_model_selection_criteria = 'accuracy'
        ll.target_reacall = 0.5
        ll.target_precision = 0.5
        ll.positive_example_weight_mult = 0.1
        ll.epochs = 1
        ll.predictor_type = 'binary_classifier'
        ll.use_bias = True
        ll.num_models = 1
        ll.num_calibration_samples = 1
        ll.init_method = 'uniform'
        ll.init_scale = 0.5
        ll.init_sigma = 0.2
        ll.init_bias = 5
        ll.optimizer = 'adam'
        ll.loss = 'logistic'
        ll.wd = 0.5
        ll.l1 = 0.5
        ll.momentum = 0.5
        ll.learning_rate = 0.1
        ll.beta_1 = 0.1
        ll.beta_2 = 0.1
        ll.use_lr_scheduler = True
        ll.lr_scheduler_step = 2
        ll.lr_scheduler_factor = 0.5
        ll.lr_scheduler_minimum_lr = 0.1
        ll.normalize_data = False
        ll.normalize_label = False
        ll.unbias_data = True
        ll.unbias_label = False
        ll.num_point_for_scala = 10000
        ll.fit(ll.record_set(train_set[0][:200], train_set[1][:200]))

tests/integ/test_linear_learner.py:74:


.tox/py27/lib/python2.7/site-packages/sagemaker/amazon/amazon_estimator.py:96: in fit
    super(AmazonAlgorithmEstimatorBase, self).fit(data, **kwargs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
    self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
    self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
    self._check_job_status(job_name, description)

self = <sagemaker.session.Session object at 0x113a8c450>, job = 'test-linear-learner-2018-01-01-03-34-54-936'
desc = {'AlgorithmSpecification': {'TrainingImage': '174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1', 'Trainin...ta_1': '0.1', 'binary_classifier_model_selection_criteria': 'accuracy', 'epochs': '1', 'feature_dim': '784', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
    raise a ValueError.

    Args:
        job (str): The name of the job to check.
        desc (dict[str, str]): The result of ``describe_training_job()``.

    Raises:
        ValueError: If the training job fails.
    """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training test-linear-learner-2018-01-01-03-34-54-936: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError ---------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------- .................... ---------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------- INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com INFO:sagemaker:Creating training-job with name: test-linear-learner-2018-01-01-03-34-54-936 INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com 
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com ----------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------ credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials 
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com session.py 237 INFO Creating training-job with name: test-linear-learner-2018-01-01-03-34-54-936 connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com 
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
__ test_pca __

def test_pca():
    with timeout(minutes=15):
        sagemaker_session = sagemaker.Session(boto_session=boto3.Session(region_name=REGION))
        data_path = os.path.join(DATA_DIR, 'one_p_mnist', 'mnist.pkl.gz')
        pickle_args = {} if sys.version_info.major == 2 else {'encoding': 'latin1'}

        # Load the data into memory as numpy arrays
        with gzip.open(data_path, 'rb') as f:
            train_set, _, _ = pickle.load(f, **pickle_args)

        pca = sagemaker.amazon.pca.PCA(role='SageMakerRole', train_instance_count=1,
                                       train_instance_type='ml.m4.xlarge',
                                       num_components=48, sagemaker_session=sagemaker_session, base_job_name='test-pca')

        pca.algorithm_mode = 'randomized'
        pca.subtract_mean = True
        pca.extra_components = 5
        pca.fit(pca.record_set(train_set[0][:100]))

tests/integ/test_pca.py:44:


.tox/py27/lib/python2.7/site-packages/sagemaker/amazon/amazon_estimator.py:96: in fit
    super(AmazonAlgorithmEstimatorBase, self).fit(data, **kwargs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
    self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
    self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
    self._check_job_status(job_name, description)

self = <sagemaker.session.Session object at 0x113c51ed0>, job = 'test-pca-2018-01-01-03-39-15-456'
desc = {'AlgorithmSpecification': {'TrainingImage': '174872318107.dkr.ecr.us-west-2.amazonaws.com/pca:1', 'TrainingInputMode'...': {'algorithm_mode': 'randomized', 'extra_components': '5', 'feature_dim': '784', 'mini_batch_size': '100', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
        raise a ValueError.

        Args:
            job (str): The name of the job to check.
            desc (dict[str, str]): The result of ``describe_training_job()``.

        Raises:
            ValueError: If the training job fails.
        """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training test-pca-2018-01-01-03-39-15-456: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError ---------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------- .................... ---------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------- INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com INFO:sagemaker:Creating training-job with name: test-pca-2018-01-01-03-39-15-456 INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com 
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com ----------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------ credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials 
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com session.py 237 INFO Creating training-job with name: test-pca-2018-01-01-03-39-15-456 connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com 
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
__ test_tf ___

sagemaker_session = <sagemaker.session.Session object at 0x1134ce350>

def test_tf(sagemaker_session):
    with timeout(minutes=15):
        script_path = os.path.join(DATA_DIR, 'iris', 'iris-dnn-classifier.py')
        data_path = os.path.join(DATA_DIR, 'iris', 'data')

        estimator = TensorFlow(entry_point=script_path,
                               role='SageMakerRole',
                               training_steps=1,
                               evaluation_steps=1,
                               hyperparameters={'input_tensor_name': 'inputs'},
                               train_instance_count=1,
                               train_instance_type='ml.c4.xlarge',
                               sagemaker_session=sagemaker_session,
                               base_job_name='test-tf')

        inputs = estimator.sagemaker_session.upload_data(path=data_path, key_prefix='integ-test-data/tf_iris')
        estimator.fit(inputs)

tests/integ/test_tf.py:44:


.tox/py27/lib/python2.7/site-packages/sagemaker/tensorflow/estimator.py:166: in fit
    fit_super()
.tox/py27/lib/python2.7/site-packages/sagemaker/tensorflow/estimator.py:154: in fit_super
    super(TensorFlow, self).fit(inputs, wait, logs, job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:517: in fit
    super(Framework, self).fit(inputs, wait, logs, self._current_job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
    self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
    self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
    self._check_job_status(job_name, description)

self = <sagemaker.session.Session object at 0x1134ce350>, job = 'test-tf-2018-01-01-03-41-00-415'
desc = {'AlgorithmSpecification': {'TrainingImage': '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-py2-cp...ckpoints"', 'evaluation_steps': '1', 'input_tensor_name': '"inputs"', 'sagemaker_container_log_level': '20', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
        raise a ValueError.

        Args:
            job (str): The name of the job to check.
            desc (dict[str, str]): The result of ``describe_training_job()``.

        Raises:
            ValueError: If the training job fails.
        """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training test-tf-2018-01-01-03-41-00-415: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError --------------------------------------------------------------------------- Captured stderr setup ---------------------------------------------------------------------------- INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials ----------------------------------------------------------------------------- Captured log setup ----------------------------------------------------------------------------- credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials ---------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------- .................... ---------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------- INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com INFO:sagemaker:Creating training-job with name: test-tf-2018-01-01-03-41-00-415 INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com 
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com 
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com ----------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------ connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com session.py 237 INFO Creating training-job with name: test-tf-2018-01-01-03-41-00-415 connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com connectionpool.py 238 INFO Resetting dropped connection: 
logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
_____ test_cifar ____

sagemaker_session = <sagemaker.session.Session object at 0x1150fd850>

def test_cifar(sagemaker_session):
    with timeout(minutes=15):
        script_path = os.path.join(DATA_DIR, 'cifar_10', 'source')

        dataset_path = os.path.join(DATA_DIR, 'cifar_10', 'data')

        estimator = TensorFlow(entry_point='resnet_cifar_10.py', source_dir=script_path, role='SageMakerRole',
                               training_steps=20, evaluation_steps=5,
                               train_instance_count=2, train_instance_type='ml.p2.xlarge',
                               sagemaker_session=sagemaker_session,
                               base_job_name='test-cifar')

        inputs = estimator.sagemaker_session.upload_data(path=dataset_path, key_prefix='data/cifar10')
        estimator.fit(inputs)

tests/integ/test_tf_cifar.py:54:


.tox/py27/lib/python2.7/site-packages/sagemaker/tensorflow/estimator.py:166: in fit
    fit_super()
.tox/py27/lib/python2.7/site-packages/sagemaker/tensorflow/estimator.py:154: in fit_super
    super(TensorFlow, self).fit(inputs, wait, logs, job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:517: in fit
    super(Framework, self).fit(inputs, wait, logs, self._current_job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:152: in fit
    self.latest_training_job = _TrainingJob.start_new(self, inputs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:263: in start_new
    hyperparameters=hyperparameters, stop_condition=stop_condition)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:239: in train
    self.sagemaker_client.create_training_job(**train_request)
.tox/py27/lib/python2.7/site-packages/botocore/client.py:317: in _api_call
    return self._make_api_call(operation_name, kwargs)

self = <botocore.client.SageMaker object at 0x1147f4210>, operation_name = 'CreateTrainingJob'
api_params = {'AlgorithmSpecification': {'TrainingImage': '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-py2-gp...-2-379899735384/data/cifar10'}}}], 'OutputDataConfig': {'S3OutputPath': 's3://sagemaker-us-west-2-379899735384/'}, ...}

def _make_api_call(self, operation_name, api_params):
    operation_model = self._service_model.operation_model(operation_name)
    service_name = self._service_model.service_name
    history_recorder.record('API_CALL', {
        'service': service_name,
        'operation': operation_name,
        'params': api_params,
    })
    if operation_model.deprecated:
        logger.debug('Warning: %s.%s() is deprecated',
                     service_name, operation_name)
    request_context = {
        'client_region': self.meta.region_name,
        'client_config': self.meta.config,
        'has_streaming_input': operation_model.has_streaming_input,
        'auth_type': operation_model.auth_type,
    }
    request_dict = self._convert_to_request_dict(
        api_params, operation_model, context=request_context)

    handler, event_response = self.meta.events.emit_until_response(
        'before-call.{endpoint_prefix}.{operation_name}'.format(
            endpoint_prefix=self._service_model.endpoint_prefix,
            operation_name=operation_name),
        model=operation_model, params=request_dict,
        request_signer=self._request_signer, context=request_context)

    if event_response is not None:
        http, parsed_response = event_response
    else:
        http, parsed_response = self._endpoint.make_request(
            operation_model, request_dict)

    self.meta.events.emit(
        'after-call.{endpoint_prefix}.{operation_name}'.format(
            endpoint_prefix=self._service_model.endpoint_prefix,
            operation_name=operation_name),
        http_response=http, parsed=parsed_response,
        model=operation_model, context=request_context
    )

    if http.status_code >= 300:
        error_code = parsed_response.get("Error", {}).get("Code")
        error_class = self.exceptions.from_code(error_code)
        raise error_class(parsed_response, operation_name)

E ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob operation: The account-level service limit for training-job/ml.p2.xlarge is 0 Instances, with current utilization of 0 Instances and a request delta of 2 Instances. Please contact AWS support to request an increase for this limit.

.tox/py27/lib/python2.7/site-packages/botocore/client.py:615: ResourceLimitExceeded --------------------------------------------------------------------------- Captured stderr setup ---------------------------------------------------------------------------- INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials ----------------------------------------------------------------------------- Captured log setup ----------------------------------------------------------------------------- credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials ---------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------- INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com INFO:sagemaker:Creating training-job with name: test-cifar-2018-01-01-03-42-43-090 INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com ----------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------ connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting 
new HTTPS connection (1): s3.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com session.py 237 INFO Creating training-job with name: test-cifar-2018-01-01-03-42-43-090 connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com ==================================================================== 5 failed, 2 error in 599.21 seconds ===================================================================== ERROR: InvocationError: '/Users/andyfeng/dev/sagemaker-python-sdk/.tox/py27/bin/pytest tests/integ'`

andremoeller commented 6 years ago

Hi @anfeng,

The test_cifar integration test (tests/integ/test_tf_cifar.py) trains on two ml.p2.xlarge instances, but your AWS account currently has a training-job limit of zero ml.p2.xlarge instances:

ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob operation: The account-level service limit for training-job/ml.p2.xlarge is 0 Instances, with current utilization of 0 Instances and a request delta of 2 Instances. Please contact AWS support to request an increase for this limit.

You can request a limit increase through AWS Support, or you can modify the integration test to use a different instance type, such as ml.m4.xlarge.
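As a rough sketch of the second option, the instance type could be read from an environment variable so the test falls back to a CPU instance when the account has no GPU quota. The variable name `SAGEMAKER_TEST_INSTANCE_TYPE` and the helper `training_instance_type` are hypothetical and are not part of the SDK or its test suite; `train_instance_type` is the estimator argument already used in the test shown in the log.

```python
import os

# Hypothetical environment variable; not defined by the SDK or its tests.
ENV_VAR = 'SAGEMAKER_TEST_INSTANCE_TYPE'
# CPU instance type suggested above as a stand-in for ml.p2.xlarge.
DEFAULT_INSTANCE_TYPE = 'ml.m4.xlarge'


def training_instance_type(environ=None):
    """Return the instance type to train on, preferring an explicit override."""
    environ = os.environ if environ is None else environ
    # An unset or empty variable falls back to the default instance type.
    return environ.get(ENV_VAR) or DEFAULT_INSTANCE_TYPE


# In tests/integ/test_tf_cifar.py the result would replace the hard-coded value:
#     estimator = TensorFlow(..., train_instance_count=2,
#                            train_instance_type=training_instance_type(), ...)
```

Training on a CPU instance will probably be slower than on ml.p2.xlarge, so the 15-minute timeout in the test may need adjusting.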

Thanks for using Amazon SageMaker! Please let us know if you have more questions.
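The other failures in the log (for example test_pca and test_tf) report that SageMaker was unable to assume the role 'SageMakerRole'. One common cause, offered here as an assumption rather than a diagnosis of this account, is that the role's trust policy does not list the SageMaker service as a trusted principal. A minimal sketch that builds such a trust policy document follows; `sagemaker_trust_policy` is a hypothetical helper, and the resulting JSON would be supplied as the role's trust relationship, for instance through the IAM console.

```python
import json


def sagemaker_trust_policy():
    """Build a trust policy that lets the SageMaker service assume a role."""
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            # Service principal for Amazon SageMaker.
            'Principal': {'Service': 'sagemaker.amazonaws.com'},
            'Action': 'sts:AssumeRole',
        }],
    }


# Print the document so it can be pasted into the role's trust relationship.
print(json.dumps(sagemaker_trust_policy(), indent=2))
```

The role also needs permissions for the S3 buckets and other resources that the training jobs use; those are separate from the trust policy sketched here.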