Closed: giuseppeporcelli closed this pull request 4 years ago.
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
@giuseppeporcelli I retried the sagemaker-pytorch-inference build, and it looks like the same test timed out again:
=================================== FAILURES ===================================
________________________________ test_mnist_cpu ________________________________

sagemaker_session = <sagemaker.session.Session object at 0x7f4b80346320>
image_uri = '142577830533.dkr.ecr.us-west-2.amazonaws.com/sagemaker-test:1.4.0-pytorch-sagemaker-pytorch-inference-04131d15-2e47-4fe3-83da-1bf0e5551b62'
instance_type = 'ml.c4.xlarge'

    @pytest.mark.cpu_test
    def test_mnist_cpu(sagemaker_session, image_uri, instance_type):
        instance_type = instance_type or 'ml.c4.xlarge'
>       _test_mnist_distributed(sagemaker_session, image_uri, instance_type, model_cpu_tar, mnist_cpu_script)

test-toolkit/integration/sagemaker/test_mnist.py:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test-toolkit/integration/sagemaker/test_mnist.py:65: in _test_mnist_distributed
    endpoint_name=endpoint_name)
.tox/py36/lib/python3.6/site-packages/sagemaker/model.py:515: in deploy
    data_capture_config_dict=data_capture_config_dict,
.tox/py36/lib/python3.6/site-packages/sagemaker/session.py:2872: in endpoint_from_production_variants
    return self.create_endpoint(endpoint_name=name, config_name=name, tags=tags, wait=wait)
.tox/py36/lib/python3.6/site-packages/sagemaker/session.py:2404: in create_endpoint
    self.wait_for_endpoint(endpoint_name)
.tox/py36/lib/python3.6/site-packages/sagemaker/session.py:2651: in wait_for_endpoint
    desc = _wait_until(lambda: _deploy_done(self.sagemaker_client, endpoint), poll)
.tox/py36/lib/python3.6/site-packages/sagemaker/session.py:3602: in _wait_until
    time.sleep(poll)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

signum = 14, frame = <frame object at 0x7f4b7b551238>

    def handler(signum, frame):
>       raise TimeoutError('timed out after {} seconds'.format(limit))
E       integration.sagemaker.timeout.TimeoutError: timed out after 1800 seconds

test-toolkit/integration/sagemaker/timeout.py:44: TimeoutError
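The error is raised by the test harness, not by SageMaker: `signum = 14` is `SIGALRM` on Linux, so the harness appears to arm an alarm around the deployment and raise once 1800 seconds elapse while the SDK is still sleeping between endpoint-status polls. A minimal sketch of that pattern, assuming an `alarm`-based context manager; `timeout` and `wait_until` are hypothetical names, not the actual `timeout.py` or the SDK's `_wait_until`:

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def timeout(seconds):
    """Raise TimeoutError if the body runs longer than `seconds` (Unix, main thread only).

    Hypothetical reconstruction of an alarm-based test timeout.
    """
    def handler(signum, frame):
        # signum is signal.SIGALRM (14 on Linux)
        raise TimeoutError('timed out after {} seconds'.format(seconds))

    previous = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        # restore whatever handler was installed before
        signal.signal(signal.SIGALRM, previous if previous is not None else signal.SIG_DFL)


def wait_until(check, poll):
    """Call `check` every `poll` seconds until it returns a truthy value (hypothetical helper)."""
    result = check()
    while not result:
        time.sleep(poll)  # if the endpoint never becomes ready, the alarm interrupts this sleep
        result = check()
    return result
```

Because the alarm bounds the whole wait rather than each poll, a deployment that stays in `Creating` surfaces as the harness's timeout instead of as an endpoint failure, which is why the log above ends in `time.sleep(poll)`.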
I'm not able to reproduce the issue locally. Could I get access to the logs of the endpoint being created, to see why the deployment is failing? Thanks.
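Someone with access to the test account could pull the container logs directly from CloudWatch Logs. A sketch, assuming SageMaker's default convention of writing endpoint logs to the log group `/aws/sagemaker/Endpoints/<endpoint-name>`; the client is passed in so it can be a `boto3.client('logs')`, and both helper names are hypothetical:

```python
def endpoint_log_group(endpoint_name):
    """CloudWatch Logs group that SageMaker uses for an endpoint's containers (assumed default)."""
    return '/aws/sagemaker/Endpoints/{}'.format(endpoint_name)


def fetch_endpoint_logs(logs_client, endpoint_name, max_streams=5):
    """Return {stream_name: [messages]} for the most recently active log streams.

    `logs_client` is expected to behave like boto3.client('logs').
    Only the first page of events per stream is read in this sketch.
    """
    group = endpoint_log_group(endpoint_name)
    streams = logs_client.describe_log_streams(
        logGroupName=group,
        orderBy='LastEventTime',
        descending=True,
        limit=max_streams,
    )['logStreams']

    lines = {}
    for stream in streams:
        name = stream['logStreamName']
        events = logs_client.get_log_events(
            logGroupName=group,
            logStreamName=name,
            startFromHead=True,
        )['events']
        lines[name] = [event['message'] for event in events]
    return lines
```

With credentials for the account, usage would look like `fetch_endpoint_logs(boto3.client('logs', region_name='us-west-2'), endpoint_name)`; an import error from a user module would show up in these messages.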
…odel mode.
Issue #, if available:
Description of changes: I have fixed the handler service so that it adds the 'code' directory (where user modules are stored) to the Python path. This is required for importing custom user modules when the container is used in multi-model mode.
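The change can be sketched as follows. This assumes the model server passes a context whose `system_properties` include `model_dir` (as Multi Model Server does); `HandlerService` and `FakeContext` are illustrative stand-ins, not the toolkit's actual classes:

```python
import os
import sys


class FakeContext:
    """Hypothetical stand-in for the context object the model server passes in."""

    def __init__(self, model_dir):
        self.system_properties = {'model_dir': model_dir}


class HandlerService:
    """Hypothetical handler service showing only the Python-path handling."""

    def __init__(self):
        self._initialized = False

    def initialize(self, context):
        if not self._initialized:
            # In multi-model mode each model is unpacked into its own model_dir,
            # so user modules live under <model_dir>/code rather than a fixed path.
            model_dir = context.system_properties.get('model_dir')
            code_dir = os.path.join(model_dir, 'code')
            if code_dir not in sys.path:
                sys.path.append(code_dir)
            self._initialized = True
        # A real handler would go on to load the model and the user's inference script here.
```

Appending rather than inserting at position 0 means an installed package with the same name as a user module still takes precedence; inserting at the front would reverse that order.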
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.