aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

ValueError: no SavedModel bundles found! #603

Open cfournies opened 5 years ago

cfournies commented 5 years ago

Describe the problem

When I deploy the model, I get a message that says "contact customer support". In CloudWatch, I see the following error repeated 100 times:

Traceback (most recent call last):
  File "/sagemaker/serve.py", line 189, in <module>
    ServiceManager().start()
  File "/sagemaker/serve.py", line 163, in start
    self._create_tfs_config()
  File "/sagemaker/serve.py", line 53, in _create_tfs_config
    raise ValueError('no SavedModel bundles found!')

In my Jupyter notebook, I'm running the following to deploy:

predictor = estimator_call.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

In the custom script, I have the training and evaluation functions, but I don't have any serving code because I assumed SageMaker handles that, according to the documentation:

After a TensorFlow estimator has been fit, it saves a TensorFlow SavedModel in the S3 location defined by output_path. You can call deploy on a TensorFlow estimator to create a SageMaker Endpoint.

In S3, the model is saved in the right bucket. I'm using the following to specify where to save it:

model_dirr = os.environ.get('SM_MODEL_DIR')
dnn_model = tf.estimator.DNNClassifier(hidden_units=[20, 20, 20, 20], feature_columns=feature_column, n_classes=2, model_dir=model_dirr)

I don't have any more information and I'm not sure where to even look. Any idea what the problem is?

mvsusp commented 5 years ago

Hi @cfournies,

The documentation has not been updated. You need to save the model as a TensorFlow Serving (TFS) SavedModel to be able to deploy it in SageMaker. I created a PR with the doc update here: https://github.com/aws/sagemaker-python-sdk/pull/607.
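One way to check whether the training job actually produced something the serving container can load is to inspect the model archive downloaded from S3. The sketch below is only an illustration: it assumes the archive is a tar file (for example a local copy named model.tar.gz) and that a loadable model appears as a numeric version directory that directly contains saved_model.pb, which is the layout the fix suggested later in this thread produces. The function name find_saved_model_versions is hypothetical, and the exact lookup done by serve.py is not shown in this thread.

```python
import re
import tarfile


def find_saved_model_versions(archive_path):
    """Return the numeric version directories in the archive that directly
    contain a saved_model.pb file (assumed layout: <version>/saved_model.pb)."""
    versions = set()
    # "r:*" lets tarfile detect the compression (gzip, bz2, none) on its own.
    with tarfile.open(archive_path, "r:*") as archive:
        for name in archive.getnames():
            # Drop empty and "." components so "./1/saved_model.pb" also matches.
            parts = [p for p in name.split("/") if p not in ("", ".")]
            if (
                len(parts) >= 2
                and parts[-1] == "saved_model.pb"
                and re.fullmatch(r"\d+", parts[-2])
            ):
                versions.add("/".join(parts[:-1]))
    return sorted(versions)
```

An empty list from this check would be consistent with the "no SavedModel bundles found!" error: the archive may hold only checkpoints or other training output rather than an exported SavedModel.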

Thanks for using SageMaker!

morenoh149 commented 5 years ago

This should be reopened; #607 was not merged.

laurenyu commented 5 years ago

potentially related: #599

kodamaleograph commented 1 year ago

Get the sm_model_dir:

args, unknown = _parse_args()
. . .
model_dir = args.sm_model_dir

then do this

save_model_path = os.path.join(model_dir, '000000001')
<your model>.save(save_model_path)
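For readers who don't already have a _parse_args helper, here is a minimal sketch of how the steps above might fit together. It assumes the script reads the model directory from a --sm-model-dir argument that defaults to the SM_MODEL_DIR environment variable, falling back to /opt/ml/model when that variable is unset. The helper names parse_args and versioned_export_path are hypothetical, and the actual save call depends on your model, so it appears only as a comment.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse known arguments and return (args, unknown), ignoring extra flags."""
    parser = argparse.ArgumentParser()
    # Assumption: fall back to /opt/ml/model when SM_MODEL_DIR is not set.
    parser.add_argument(
        "--sm-model-dir",
        type=str,
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    return parser.parse_known_args(argv)


def versioned_export_path(model_dir, version="000000001"):
    """Build the numeric version subdirectory that the model is saved into."""
    return os.path.join(model_dir, version)


# Typical use at the end of a training script:
#   args, unknown = parse_args()
#   save_model_path = versioned_export_path(args.sm_model_dir)
#   model.save(save_model_path)   # `model` is your trained model object
```

Using parse_known_args rather than parse_args means any other arguments passed to the script do not cause an error; they come back in the second return value.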

hope this helps!

benieric commented 8 months ago

Hello @cfournies, the message posted by @kodamaleograph should help resolve the issue with saving a model in your training script. You can also take a look at this StackOverflow post, which describes a similar issue and its resolution: https://stackoverflow.com/questions/59882941/valueerror-no-savedmodel-bundles-found-when-trying-to-deploy-a-tf2-0-model-to

Thanks!