Open nikhilraju opened 5 years ago
Hi, I apologize for the confusion. The notebook has not been updated to demonstrate the SageMaker Batch Transform feature. The bulk inference approach mentioned there was the way to do bulk inference before SageMaker launched Batch Transform. It is no longer the recommended approach, and we should update the notebook.
For Batch Transform, your input data should reside in S3, and you should use the create_transform_job API (or sagemaker.transformer.Transformer if you are using the Python SDK instead of Boto3) to create a Batch Transform job. There is no need to create an endpoint, and you can continue to use your existing Model.
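As a rough sketch of what such a request could look like with Boto3, the snippet below builds the arguments for create_transform_job with both content and accept types set to application/jsonlines. The helper names (build_transform_request, submit_transform_job), the job and model names, the S3 URIs, and the instance type are all placeholders assumed for illustration; they do not come from the notebook.

```python
def build_transform_request(job_name, model_name, input_s3_uri, output_s3_uri,
                            instance_type, instance_count=1):
    """Build keyword arguments for the SageMaker create_transform_job API.

    Hypothetical helper for illustration; every value passed in is a
    placeholder supplied by the caller.
    """
    return {
        "TransformJobName": job_name,
        # Reuse the existing Model; no endpoint is created.
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": input_s3_uri,
                }
            },
            "ContentType": "application/jsonlines",
            # Treat each line of the input file as one record.
            "SplitType": "Line",
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_uri,
            "Accept": "application/jsonlines",
            # Write one result per line in the output file.
            "AssembleWith": "Line",
        },
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


def submit_transform_job(request):
    """Submit the request; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so building the request needs no AWS setup

    client = boto3.client("sagemaker")
    return client.create_transform_job(**request)


# Example request with placeholder names (assumptions, not real resources).
request = build_transform_request(
    job_name="seq2seq-batch-example",
    model_name="my-existing-seq2seq-model",
    input_s3_uri="s3://my-bucket/seq2seq/batch-input/",
    output_s3_uri="s3://my-bucket/seq2seq/batch-output/",
    instance_type="ml.m5.xlarge",
)
```

Calling submit_transform_job(request) would start the job; it is kept separate so the request can be inspected first.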
The input to the Batch Transform job (which must be in S3) should be organized as described in the documentation you found, and the content type should be passed as application/jsonlines.
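For illustration, a JSON Lines file is simply one JSON object per line. The sketch below serializes a list of input sentences that way. The function name to_json_lines is hypothetical, and the field name "source" is an assumption here, so please check the Seq2Seq documentation for the exact schema before relying on it.

```python
import json


def to_json_lines(sentences, key="source"):
    """Serialize each sentence as one JSON object per line (JSON Lines).

    The default key "source" is an assumption for illustration only;
    use whatever field name the algorithm's documentation specifies.
    """
    return "".join(
        json.dumps({key: sentence}, ensure_ascii=False) + "\n"
        for sentence in sentences
    )


# Two placeholder sentences, written out as two lines.
payload = to_json_lines(["ich bin hier", "guten morgen"])
```

The resulting string can be saved to a file and uploaded to the S3 prefix used as the Batch Transform input.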
Here is an example (for a different algorithm) of how to use Batch Transform with the same Model created for online inference (the Batch Transform part is at the end): https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/inference_pipeline_sparkml_blazingtext_dbpedia/inference_pipeline_sparkml_blazingtext_dbpedia.ipynb
Issue Description
The Seq2Seq example notebook has the following step "Using Protobuf format for inference (Suggested for efficient bulk inference)" with the following snippet
However, when I try running it, I get an error with the message
"ERROR:root:Unable to evaluate payload provided"
Details
In the CloudWatch log stream, I see the following additional details in the trace:
On further investigation, I found that in the doc for Seq2Seq, it is mentioned that
For batch transform, inference supports JSON Lines format. Batch transform expects the input in JSON Lines format and returns the output in JSON Lines format. Both content and accept types should be application/jsonlines. The format for input is as follows:
And I noticed that the step in the example notebook used ContentType='application/x-recordio-protobuf' for Batch Transform. Am I missing something? Or should the example notebook be updated to use content type application/jsonlines? Thanks!