aws / sagemaker-huggingface-inference-toolkit


How to enable Batch inference on AWS deployed Serverless model from Hub? #98

Open jmparejaz opened 1 year ago

jmparejaz commented 1 year ago

I am using SageMaker serverless inference with a Hugging Face model from the Hub, following this example: https://github.com/huggingface/notebooks/blob/main/sagemaker/19_serverless_inference/sagemaker-notebook.ipynb

using the image URI:

image_container = get_huggingface_llm_image_uri("huggingface", version="0.9.3")
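For context, a deployment along the lines of the referenced notebook looks roughly like the sketch below. This assumes a valid SageMaker execution role; the model id "gpt2" and the memory/concurrency values are placeholders, not settings from this issue.

# Minimal sketch of a serverless Hub deployment, assuming a valid SageMaker
# execution role and that the chosen model fits the serverless limits.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
from sagemaker.serverless import ServerlessInferenceConfig

role = sagemaker.get_execution_role()

# Hub model configuration passed to the container via environment variables
hub = {
    "HF_MODEL_ID": "gpt2",        # placeholder model id, replace with your own
    "HF_TASK": "text-generation",
}

image_container = get_huggingface_llm_image_uri("huggingface", version="0.9.3")

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    image_uri=image_container,
)

# Serverless configuration: memory size and max concurrency are illustrative values
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,
    max_concurrency=10,
)

predictor = huggingface_model.deploy(serverless_inference_config=serverless_config)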

I was expecting the resulting endpoint to behave like the transformers Pipeline class for this task (text generation); however, the input does not work with a list, as illustrated below.
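For example, a list-style payload of the kind the transformers Pipeline accepts (a sketch of the request shape that fails here) would be:

# Hypothetical batched request mirroring the transformers Pipeline's list input;
# per this report, the serverless endpoint rejects this shape.
batch_payload = {"inputs": ["First prompt", "Second prompt"]}
response = predictor.predict(batch_payload)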

Is there any approach to doing batch inference with the SageMaker SDK?

philschmid commented 1 year ago

Hello @jmparejaz,

The input schema for the LLM container should be the same, {"inputs": "text", "parameters": {}}. What issue are you seeing? The only difference is that the LLM container has additional/different parameters; see here: https://huggingface.co/blog/sagemaker-huggingface-llm#4-run-inference-and-chat-with-our-model
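For illustration, a single-prompt request against the deployed endpoint would look roughly like this sketch; the generation parameters are illustrative values in the spirit of the linked blog post, not required settings:

# Sketch of invoking the endpoint with the LLM container's input schema;
# parameter names follow the linked blog post, values are illustrative.
payload = {
    "inputs": "What is serverless inference?",
    "parameters": {
        "max_new_tokens": 64,
        "temperature": 0.7,
        "do_sample": True,
    },
}

response = predictor.predict(payload)
print(response)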