Support for Hugging Face multimodal models

vincentclaes commented 11 months ago

Describe the feature you'd like

The ability to deploy Hugging Face multimodal models to a SageMaker endpoint. Currently, only language models that take a text prompt as input are supported. Multimodal models like Llava / CLIP / ... require both a prompt and an image as input, and this is not supported yet.

How would this feature be used? Please describe.

This is how the feature will be used by the end user:

import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="some-repo/some-multimodal-model",  # <<----- specify the model
    role=sagemaker.get_execution_role(),
    transformers_version="4.28.1",
    pytorch_version="2.0.0",
    py_version="py310",
    model_server_workers=1,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=900, # increase timeout for large models
    model_data_download_timeout=900, # increase timeout for large models
)
# progress output printed while the endpoint is created: ----------------!
### Call Llava
import base64
import requests

# request
data = {
    "image" : "some base64 encoded image", # <---- specify the image
    "question" : "Describe the image and color details.",
}
output = predictor.predict(data)
print(output)
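
The "image" value above is a placeholder. A minimal sketch of how a caller could produce it from a local file, assuming the endpoint expects a standard base64 string in a JSON payload; the helper name encode_image and the path are hypothetical:

```python
import base64


def encode_image(path):
    """Read an image file and return its bytes as a base64 text string."""
    with open(path, "rb") as f:
        # b64encode returns bytes; decode to str so the value is JSON-serializable.
        return base64.b64encode(f.read()).decode("utf-8")


# Hypothetical usage with the request from the example above:
# data = {
#     "image": encode_image("some-image.jpg"),
#     "question": "Describe the image and color details.",
# }
```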

Describe alternatives you've considered

You can package the model yourself and provide an inference.py script, but then you have to download the model and repackage it as a tar.gz archive, which takes a lot of time.
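
For illustration, a rough sketch of what such an inference.py could look like. It is not the script from the post linked below. It relies on the model_fn and predict_fn hooks of the SageMaker Hugging Face inference toolkit, and it assumes that the packaged model works with the transformers "image-to-text" pipeline and that the installed transformers version accepts a prompt argument for that pipeline. The request keys "image" and "question" follow the example above.

```python
# inference.py: a hypothetical sketch, placed in the code/ directory of model.tar.gz
import base64
import io


def model_fn(model_dir):
    """Load the model from the directory where model.tar.gz was unpacked."""
    # Imported lazily so the request handling below can be exercised without them.
    from PIL import Image
    from transformers import pipeline

    # Assumption: the model is compatible with the "image-to-text" pipeline
    # and this transformers version supports the `prompt` argument.
    pipe = pipeline("image-to-text", model=model_dir)

    def answer(image_bytes, question):
        image = Image.open(io.BytesIO(image_bytes))
        return pipe(image, prompt=question)

    return answer


def predict_fn(data, model):
    """Decode the base64 image and pass it to the model with the question."""
    image_bytes = base64.b64decode(data["image"])
    return model(image_bytes, data["question"])
```

Keeping the base64 decoding in predict_fn and the model-specific loading in model_fn means only model_fn should need to change for a different multimodal model.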

Additional context

I came up with this idea when I created a tar.gz for llava with an inference.py and made it available to the world. See my LinkedIn post here: https://www.linkedin.com/posts/vincent-claes-0b346337_aws-sagemaker-huggingface-activity-7141776348963885056-Uv0g?utm_source=share&utm_medium=member_desktop

vincentclaes commented 11 months ago

@mohanasudhan Is it OK if I have a look at this issue and propose a PR? If you have any existing code that I could use as inspiration, let me know!

mohanasudhan commented 11 months ago

@vincentclaes Please feel free to propose a PR. I don't have a specific code sample; however, consider looking at the new ModelBuilder class that has been created to simplify the deployment.

https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-modelbuilder-creation.html

mohanasudhan commented 11 months ago

More details in the notebook - https://aws.amazon.com/blogs/machine-learning/package-and-deploy-classical-ml-and-llms-easily-with-amazon-sagemaker-part-1-pysdk-improvements/

aws / sagemaker-python-sdk

Support for Hugging Face multimodal models #4330