NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

Deployment to SageMaker and/or HuggingFace Inference Endpoints Fails With Error #94

Open averypfeiffer opened 1 month ago

averypfeiffer commented 1 month ago

When attempting to manually deploy the model to SageMaker via a deployment script, or to deploy it automatically via the Hugging Face Inference Endpoints UI, I receive the same error:

"ValueError: The checkpoint you are trying to load has model type llava_llama but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date."

Lyken17 commented 1 month ago

Unfortunately, we do not have SageMaker experts on our team. Could you check with the AWS team for more details? Or share a script that can reproduce the error locally?

averypfeiffer commented 1 month ago

Absolutely! I don't believe it's a SageMaker issue; it seems to be a lack of support for the custom llava_llama config in the transformers library.

Here is a simple script that immediately reproduces the issue when loading the model via the Hugging Face transformers library:


from PIL import Image
from transformers import pipeline

# Building the pipeline already fails: transformers does not recognize the
# llava_llama model type declared in the checkpoint's config.
vqa_pipeline = pipeline(
    "visual-question-answering", model="Efficient-Large-Model/VILA1.5-40b"
)

# Load an example image
image = Image.open("./test_images/einsidtoJYc-Scene-6-01.jpg")

# Example text input
text = "What is happening in this image?"

# Payload shape an inference endpoint would receive (not used by the
# local pipeline call below)
payload = {
    "inputs": {
        "question": text,
        "image": image,
    }
}

result = vqa_pipeline(image, text, top_k=1)

print(f"Question: {text}")
print(f"Answer: {result[0]['answer']}")

Lyken17 commented 1 month ago

I think the problem is that we haven't tested with the VQA pipeline yet. Could you check with our official inference implementation?

JBurtn commented 1 month ago

An even simpler example:

from transformers import AutoConfig

model_id = "Efficient-Large-Model/VILA1.5-40b"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)  # error raised here
print(config)

JBurtn commented 1 month ago

I copied what I needed from run_vila.py and it worked. If you do

from VILA.llava.model import *

it should fix the llava_llama issue. It still complains about missing weights (even with use_safetensors=False) if you try the AWQ versions, though.
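
For anyone else hitting this, a minimal sketch of the workaround, assuming the VILA repo is cloned and importable as VILA (the wildcard import registers the custom llava_llama config/model classes with the transformers Auto* factories as a side effect):

from transformers import AutoConfig

# Side effect: registers VILA's custom llava_llama classes with the Auto* APIs.
from VILA.llava.model import *

model_id = "Efficient-Large-Model/VILA1.5-40b"

# With the custom classes registered, the config now resolves instead of raising ValueError.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config)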