Failed to inference single image using xtuner chat with llava-llama3-8b model

InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

https://xtuner.readthedocs.io/zh-cn/latest/

Apache License 2.0

4k stars, 314 forks

Failed to inference single image using xtuner chat with llava-llama3-8b model #927

Closed: J0eky closed this issue 2 months ago

J0eky commented 2 months ago

How can I input an image correctly?

J0eky commented 2 months ago

I found the solution at https://www.modelscope.cn/models/xtuner/llava-llama-3-8b-v1_1. Following the command given there, inference on a single image runs successfully:

    xtuner chat xtuner/llava-llama-3-8b-v1_1 \
      --visual-encoder openai/clip-vit-large-patch14-336 \
      --llava xtuner/llava-llama-3-8b-v1_1 \
      --prompt-template llama3_chat \
      --image $IMAGE_PATH
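The command above relies on `$IMAGE_PATH` pointing to an existing image file. As a minimal sketch, a small wrapper could check the path before launching the chat. The helper names `check_image` and `run_llava_chat` are hypothetical and not part of xtuner; the `xtuner chat` flags are copied unchanged from the comment above.

```shell
#!/bin/sh
# Hypothetical helper (not part of xtuner): fail early when the image path
# is empty or does not point to a regular file.
check_image() {
  if [ -z "$1" ]; then
    echo "IMAGE_PATH is empty" >&2
    return 1
  fi
  if [ ! -f "$1" ]; then
    echo "image not found: $1" >&2
    return 1
  fi
  return 0
}

# Hypothetical wrapper: validates the path, then runs the same command
# quoted in the comment above. Assumes xtuner is installed and on PATH.
run_llava_chat() {
  check_image "$1" || return 1
  xtuner chat xtuner/llava-llama-3-8b-v1_1 \
    --visual-encoder openai/clip-vit-large-patch14-336 \
    --llava xtuner/llava-llama-3-8b-v1_1 \
    --prompt-template llama3_chat \
    --image "$1"
}
```

It might be used as `run_llava_chat "$IMAGE_PATH"` after setting `IMAGE_PATH` to a local image. Quoting the variable keeps paths that contain spaces intact.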