NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

How to use trt_llm to accelerate the original llava (liuhaotian/llava-v1.5-7b)? #1298

Open ganliqiang opened 6 months ago

ganliqiang commented 6 months ago

When I use the multimodal example, I downloaded the original model liuhaotian/llava-v1.5-7b, but an error occurs:

```
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1164, in from_hugging_face
    config = create_config_from_hugging_face(model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1070, in create_config_from_hugging_face
    architecture = hf_config.architectures[0]
```

I found that config.json differs between liuhaotian/llava-v1.5-7b and llava-hf/llava-1.5-13b-hf, so how can I use trt_llm with the original model? Thanks in advance.

ganliqiang commented 6 months ago

@byshiue @QiJune We are facing a pressing issue: our business urgently needs the performance improvements from your work. Could you let us know when you might be able to support this issue, or provide some guidance so we can proceed on our own? Thank you in advance for your response.

QiJune commented 6 months ago

@ganliqiang Could you please share your commands?

ganliqiang commented 6 months ago

I just followed the instructions. First I set

```bash
export MODEL_NAME="llava-v1.5-13b"
```

replacing llava-1.5-13b-hf with the original model name llava-v1.5-13b. Then I ran

```bash
python ../llama/convert_checkpoint.py \
    --model_dir tmp/hf_models/${MODEL_NAME} \
    --output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --dtype float16
```

and this error occurred:

```
Traceback (most recent call last):
  File "/mnt/glq/trt_llm/TensorRT-LLM/examples/multimodal/../llama/convert_checkpoint.py", line 523, in <module>
    main()
  File "/mnt/glq/trt_llm/TensorRT-LLM/examples/multimodal/../llama/convert_checkpoint.py", line 515, in main
    convert_and_save_hf(args)
  File "/mnt/glq/trt_llm/TensorRT-LLM/examples/multimodal/../llama/convert_checkpoint.py", line 434, in convert_and_save_hf
    llama = from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1164, in from_hugging_face
    config = create_config_from_hugging_face(model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1070, in create_config_from_hugging_face
    architecture = hf_config.architectures[0]
TypeError: 'NoneType' object is not subscriptable
```

I guess config.json differs between these two models, so the conversion fails. I compared them and the two model architectures are slightly different, so I don't know how to run the original llava model successfully.
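A quick way to confirm that diagnosis is to inspect the `architectures` field that `create_config_from_hugging_face` indexes. This is an illustrative sketch, not from the thread; the local checkpoint paths are assumptions:

```python
# Illustrative check (paths are hypothetical): print the "architectures"
# entry from each checkpoint's config.json. TRT-LLM's converter indexes
# hf_config.architectures[0]; if the field is absent from config.json,
# transformers leaves it as None, and indexing None raises the TypeError
# shown in the traceback above.
import json

for model_dir in ("tmp/hf_models/llava-v1.5-13b",     # original liuhaotian layout
                  "tmp/hf_models/llava-1.5-13b-hf"):  # llava-hf layout
    with open(f"{model_dir}/config.json") as f:
        cfg = json.load(f)
    print(model_dir, "->", cfg.get("architectures"))
```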

zh0ngtian commented 6 months ago

try 0.7.1
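For reference, a specific release can usually be pulled from NVIDIA's PyPI index, e.g. `pip install tensorrt_llm==0.7.1 --extra-index-url https://pypi.nvidia.com`; the exact command may vary by release, so check the installation docs.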

irexyc commented 6 months ago

Hi, LMDeploy now supports serving liuhaotian/llava-v1.5-7b and provides OpenAI-compatible APIs. Feedback is welcome.
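Since the server speaks the OpenAI protocol, a standard OpenAI client should work once the api_server from the doc linked below is running; this is a sketch under that assumption (the port, model name, and image URL are placeholders):

```python
# Sketch only: assumes `lmdeploy serve api_server` is running on its
# default port 23333; the model name and image URL are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")
resp = client.chat.completions.create(
    model="liuhaotian/llava-v1.5-7b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/demo.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```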

Related docs:

- https://github.com/InternLM/lmdeploy/blob/main/docs/en/serving/api_server_vl.md
- https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/vl_pipeline.md
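For offline inference, the linked vl_pipeline doc describes a Python API along these lines (a minimal sketch, assuming lmdeploy is installed; verify the exact API against the doc, and note the image URL is a placeholder):

```python
# Minimal sketch based on the linked vl_pipeline doc: build a vision-language
# pipeline for the original liuhaotian checkpoint and run one image+text query.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline("liuhaotian/llava-v1.5-7b")
image = load_image("https://example.com/demo.jpg")  # placeholder image
response = pipe(("Describe this image.", image))
print(response)
```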

litaotju commented 6 months ago

@ganliqiang could you use the Hugging Face checkpoint for this model? The Hugging Face model is supported and tested. TRT-LLM needs to read hf_config.architectures to make sure the correct TRT-LLM model class is used.
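Concretely, the class lookup depends on what `transformers.AutoConfig` reports for the checkpoint; a sketch of that check (the llava-hf model ID is used here for illustration):

```python
# Sketch: this mirrors the lookup that fails in create_config_from_hugging_face.
# For a checkpoint whose config.json declares an "architectures" list, this
# prints the class name TRT-LLM keys on; for one that does not, it prints None,
# which is exactly what makes architectures[0] raise the TypeError above.
from transformers import AutoConfig

hf_config = AutoConfig.from_pretrained("llava-hf/llava-1.5-13b-hf")
print(hf_config.architectures)  # a non-empty list for supported checkpoints
```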

xwqianbei commented 5 months ago

Have you solved the problem yet? @ganliqiang

ganliqiang commented 5 months ago

> Have you solved the problem yet? @ganliqiang

Yes, but I did not use this framework; I switched to llama.cpp, because the accuracy was a little low for my task, while llama.cpp maintains the same results. FYI, you can try it too.

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.