lss15151161 opened this issue 3 days ago
I saw similar results with Llama 3. Mine was resolved when I disabled 'use_custom_all_reduce' when compiling the engine.
Could you try the latest version, TensorRT-LLM 0.11 or later? https://nvidia.github.io/TensorRT-LLM/installation/linux.html
System info
GPU: A100
tensorrt: 9.3.0.post12.dev1
tensorrt-llm: 0.9.0
torch: 2.2.2
Reproduction
python build_visual_engine.py --model_path tmp/hf_models/${MODEL_NAME} --model_type llava # or "--model_type vila" for VILA
If I use the same data to form a batch, the results look like this:
If I use two different prompts to form a batch, the results look like this:
The image used is: https://storage.googleapis.com/sfr-vision-language-research/LAVIS/assets/merlion.png
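One way to narrow down whether batching itself changes the output is to run each prompt on its own (batch size 1) and compare the result with the batched run. Below is a minimal sketch, assuming greedy decoding (so outputs should be deterministic) and assuming the generated texts have already been copied into plain Python lists. The helper names are hypothetical and are not part of TensorRT-LLM.

```python
from collections import defaultdict


def find_batch_mismatches(prompts, batched_outputs, single_outputs):
    """Return indices whose batched output differs from the batch-size-1 output.

    Hypothetical helper: the three arguments are parallel lists of strings
    collected by hand from the run script's output.
    """
    if not (len(prompts) == len(batched_outputs) == len(single_outputs)):
        raise ValueError("all three lists must have the same length")
    return [
        i
        for i, (batched, single) in enumerate(zip(batched_outputs, single_outputs))
        if batched != single
    ]


def find_inconsistent_duplicates(prompts, batched_outputs):
    """Return prompts that appear more than once in a batch but got different outputs."""
    outputs_per_prompt = defaultdict(set)
    for prompt, output in zip(prompts, batched_outputs):
        outputs_per_prompt[prompt].add(output)
    return sorted(p for p, outs in outputs_per_prompt.items() if len(outs) > 1)
```

With greedy decoding, a non-empty result from either helper suggests the batched run is not equivalent to the unbatched one, which is the symptom described in this issue. Exact string comparison is deliberately strict; with sampling enabled, differences would be expected and this check would not be meaningful.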