Efficient-Large-Model / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

Question about the output #77

Open · DwanZhang-AI opened this issue 1 week ago

DwanZhang-AI commented 1 week ago
[screenshot of the model's output]

This is the output of the model.

```bash
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/Llama-3-VILA1.5-8b \
    --conv-mode llama_3 \
    --query "<image>\n Please describe the traffic condition." \
    --image-file "demo_images/av.png"
```

This is the inference command. Why is the output wrong?
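(For anyone debugging the same symptom: a minimal sanity check on the query string, assuming the LLaVA-style prompt convention that the VILA codebase inherits. `DEFAULT_IMAGE_TOKEN` here mirrors the `"<image>"` literal from llava/constants.py; verify against your checkout.)

```python
# A minimal sanity check, assuming the LLaVA-style convention: vision features
# are spliced into the prompt wherever the "<image>" placeholder appears.
# DEFAULT_IMAGE_TOKEN mirrors the constant in llava/constants.py.
DEFAULT_IMAGE_TOKEN = "<image>"

query = "<image>\n Please describe the traffic condition."

if DEFAULT_IMAGE_TOKEN not in query:
    # Depending on the script version, run_vila.py may auto-prepend the token
    # or mishandle the image entirely, so it is worth checking explicitly.
    raise ValueError(f"query should contain {DEFAULT_IMAGE_TOKEN!r}")
```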

DwanZhang-AI commented 1 week ago

BTW, I have disabled the flash-attention module.
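(A sketch of one way to force the non-flash-attention path, assuming a Hugging Face transformers-style loader; the actual VILA loader lives in the llava codebase, so treat this as illustrative rather than the repo's own switch.)

```python
# Illustrative only: recent transformers versions accept attn_implementation,
# and "eager" selects the plain PyTorch attention path, bypassing flash-attn
# kernels. Whether this checkpoint loads via AutoModelForCausalLM is an
# assumption here; VILA's own loader may expose a different mechanism.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Efficient-Large-Model/Llama-3-VILA1.5-8b",
    attn_implementation="eager",  # skip flash-attn entirely
    trust_remote_code=True,
)
```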

Lyken17 commented 1 week ago

Seems to work properly on my side, so it should not be an issue with flash-attn. Could you make sure you have done a fresh install following the README?
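(Before reinstalling, a quick check of what the current environment actually has can help; the authoritative version pins are in the README, and this snippet only reports what is installed.)

```python
# Report the packages most relevant to this issue, to compare against the
# README's pinned versions before (and after) a fresh install.
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())

try:
    import flash_attn  # optional; ImportError means flash-attn is absent
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn: not installed")
```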