NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

Question about the output #77

Open DwanZhang-AI opened 2 weeks ago

DwanZhang-AI commented 2 weeks ago
[screenshot: model output]

This is the output of the model.

```bash
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/Llama-3-VILA1.5-8b \
    --conv-mode llama_3 \
    --query "<image>\n Please describe the traffic condition." \
    --image-file "demo_images/av.png"
```

This is the inference command. Why is the output wrong?

DwanZhang-AI commented 2 weeks ago

BTW, I have disabled the flash attention module.
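
For reference, one quick way to confirm whether flash-attn is actually installed in the environment (standard pip/python commands; whether VILA falls back cleanly without it is a separate question):

```bash
# Check whether the flash-attn package is importable and which version is present.
python -c "import flash_attn; print(flash_attn.__version__)"
pip show flash-attn
```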

Lyken17 commented 2 weeks ago

It seems to work properly on my side, so it should not be an issue with flash-attn. Could you make sure you have done a fresh install following the README?
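
For reference, a fresh install along the lines the README describes might look like the sketch below; the environment name and Python version here are assumptions, so defer to the repo's README for the authoritative steps:

```bash
# Hypothetical fresh-install sketch; see the NVlabs/VILA README for exact steps.
git clone https://github.com/NVlabs/VILA.git
cd VILA
conda create -n vila python=3.10 -y   # env name/version are assumptions
conda activate vila
pip install -e .                      # editable install of the repo
```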