NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0
2.01k stars, 159 forks

Repetitive Output in LongViLa-LLama3-1024Frames #149

Open: hb-jw opened this issue 1 week ago

hb-jw commented 1 week ago

The output of LongViLa-LLama3-1024Frames is often repetitive. Why does this happen, and are there any suggestions for reducing the repetition?
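The thread does not identify a cause. One common, model-agnostic mitigation on the decoding side is to penalize tokens that have already been generated and to block repeated n-grams. Hugging Face `transformers` exposes these as the `repetition_penalty` and `no_repeat_ngram_size` generation parameters. Whether the LongVILA inference scripts forward these options is not stated in this thread and is an assumption to check. The following is a minimal plain-Python sketch of both ideas applied to a single greedy decoding step. It is not VILA's code, and the function names are hypothetical.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """CTRL-style repetition penalty (hypothetical helper).

    Makes tokens that were already generated less likely: positive logits
    are divided by `penalty`, negative logits are multiplied by it.
    """
    adjusted = list(logits)
    for tok in set(generated_ids):
        if adjusted[tok] > 0:
            adjusted[tok] /= penalty
        else:
            adjusted[tok] *= penalty
    return adjusted


def banned_ngram_tokens(generated_ids, n=3):
    """Tokens that would complete an n-gram already present in the output.

    This is the idea behind `no_repeat_ngram_size` (hypothetical helper).
    """
    if n < 2 or len(generated_ids) < n - 1:
        return set()
    prefix = tuple(generated_ids[-(n - 1):])  # last n-1 tokens
    banned = set()
    # Every earlier occurrence of the prefix bans the token that followed it.
    for i in range(len(generated_ids) - n + 1):
        if tuple(generated_ids[i:i + n - 1]) == prefix:
            banned.add(generated_ids[i + n - 1])
    return banned


def greedy_step(logits, generated_ids, penalty=1.2, no_repeat_ngram_size=3):
    """Pick the next token greedily after applying both anti-repetition rules."""
    scores = apply_repetition_penalty(logits, generated_ids, penalty)
    for tok in banned_ngram_tokens(generated_ids, no_repeat_ngram_size):
        scores[tok] = float("-inf")
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    history = [1, 2, 3, 1, 2]              # output so far: "1 2 3 1 2"
    logits = [0.1, 0.5, 1.0, 3.0, 0.2]     # raw scores favour token 3 again
    print(greedy_step(logits, history))     # token 3 would repeat the 3-gram (1, 2, 3)
```

Without the two rules, plain greedy decoding would choose token 3 and start looping on "1 2 3". With them, token 3 is blocked and a different token is chosen. In practice both knobs trade repetition against fluency, so values such as `penalty=1.2` here are illustrative rather than recommended settings.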

yepzhang commented 1 week ago

Hi! I noticed that you’re working with LongViLa-LLama3-1024Frames. I’m also trying to run inference with a long context, but I’m running into problems with multi-GPU usage: my model only runs on a single GPU. Have you found a way to use multiple GPUs for long-context inference? Any insights or suggestions would be greatly appreciated!