hb-jw opened 1 week ago
Hi! I noticed that you're working with LongViLa-LLama3-1024Frames. I'm also trying to run long-context inference, but I'm having trouble with multi-GPU usage: the model only ever runs on a single GPU. Have you found a way to successfully use multiple GPUs for long-context inference? Any insights or suggestions would be greatly appreciated!
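For reference, here is a minimal sketch of how I'd expect multi-GPU sharding to work with a Hugging Face-style loader. This assumes the checkpoint can be loaded with `AutoModelForCausalLM` and that `accelerate` is installed; the actual LongViLa code in the VILA repo may require its own loading path, and the model path below is a placeholder:

```python
# Hedged sketch: shard a long-context model across all visible GPUs using
# Accelerate's automatic device map. Assumes a Hugging Face-compatible
# checkpoint; LongViLa's own loader may be needed instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/LongViLa-LLama3-1024Frames"  # placeholder path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halve memory vs. fp32
    device_map="auto",           # split layers across every visible GPU
)

# Select which GPUs are visible via the environment, e.g.:
#   CUDA_VISIBLE_DEVICES=0,1,2,3 python infer.py
```

Is something along these lines what you tried, or does the model's own inference script bypass this?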
Separately, the output from LongViLa-LLama3-1024Frames is often repetitive. Why does this happen, and is there any way to reduce the repetition?
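In case it helps the discussion, these are the decoding knobs I would normally try first against repetition, sketched with Hugging Face `generate()` argument names. The exact flags may differ if LongViLa's inference script wraps generation itself, and the prompt handling here is simplified:

```python
# Hedged sketch: decoding settings that commonly reduce repetitive output.
# Builds on the model/tokenizer from the sketch above; the specific values
# are starting points, not tuned recommendations.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # soften the token distribution
    top_p=0.9,               # nucleus sampling
    repetition_penalty=1.2,  # penalize already-generated tokens
    no_repeat_ngram_size=3,  # forbid verbatim 3-gram repeats
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Greedy decoding on long contexts is a common cause of loops, so switching to sampling plus a mild repetition penalty is usually the first thing worth testing.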