Open Nastu-Ho opened 1 week ago
Is the vision encoder used here UMT-L or InternVideo2-1B? I saw that the Mistral version in InternVideo2 had results similar to the ones here.
Hi! We released UMT-L since it runs faster.