-
So, I successfully trained a tiny YOLOv4 and it is working just fine. After training I tested it on a pretty big video and the results were amazing, but I have a question about the resolutions.
My mode…
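In case it helps frame the resolution question: below is a minimal sketch of tiny-YOLOv4 inference using OpenCV's DNN module, assuming standard darknet .cfg/.weights files (all paths and the 416x416 size are placeholders, not the poster's actual setup). The key point is that the network resizes every frame to its configured input size, independent of the video's native resolution:

```python
import cv2

# Load the trained tiny-YOLOv4 (the .cfg/.weights paths are placeholders).
net = cv2.dnn.readNetFromDarknet("yolov4-tiny.cfg", "yolov4-tiny.weights")

frame = cv2.imread("frame.jpg")

# The blob size should match the width/height in the .cfg used for
# training (416x416 here is an assumption); every frame is resized to
# this resolution before detection, regardless of the source video size.
blob = cv2.dnn.blobFromImage(frame, scalefactor=1 / 255.0, size=(416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())
```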
-
Hi,
Glad to see the project.
I wonder whether it includes action recognition or understanding?
Thanks
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
Congrats on adding support for video understanding to VILA, looks super cool!
Just curious, is there an updated or new paper with more technical details on how the improved video understanding was adde…
-
I have the following setup:
- Nextcloud Server, Signaling Server and Recording Server as LXD containers in the same physical server
- Communications between services are proxied via an HAProxy con…
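Since the setup description above is cut off, here is a minimal sketch of what the HAProxy part of such a configuration could look like; the backend addresses and ports are placeholders, and the /standalone-signaling/ path is an assumption based on the default used by the standalone Nextcloud Talk signaling server behind a reverse proxy:

```
frontend https_in
    bind *:443 ssl crt /etc/haproxy/certs/example.pem
    # Route signaling traffic (including WebSocket upgrades) by path.
    acl is_signaling path_beg /standalone-signaling/
    use_backend signaling if is_signaling
    default_backend nextcloud

backend nextcloud
    # Placeholder address for the Nextcloud LXD container.
    server nc1 10.0.0.10:80 check

backend signaling
    # Placeholder address for the signaling server container;
    # HAProxy proxies the WebSocket upgrade transparently.
    server sig1 10.0.0.11:8080 check
```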
-
As a result of PR #3642 clarifying use of visible text in video-only time-based media, we receive the following [supported comment](https://github.com/w3c/wcag/issues/3642#issuecomment-2005349689):
…
-
Recently, many MLLM works on both image and video understanding have achieved great results on video benchmarks, e.g. LLaVA-NeXT, InternLM, VILA, etc.
I think these works should also be added to the paper …
-
Hello team,
Thank you for this great work on video evaluation. Could you add my new benchmark to the evaluation benchmarks?
[[Project Page](https://vision-cair.github.io/InfiniBench/)] [[Code](htt…
-
Hi Authors,
Thanks for providing this great work! I am curious why the model weights for generation and understanding are separated. Is there any plan to release one set of weights that is capable of bot…
-
Hi, this is excellent work! I have a question.
I’d like to know why the model was split into two. Can EMU3-Gen still maintain the same comprehension performance as EMU3-Chat?