-
Hey all!
The video models are all supported in Transformers now and will be part of the v4.42 release. Feel free to check out the model checkpoints [here](https://huggingface.co/collections/llava-h…
-
Hello everyone,
I have been working on replicating benchmarks related to video-class Large Language Models (LLMs), and I've noticed that most of these benchmarks rely on the GPT-assistant framework…
-
### Bug Report
After changing the language of the UI,
on the LocalDocs page,
the date format in -all- collections' descriptions does not change consistently and immediately to reflect the ne…
-
Hi, after I run the command
`video_folder=visualization/videos
output_folder=visualization/output
pdvc_model_path=save/anet_tsp_pdvc/model-best.pth
output_language=en
bash test_and_visualize.sh…
-
_The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense._
---
### 1. Issue…
-
Hello!
Thanks for offering hope of Thai-language inference with better accuracy.
I have tried the following methods but none could give any meaningful words compared to the exist…
-
First of all, thank you for this package. It really is one of _"What you never thought you needed but you needed"_.
I just wanted to suggest some improvements to the package, if you have the time.
W…
-
I tested some videos.
If the silence duration is long, enabling vad_filter is effective,
but for a normal video, enabling vad_filter may cause more timestamp mismatches.
is there …
-
Hi,
Thank you for your outstanding work! Without a doubt, your recently published VILA v1.5 series pushes the boundaries of multimodal large language models. It is arguably the most powerful and us…
-
Topic 1: This API feels like it would suffer from conflation of intent
- Is the API for developers to detect language support?
- Or is it for developers to trigger the download of another model for la…