-
The following video ([The spelled-out intro to language modeling: building makemore](https://www.youtube.com/watch?v=PaCmpygFfXo)) is a self-study ML tutorial video on the bigram algorithm by Andrej Kar…
-
I'm getting poor transcription results using whisperx. Specifically, I sometimes get no transcription at all from some short videos, whereas OpenAI's official whisper model transcribes them corr…
-
Hi there!
Thanks for the effort to maintain this amazing repository.
This is a request to add our recent work on evaluation of Video Models. We propose an evaluation benchmark, _VELOCITI_.
Plea…
-
Can your model be fed with multiple images at once, such as different frames of a video? Or can it be modified so that the input to the language model is the tokens of multiple images at once?
-
link text: [VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs](https://arxiv.org/pdf/2306.02858)
Actual title of the paper: Video-LLaMA An Instruction-tuned Audi…
-
Hi,
Thank you for your outstanding work! Without a doubt, your recently published VILA v1.5 series pushes the boundaries of multimodal large language models. It is arguably the most powerful and us…
-
For the creation of a lip-reading dataset in SignWriting, we need to map IPA symbols to SignWriting.
The project will go like this:
1. Collect sign language videos with a known language (e.g. Engl…
-
### What is the issue?
I updated to llama3. I use the SubtitleEdit program to transcribe other languages, and the program translates correctly, but when I want to transcribe from English to Spanish it …
-
Currently, the app only supports search in documents. I would like OCR support for images as well.
-
**What I need help with / What I was wondering**
I want to load a dataset containing these
![image](https://github.com/tensorflow/datasets/assets/122366389/0213f11e-c48f-4bb8-bfa1-2433fefd0cb3)
w…