-
The HowTo100M + VidChapters-7M + ViTT model is performing poorly on dense video captioning.
Reproduction: run

```
yt-dlp -P $TRANSFORMERS_CACHE -o video.mp4 https://www.youtube.com/watch?v=WJ…
```
-
This seems like nice work. I wanted to test it on custom input videos. It would be very helpful if you could provide a script for generating captions for a raw input video.
-
Hello, thank you for your work. I would like to ask why you think the task of synchronized subtitle generation is important, and how it can help with action generation and action understanding.
-
How does EasyAnimate slice a 1080p video? More specifically, at what frame interval does the slicing happen? I assume this determines the memory requirements for resolutions lower than 1080p.
E…
-
## In a nutshell
A study that achieves end-to-end video captioning with a Transformer-based model. The encoder extracts the events to caption (time spans) from the video, and the decoder generates sentences with the events masked.
![image](https://user-images.githubusercontent.com/544269/5370…
-
When will your group release the code and dataset for dense video object captioning?
-
Hello! Thank you so much for contributing this repo.
I'm very interested in this work, and I'm surveying papers with keywords like "captioning anything" or "instance-level captioning" or "per pi…
-
Find a suitable encoder-decoder model and start training it on suitable datasets.
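As a minimal sketch of what "train an encoder-decoder model" means structurally, here is a deliberately tiny, pure-Python toy: the "encoder" summarizes an input sequence as its mean, the "decoder" predicts each target from that summary plus a single learned bias, and plain gradient descent fits the bias. This is purely illustrative and not the model from any repo above; real captioning models are vastly larger, but the loop (encode, decode, loss, gradient step) has the same shape.

```python
def encode(xs):
    """Toy encoder: summarize the input sequence as its mean."""
    return sum(xs) / len(xs)

def decode(context, bias, n):
    """Toy decoder: emit n predictions from the context plus a learned bias."""
    return [context + bias for _ in range(n)]

def train(inputs, targets, epochs=200, lr=0.05):
    """Fit the single bias parameter by gradient descent on squared error."""
    bias = 0.0
    for _ in range(epochs):
        grad = 0.0
        for xs, ys in zip(inputs, targets):
            preds = decode(encode(xs), bias, len(ys))
            # derivative of mean squared error with respect to the bias
            grad += sum(2 * (p - y) for p, y in zip(preds, ys)) / len(ys)
        bias -= lr * grad / len(inputs)
    return bias

# Each target equals the input mean shifted by +1, so bias should approach 1.
inputs = [[1.0, 2.0, 3.0], [4.0, 6.0]]
targets = [[3.0, 3.0, 3.0], [6.0, 6.0]]
learned = train(inputs, targets)
```

Swapping in a real encoder (e.g. a video backbone) and decoder (a text generator) changes the components but not this training structure.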
-
Hello.
I tried running the training example for vid2seq, but it fails with the error message `ModuleNotFoundError: No module named 'scenic.projects.vid2seq.metrics'`. I cannot locate the 'metrics' …