-
Hi,
Impressive work! I'd like to ask: how can I extract features from my own video-text datasets for finetuning the model?
-
The [`inference/distributed` directory](https://github.com/huggingface/accelerate/tree/main/examples/inference/distributed) houses examples on running distributed inference with `accelerate`:
* Phi…
-
### Feature request
BLIP and GIT are two recent additions to the library, providing state-of-the-art performance on tasks like image captioning and visual question answering (VQA). GIT is even capable…
-
**Is your feature request related to a problem? Please describe.**
I have been actively using this repository for multimodal training involving images and text. It has been incredibly helpful for my …
-
_governing epic: #408 migrate amheida content from standalone website to the research section of isaw web_
There are videos in the Amheida website content. General experience suggests that it's lik…
-
Would there be any value in incorporating a video feed using the user's camera? On one hand, it might be redundant considering that we're already captioning the audio. But I also wonder if having vide…
-
Suggestion from Kaltura: https://github.com/kaltura/nginx-vod-module/issues/1049#issuecomment-534418869
Add a new “role” enum to support:
- normal subtitle
- describe video subtitles
- forced …
-
## Title: LocoMotion: Learning Motion-Focused Video-Language Representations
## Link: https://arxiv.org/abs/2410.12018
## Abstract:
This paper aims to learn motion-focused video-language representations. Existing video-language representation learning methods rely on spatially focused data, where identifying objects and scenes is enough to tell the appropriate caption apart. In this work, we therefore focus on changes in local object mo…
-
The [WebRTC Next Version Use Cases doc](https://w3c.github.io/webrtc-nv-use-cases/) lists three use-cases under [Funny hats](https://w3c.github.io/webrtc-nv-use-cases/#funnyhats*) related to plain-tex…
-
# Summary
UTK would like their site to be WCAG 2.0 AA compliant and for videos to have closed captioning, so the site is accessible to more people, including those with disabilities.
The accessibility…