-
AssertionError: Could not infer task type from {'_name': 'av_hubert_pretraining', 'is_s2s': True, 'data': '/checkpoint/bshi/data/lrs3//exp/ls-hubert/tune-modality/all_tsv/', 'label_dir': '/checkpoint/…
-
## Problem statement
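1. The way CLIP variants learn the relationship between images and text is inefficient, at both training and inference time, for learning the relationship between each text token and the image patches -> let's look for a method that enables finer-level alignment
2. Weaknesses of prior work that uses attention between image patches and text tokens …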
-
# 🌟 New model addition
We recently proposed OFA, a unified model for multimodal pretraining, which achieves multiple SoTAs on downstream tasks, including image captioning, text-to-image generation, r…
-
||link|
|----|---|
|paper| [Cross Modal Retrieval with Querybank Normalisation](https://arxiv.org/pdf/2112.12777v3.pdf) |
|code| [papers with code](https://paperswithcode.com/paper/cross-modal-retr…
-
CVPR 2022
#
Format
* **Paper Title**
*Author(s)*
CVPR, 2022. [[Paper]](link) [[Code]](link) [[Website]](link)
To be filled in:
1) Paper Title
2) Author(s)
3) The three "link" placeholders
4) Leave one blank line between two papers
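
A minimal sketch of how entries in this format could be generated, assuming Python. The helper names `format_entry` and `format_entries` are hypothetical and not part of this repository, and the values used below are just the template's own placeholders.

```python
from typing import Dict, List


def format_entry(title: str, authors: str, paper: str, code: str, website: str) -> str:
    """Render one paper entry in the three-line list format described above."""
    return (
        f"* **{title}**\n"
        f"*{authors}*\n"
        f"CVPR, 2022. [[Paper]]({paper}) [[Code]]({code}) [[Website]]({website})"
    )


def format_entries(entries: List[Dict[str, str]]) -> str:
    """Join entries with exactly one blank line between consecutive papers."""
    return "\n\n".join(format_entry(**entry) for entry in entries)


# Placeholder values taken from the template itself, not real papers.
placeholder = {
    "title": "Paper Title",
    "authors": "Author(s)",
    "paper": "link",
    "code": "link",
    "website": "link",
}
rendered = format_entries([placeholder, placeholder])
```

Printing `rendered` gives two entries separated by a single blank line, which covers the four points in the list above.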
# agent
Meta Ag…
yyf17 updated
2 years ago
-
#
[sound-spaces](https://github.com/facebookresearch/sound-spaces)
[Project: RLR-Audio-Propagation](https://github.com/facebookresearch/rlr-audio-propagation)
[Audio Sensor](https://github.com/f…
-
## CLIP
* [\[Blog\]](https://openai.com/blog/clip/)
* [\[Paper\]](https://arxiv.org/abs/2103.00020)
* [\[code\]](https://github.com/openai/CLIP)
* [\[Model Card\]](https://github.com/openai/CL…
-
This looks like nice work. I would like to test it on custom input videos. It would be very helpful if you could provide a script for generating captions from a raw input video.
-
**Describe the bug**
Audio-Webui does not install its requirements properly: specifically, the install of audiolm fails.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to 'audio-…
-
### Problem
We want to add support for this new model, which, unlike the previous ones, also supports vision. The model's README is reproduced below:
---
language:
- en
- de
- fr
- it
- pt…