cross-modality-pretraining Search Results

facebookresearch/av_hubert #100

AssertionError

AssertionError: Could not infer task type from {'_name': 'av_hubert_pretraining', 'is_s2s': True, 'data': '/checkpoint/bshi/data/lrs3//exp/ls-hubert/tune-modality/all_tsv/', 'label_dir': '/checkpoint/…

mabaisen updated 10 months ago

bigshanedogg/survey #18

[FILIP] FILIP: Fine-grained Interactive Language-Image Pre-T…

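## Problem statement 1. The way CLIP variants learn the relationship between images and text is inefficient, in both training and inference, for learning the relationship between individual text tokens and image patches -> let's look for a method that enables finer-level alignment 2. Weaknesses of prior work that uses attention between image patches and text tokens …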

bigshanedogg updated 2 years ago

huggingface/transformers #15813

Add OFA to transformers

# 🌟 New model addition We recently proposed OFA, a unified model for multimodal pretraining, which achieves multiple SoTAs on downstream tasks, including image captioning, text-to-image generation, r…

JustinLin610 updated 1 year ago

uhhyunjoo/paper-notes #11

[arXiv 2021] Cross Modal Retrieval with Querybank Normalisat…

||link| |----|---| |paper| [Cross Modal Retrieval with Querybank Normalisation](https://arxiv.org/pdf/2112.12777v3.pdf) | |code| [papers with code](https://paperswithcode.com/paper/cross-modal-retr…

uhhyunjoo updated 2 years ago

yyf17/NavigationProject #8

CVPR 2022

CVPR 2022 # Format * **Paper Title** *Author(s)* CVPR, 2022. [[Paper]](link) [[Code]](link) [[Website]](link) To fill in: 1) Paper Title 2) Author(s) 3) the 3 "link" entries 4) one blank line between two papers # agent Meta Ag…

yyf17 updated 2 years ago

yyf17/awesome-embodied-intelligent #1

SoundSpace

# [sound-spaces](https://github.com/facebookresearch/sound-spaces) [Project: RLR-Audio-Propagation](https://github.com/facebookresearch/rlr-audio-propagation) [Audio Sensor](https://github.com/f…

yyf17 updated 2 years ago

chaos-moon/paper_daily #18

CLIP series

## CLIP * [\[Blog\]](https://openai.com/blog/clip/) * [\[Paper\]](https://arxiv.org/abs/2103.00020) * [\[code\]](https://github.com/openai/CLIP) * [\[Model Card\]](https://github.com/openai/CL…

zc12345 updated 1 year ago

v-iashin/MDVC #11

Dense Video Captioning on raw input videos

It seems a nice work. I wanted to test it on custom input videos. It would be very helpful if you can provide a script for generating video captions for a raw input video.

harpavatkeerti updated 1 year ago

gitmylo/audio-webui #176

Installation Issue.

**Describe the bug** Audio-Webui does not install the requirements properly, precisely on audiolm, saying it failed to install. **To Reproduce** Steps to reproduce the behavior: 1. Go to 'audio-…

PericoSpart updated 3 months ago

XpressAI/xai-llm-server #2

Feature Request: Add support for Llama-3.2-11B-vision/

### Problem We want to add support for this new model that unlike the previous ones also supports vision. The readme for the model is described below: --- language: - en - de - fr - it - pt…

wmeddie updated 1 month ago

35 results for cross-modality-pretraining
