-
Hello!
Thank you for your great work! I have the following question:
I have my own Elyza7B checkpoint that I want to fine-tune on a VQA task. If I follow the LLaVA training scheme closely, I think we n…
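For context, the LLaVA scheme first pretrains a projector that maps frozen vision-encoder features into the LLM's token-embedding space, then fine-tunes end to end. Below is a minimal numpy sketch of that projection step only; the dimensions are illustrative assumptions (CLIP-style 1024-d patch features, a 4096-d LLM hidden size), not Elyza7B's actual sizes.

```python
import numpy as np

# Illustrative sizes only (assumptions, not the real model dims):
# e.g. ViT-L/14 patch features are 1024-d; a 7B LLM hidden size is often 4096.
VISION_DIM, LLM_DIM, NUM_PATCHES = 1024, 4096, 256

rng = np.random.default_rng(0)
patch_features = rng.standard_normal((NUM_PATCHES, VISION_DIM))

# The stage-1 trainable piece in LLaVA: a simple linear projector
# (later LLaVA versions use a small 2-layer MLP instead).
W = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.02
b = np.zeros(LLM_DIM)

visual_tokens = patch_features @ W + b  # one "visual token" per patch

# These projected visual tokens are prepended to the text token embeddings
# before being fed to the LLM (frozen in stage 1, unfrozen in stage 2).
print(visual_tokens.shape)
```

The point of stage 1 is that only `W` and `b` are trained, so the base checkpoint (here, your Elyza7B) stays untouched until the fine-tuning stage.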
-
https://github.com/training-transformers-together/hf-website-how-to-join
Demo page (updated on push): https://training-transformers-together.github.io/
- [x] intro and motivation text
- [x] liv…
-
Dear coauthors,
- In the pretraining/fine-tuning stage, for vision-language tasks (especially visual_grounding and captioning), can I set the length of the generated tokens? I want a longer generated …
-
Hi,
I have worked out how to use the different data loaders to compare images and videos, but I cannot work out a way to do both simultaneously. It seems that with ModalityType.VISION you can only load and t…
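One common workaround (a sketch of the idea, not ImageBind's documented API): run the image loader and the video loader in two separate forward passes, both under `ModalityType.VISION`, and then compare the resulting embeddings in the shared space, e.g. with cosine similarity. The embeddings below are synthetic stand-ins for the two passes.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors in the shared space.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings from two separate forward passes: one batch built
# with the image loader, one with the video loader. ImageBind maps both
# into the same joint embedding space, so they are directly comparable.
rng = np.random.default_rng(1)
image_emb = rng.standard_normal(1024)
video_emb = image_emb + 0.1 * rng.standard_normal(1024)  # similar content

print(round(cosine_sim(image_emb, video_emb), 3))
```

Since the model places every vision input into the same space, there is no need for both media types to share a single batch; two passes plus a similarity computation gives the same comparison.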
-
CVPR 2022
# Format
* **Paper Title**
*Author(s)*
CVPR, 2022. [[Paper]](link) [[Code]](link) [[Website]](link)
To fill in:
1) Paper Title
2) Author(s)
3) the three "link" placeholders
4) a blank line between consecutive entries
# agent
Meta Ag…
-
## Problem statement
1. Despite the impressive capabilities of large-scale language models, their potential for modalities other than text has not been fully demonstrated.
2. Aligning parameters of vi…
-
The panoramic view generated from the skybox images by running downsize_skybox.py is not stitched well; that is, the _skybox_small.jpg is not an accurate panoramic view, especially for the first and l…
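For reference, stitching skybox (cubemap) faces into an equirectangular panorama means, for each output pixel, converting its (longitude, latitude) to a 3-D ray and sampling from the cube face the ray hits, i.e. the face of the ray's dominant axis. Below is a minimal numpy sketch of just the face-selection step; the face ordering is an assumption and may differ from the dataset's skybox index convention.

```python
import numpy as np

def equirect_face_map(width, height):
    """For each equirectangular pixel, pick which cube face its ray hits."""
    # Longitude in [-pi, pi), latitude in [-pi/2, pi/2].
    lon = (np.arange(width) / width - 0.5) * 2 * np.pi
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit ray per pixel.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Dominant axis decides the face: 0/1 = +x/-x, 2/3 = +y/-y, 4/5 = +z/-z.
    # (Assumed ordering; adjust to the dataset's skybox convention.)
    abs_xyz = np.stack([np.abs(x), np.abs(y), np.abs(z)])
    axis = np.argmax(abs_xyz, axis=0)
    sign = np.stack([x, y, z])[axis, np.arange(height)[:, None], np.arange(width)]
    return axis * 2 + (sign < 0)

faces = equirect_face_map(8, 4)
print(faces.shape)
```

Seams at the first and last faces typically come from this mapping (or the per-face in-plane sampling) disagreeing with the skybox's face order or orientation, so that is the usual place to look when the panorama wraps incorrectly.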
-
Hi there,
Great work! Could you please provide the `pretrain_stage0.sh` script or the config file (besides the log file)? We would like to reproduce some experiments. Thank you!
-
Hi authors,
Great work! Could you please share more details about the point-cloud alignment mentioned in your paper, quoted below:
> To ensure cohesion across various sources, we conduct preprocessing ste…
-
[sound-spaces](https://github.com/facebookresearch/sound-spaces)
[Project: RLR-Audio-Propagation](https://github.com/facebookresearch/rlr-audio-propagation)
[Audio Sensor](https://github.com/f…