-
Hello!
Thank you for your great work! I have the following question:
I have my own Elyza7B checkpoint that I want to fine-tune on a VQA task. If I follow the LLaVA training scheme closely, I think we n…
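For context, the LLaVA scheme first pretrains a projector that maps frozen vision-encoder features into the LLM's token-embedding space, then fine-tunes end to end. Below is a minimal numpy sketch of that projection step only; the dimensions are illustrative assumptions (CLIP-style 1024-d patch features, a 4096-d LLM hidden size), not Elyza7B's actual sizes.

```python
import numpy as np

# Illustrative sizes only (assumptions, not the real model dims):
# e.g. ViT-L/14 patch features are 1024-d; a 7B LLM hidden size is often 4096.
VISION_DIM, LLM_DIM, NUM_PATCHES = 1024, 4096, 256

rng = np.random.default_rng(0)
patch_features = rng.standard_normal((NUM_PATCHES, VISION_DIM))

# The stage-1 trainable piece in LLaVA: a simple linear projector
# (later LLaVA versions use a small 2-layer MLP instead).
W = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.02
b = np.zeros(LLM_DIM)

visual_tokens = patch_features @ W + b  # one "visual token" per patch

# These projected visual tokens are prepended to the text token embeddings
# before being fed to the LLM (frozen in stage 1, unfrozen in stage 2).
print(visual_tokens.shape)
```

The point of stage 1 is that only `W` and `b` are trained, so the base checkpoint (here, your Elyza7B) stays untouched until the fine-tuning stage.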
-
https://github.com/training-transformers-together/hf-website-how-to-join
Demo page (updated on push): https://training-transformers-together.github.io/
- [x] intro and motivation text
- [x] liv…
-
Dear coauthors,
- In the pretraining/fine-tuning stage, for vision-language tasks (especially visual_grounding and captioning), can I set the length of the generated tokens? I want a longer generated …
-
Hi,
I have worked out how to use the different data loaders to compare images and videos, but I cannot work out a way to do both simultaneously. It seems that with ModalityType.VISION you can only load and t…
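One common workaround (a sketch of the idea, not ImageBind's documented API): run the image loader and the video loader in two separate forward passes, both under `ModalityType.VISION`, and then compare the resulting embeddings in the shared space, e.g. with cosine similarity. The embeddings below are synthetic stand-ins for the two passes.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors in the shared space.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings from two separate forward passes: one batch built
# with the image loader, one with the video loader. ImageBind maps both
# into the same joint embedding space, so they are directly comparable.
rng = np.random.default_rng(1)
image_emb = rng.standard_normal(1024)
video_emb = image_emb + 0.1 * rng.standard_normal(1024)  # similar content

print(round(cosine_sim(image_emb, video_emb), 3))
```

Since the model places every vision input into the same space, there is no need for both media types to share a single batch; two passes plus a similarity computation gives the same comparison.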
-
CVPR 2022
# Format
* **Paper Title**
*Author(s)*
CVPR, 2022. [[Paper]](link) [[Code]](link) [[Website]](link)
To fill in:
1) Paper Title
2) Author(s)
3) the three "link" placeholders
4) a blank line between consecutive entries
# agent
Meta Ag…
-
## Problem statement
1. Despite the impressive capabilities of large-scale language models, their potential for modalities other than text has not been fully demonstrated.
2. Aligning parameters of vi…
-
The panoramic view generated from the skybox images by running downsize_skybox.py is not stitched well; that is, the _skybox_small.jpg is not an accurate panoramic view, especially for the first and l…
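For reference, stitching skybox (cubemap) faces into an equirectangular panorama means, for each output pixel, converting its (longitude, latitude) to a 3-D ray and sampling from the cube face the ray hits, i.e. the face of the ray's dominant axis. Below is a minimal numpy sketch of just the face-selection step; the face ordering is an assumption and may differ from the dataset's skybox index convention.

```python
import numpy as np

def equirect_face_map(width, height):
    """For each equirectangular pixel, pick which cube face its ray hits."""
    # Longitude in [-pi, pi), latitude in [-pi/2, pi/2].
    lon = (np.arange(width) / width - 0.5) * 2 * np.pi
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit ray per pixel.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Dominant axis decides the face: 0/1 = +x/-x, 2/3 = +y/-y, 4/5 = +z/-z.
    # (Assumed ordering; adjust to the dataset's skybox convention.)
    abs_xyz = np.stack([np.abs(x), np.abs(y), np.abs(z)])
    axis = np.argmax(abs_xyz, axis=0)
    sign = np.stack([x, y, z])[axis, np.arange(height)[:, None], np.arange(width)]
    return axis * 2 + (sign < 0)

faces = equirect_face_map(8, 4)
print(faces.shape)
```

Seams at the first and last faces typically come from this mapping (or the per-face in-plane sampling) disagreeing with the skybox's face order or orientation, so that is the usual place to look when the panorama wraps incorrectly.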
-
Hi there,
Great work! Could you please provide the `pretrain_stage0.sh` script or the config file (besides the log file)? We would like to reproduce some experiments. Thank you!
-
Hi authors,
Great work! Could you please share more details about the point-cloud alignment mentioned in your paper, quoted below:
> To ensure cohesion across various sources, we conduct preprocessing ste…
-
[sound-spaces](https://github.com/facebookresearch/sound-spaces)
[Project: RLR-Audio-Propagation](https://github.com/facebookresearch/rlr-audio-propagation)
[Audio Sensor](https://github.com/f…