-
*Sent by Google Scholar Alerts (scholaralerts-noreply@google.com). Created by [fire](https://fire.fundersclub.com/).*
---
### [PDF] Attention Prompting on Image for Large Vision-Language…
-
We propose SceneVerse, the first million-scale 3D vision-language dataset with 68K 3D indoor scenes and 2.5M vision-language pairs. SceneVerse contains 3D scenes curated from diverse existing datasets…
-
https://github.com/2U1/Qwen2-VL-Finetune
I made this code for anyone who wants to use the Hugging Face version to fine-tune and who, like me, has difficulty using some other frameworks.
This code only uses hug…
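As a rough illustration of what a Hugging-Face-only fine-tuning setup can look like, here is a minimal LoRA sketch using transformers and peft. The checkpoint name, LoRA hyperparameters, and target module names are assumptions for illustration, not taken from the linked repo.

```python
# Minimal sketch of a LoRA fine-tuning setup for Qwen2-VL using only the
# Hugging Face stack (transformers + peft). Illustrative, not the repo's code.
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # assumed checkpoint name

processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach LoRA adapters to the attention projections so that only a small
# fraction of the parameters is trained.
lora_cfg = LoraConfig(
    r=16,                      # assumed rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
# From here the model can be passed to a standard transformers Trainer.
```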
-
Hi dude, nice code base!
I have a few questions regarding the training time and want to double-check with you.
I'm training the llama-3.2-vision-instruct-11B model on a customized dataset with fu…
-
- https://arxiv.org/abs/2109.01134
- 2021
Vision-language pre-training has recently emerged as a promising alternative for representation learning.
It shifts from the traditional approach of using images and discrete labels to learn a fixed set of weights, to aligning images and raw text for two separate encoders.
Such a paradigm benefits from a broader source of supervision, and…
e4exp updated
3 years ago
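The abstract above describes aligning images and raw text with two separate encoders. Below is a minimal sketch of the symmetric contrastive (CLIP-style) objective such dual-encoder pre-training typically uses; the encoders themselves are omitted, and all names and shapes are illustrative.

```python
# Sketch of the symmetric image-text contrastive objective used by
# dual-encoder pre-training; encoders are omitted, names are illustrative.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Cosine-similarity logits between every image and every caption in the batch.
    logits = image_emb @ text_emb.t() / temperature
    # Matched pairs sit on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the image-to-text and text-to-image cross-entropy terms.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Example: a batch of 8 paired embeddings of dimension 512.
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```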
-
![image](https://github.com/YoojLee/paper_review/assets/52986798/1fd311e0-fa81-4b5c-a152-26fc4eb1e397)
## Summary
CoOp, which improves the transferability of existing CLIP-like VLMs by tuning their prompts with a few labeled images …
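As a sketch of the idea, CoOp replaces the hand-written prompt with learnable context vectors prepended to each class name's token embeddings, trained on a few labeled images while the CLIP encoders stay frozen. The snippet below is illustrative only; shapes, names, and initialization are assumptions, not the paper's code.

```python
# Sketch of CoOp-style learnable prompt context; the frozen CLIP text encoder
# is not shown, and all shapes and names are illustrative.
import torch
import torch.nn as nn

class PromptLearner(nn.Module):
    def __init__(self, class_token_embs, n_ctx=16, ctx_dim=512):
        super().__init__()
        # Learnable context vectors [V]_1 ... [V]_M, shared across classes.
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # Frozen token embeddings of the class names, shape (n_cls, n_tok, dim).
        self.register_buffer("cls_embs", class_token_embs)

    def forward(self):
        n_cls = self.cls_embs.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        # Per-class prompt: learned context followed by the class-name tokens;
        # the result is fed to the frozen text encoder.
        return torch.cat([ctx, self.cls_embs], dim=1)

# Example: 10 classes whose names embed to 4 tokens of dimension 512.
learner = PromptLearner(torch.randn(10, 4, 512))
prompts = learner()  # (10, 20, 512); only learner.ctx receives gradients
```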
-
![image](https://github.com/YoojLee/paper_review/assets/52986798/5fd58c1e-e243-42b7-afa5-ea8c35404e15)
## Summary
VLMs have the advantage that prompting enables zero-shot transfer. However, effective zero-shot transfer…
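For reference, here is a minimal sketch of what zero-shot transfer via prompting looks like with a public CLIP checkpoint; the prompt template, class names, and image path are illustrative placeholders.

```python
# Sketch of zero-shot classification via prompting with a public CLIP
# checkpoint; prompt template, class names, and image path are illustrative.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

classes = ["cat", "dog", "car"]
prompts = [f"a photo of a {c}" for c in classes]  # hand-crafted prompt template

image = Image.open("example.jpg")  # placeholder image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image  # image-text similarity per prompt
probs = logits.softmax(dim=-1)             # class probabilities, no training
```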
-
https://arxiv.org/abs/2303.06571
-
### System Info
NA
### Who can help?
@muellerz @sunma
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] An officially supported task in the `examp…
zanqi updated
2 weeks ago
-
# Papers
- Sapiens: Foundation for Human Vision Models
- A human foundation model from Meta, impressive
- 2D pose estimation, body-part segmentation, depth prediction, and normal prediction in a single model …
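The note suggests one model covers all four tasks. Below is a sketch of what a shared-backbone, multi-head layout for those tasks could look like; the backbone, feature shapes, and head designs are assumptions for illustration, not Sapiens' actual architecture.

```python
# Sketch of a shared-backbone, multi-head layout covering the four tasks
# listed above; purely illustrative, not Sapiens' actual architecture.
import torch
import torch.nn as nn

class MultiTaskHumanModel(nn.Module):
    def __init__(self, backbone, feat_dim=768, n_joints=17, n_parts=20):
        super().__init__()
        self.backbone = backbone  # shared feature extractor (e.g. a ViT)
        self.pose_head = nn.Linear(feat_dim, n_joints * 2)  # 2D keypoints
        self.seg_head = nn.Conv2d(feat_dim, n_parts, 1)     # body-part logits
        self.depth_head = nn.Conv2d(feat_dim, 1, 1)         # per-pixel depth
        self.normal_head = nn.Conv2d(feat_dim, 3, 1)        # surface normals

    def forward(self, x):
        feats = self.backbone(x)         # assumed (B, feat_dim, H, W) feature map
        pooled = feats.mean(dim=(2, 3))  # global pooling for the pose head
        return {
            "pose": self.pose_head(pooled),
            "parts": self.seg_head(feats),
            "depth": self.depth_head(feats),
            "normals": self.normal_head(feats),
        }
```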