-
https://openaccess.thecvf.com/content/CVPR2022/papers/Xie_SimMIM_A_Simple_Framework_for_Masked_Image_Modeling_CVPR_2022_paper.pdf
-
Thank you for sharing the source code of VLMO recently.
We took a stab and pretrained a large (1024 hidden dim) multiway transformer with MIM loss, MLM loss, and contrastive loss.
BEIT3 pret…
-
We would like to have an implementation of the following paper:
[Image Compression with Product Quantized Masked Image Modeling](https://arxiv.org/abs/2212.07372)
Alaaeldin El-Nouby, Matthew J. Mu…
-
# Vision Transformer Adapter for Dense Predictions
Info.
- ICLR 2023 spotlight
- https://github.com/czczup/ViT-Adapter
- https://arxiv.org/abs/2205.08534
### Summary
- plain ViT
- whi…
-
- https://arxiv.org/abs/2109.12178
- 2021
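Vision-and-language pre-training (VLP) improves model performance on downstream tasks that require image and text inputs.
Current VLP approaches differ in
(i) model architecture (especially the image embedder),
(ii) loss functions, and
(iii) masking policy.
Image embedders are ResNet…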
-
https://github.com/ParadoxZW/LLaVA-UHD-Better/blob/main/llava_uhd/adapt_llava.py#L136-L138
Since the first token is for CLS here, shouldn't
```python
m[:w * h] = True
```
be changed to
```python
m[:w * h+1] = …
```
-
Hello,
Thank you for your amazing work! I have some questions about training on my own COLMAP dataset with SCGS.
Here's the thing: I want to model the whole scene (both dynamic and static) but not…
-
### Links
- Paper : https://arxiv.org/abs/2111.06377
- Github : https://github.com/facebookresearch/mae
### One-line summary
- A self-supervised learning paper that applies the masked image modeling concept; the differences from the NLP domain…
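As a rough illustration of the masked image modeling idea in the summary above, here is a minimal sketch of random patch masking. The function name `random_patch_mask` and its parameters are hypothetical, not taken from the MAE repository. The 0.75 default is an assumption based on the mask ratio reported in the MAE paper.

```python
import random

def random_patch_mask(num_patches, mask_ratio=0.75, seed=0):
    """Return a list of booleans, one per patch: True means the patch is masked."""
    rng = random.Random(seed)
    # Number of patches hidden from the encoder (assumed ratio, see lead-in).
    num_masked = int(num_patches * mask_ratio)
    # Pick the masked patch indices uniformly at random, without replacement.
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]

# Example: a 224x224 image split into 16x16 patches gives 14 * 14 = 196 patches.
mask = random_patch_mask(196)
visible_idx = [i for i, is_masked in enumerate(mask) if not is_masked]
```

In this sketch only the patches in `visible_idx` would be passed to the encoder, and the masked ones would be the reconstruction targets.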
-
The goal of this issue is to track new self-supervised methods and to which extent they are implemented in Lightly.
See also our [#papers channel](https://discord.com/channels/752876370337726585/81…
-
Hello! I notice in your code that the model's input remains consistent during training and inference, i.e., paired images `imgs`, paired labels `tgts`, and mask `bool_masked_pos`. During `forward()`, …