-
Hi Tianhong, thank you for your inspiring work! While reading the paper, I had some questions regarding the term “MAR.” Aside from the difference mentioned in the paper—where the next set of tokens in…
-
Hi! It's an intriguing work!
I have a question about K400 pre-trained models.
Did you use ImageNet-1K pre-trained models to pre-train on the K400 dataset for downstream tasks (e.g., Breakfast, COIN, etc.) …
-
https://openaccess.thecvf.com/content/CVPR2022/papers/Xie_SimMIM_A_Simple_Framework_for_Masked_Image_Modeling_CVPR_2022_paper.pdf
-
Thank you for sharing the source code of VLMO recently.
We took a stab at pretraining a large (1024 hidden dim) multiway transformer with MIM, MLM, and contrastive losses.
BEIT3 pret…
-
When building models for teacher and student in this [code](https://github.com/facebookresearch/dinov2/blob/main/dinov2/models/__init__.py#L15), the parameter args.arch is used for both student and te…
-
The goal of this issue is to track new self-supervised methods and to which extent they are implemented in Lightly.
See also our [#papers channel](https://discord.com/channels/752876370337726585/81…
-
We would like to have an implementation of the following paper:
[Image Compression with Product Quantized Masked Image Modeling](https://arxiv.org/abs/2212.07372)
Alaaeldin El-Nouby, Matthew J. Mu…
-
Your paper is of very high quality, and I have a question I'd like to ask: can the GPU memory-saving method from this paper be ported to LoFormer?
-
- https://arxiv.org/abs/2109.12178
- 2021
Vision-and-language pretraining (VLP) improves model performance on downstream tasks that require image and text inputs.
Current VLP approaches differ in
(i) model architecture (especially the image embedder),
(ii) loss functions, and
(iii) masking policies.
Image embedders use ResNet…
e4exp updated 3 years ago
-