-
### System Info
transformers==4.45.2
When preparing the cross-attention mask in the `_prepare_cross_attention_mask` function, we get the `cross_attn_mask` with the shape of [batch, text_tokens, i…
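For context, turning a per-token cross-attention mask into the 4D additive form that attention layers consume usually follows the pattern sketched below. This is an illustrative stand-alone example, not the exact transformers implementation; the function name and the [batch, text_tokens, image_tokens] layout are assumptions.

```python
import torch

def expand_cross_attn_mask(mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # mask: [batch, text_tokens, image_tokens], 1 = attend, 0 = ignore (illustrative layout)
    # Add a singleton head dimension -> [batch, 1, text_tokens, image_tokens]
    expanded = mask[:, None, :, :].to(dtype)
    # Convert to an additive mask: 0 where attention is allowed, large negative where blocked
    inverted = 1.0 - expanded
    return inverted.masked_fill(inverted.to(torch.bool), torch.finfo(dtype).min)

# Example: the first text token attends to all image tokens, the second to none
mask = torch.tensor([[[1, 1, 1], [0, 0, 0]]], dtype=torch.float32)
print(expand_cross_attn_mask(mask, torch.float32).shape)  # torch.Size([1, 1, 2, 3])
```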
-
### System Info
PyTorch version: 2.6.0.dev20241101+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ub…
-
https://openaccess.thecvf.com/content/CVPR2022/papers/Xie_SimMIM_A_Simple_Framework_for_Masked_Image_Modeling_CVPR_2022_paper.pdf
-
1. Revitalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation (2021)
code: No
2. Body Meshes as Points (2021)
regarded as a two-class classification task (if a grid…
-
Thank you for sharing the source code of VLMO recently.
We took a stab at it and pretrained a large (1024 hidden dim) Multiway Transformer with MIM loss, MLM loss, and contrastive loss.
BEIT3 pret…
-
Using the latest transformers 4.47.0.dev0.
I removed the import of _expand_mask and replaced it with a custom definition:
def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
"""
Expands attention_mask from `[bs…
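For reference, the complete helper (truncated above) can be written roughly as follows. This is a sketch of the well-known `_expand_mask` pattern from older transformers releases, reproduced from memory rather than copied from 4.47.0.dev0, so double-check it against your version.

```python
from typing import Optional

import torch


def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
    """
    Expands attention_mask from `[bsz, src_len]` to `[bsz, 1, tgt_len, src_len]`
    and converts it to an additive mask (0.0 where attended, large negative where masked).
    """
    bsz, src_len = mask.size()
    tgt_len = tgt_len if tgt_len is not None else src_len

    # Broadcast the padding mask across the target length and a singleton head dimension
    expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)

    # Positions with 1 stay visible (0.0); positions with 0 get the dtype's most negative value
    inverted_mask = 1.0 - expanded_mask
    return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min)
```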
-
Hi author, thanks for the great work and the general concept of 'masked autoregressive' modeling. When I try to use the model provided, I find that during training, the mask is generated randomly without an…
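For readers unfamiliar with the setup, the random masking referred to here usually amounts to sampling a masking ratio and hiding a random subset of token positions at each training step. The sketch below is illustrative only; the function name and the uniform ratio range are assumptions, not taken from the MAR repository.

```python
import torch

def random_token_mask(batch: int, num_tokens: int,
                      min_ratio: float = 0.5, max_ratio: float = 1.0) -> torch.Tensor:
    """Return a boolean mask of shape [batch, num_tokens]; True marks a masked (hidden) token."""
    # Sample one masking ratio per example (assumed uniform here; the paper uses its own schedule)
    ratios = torch.empty(batch).uniform_(min_ratio, max_ratio)
    num_masked = (ratios * num_tokens).long().clamp(min=1)

    # Randomly order positions and mask the first `num_masked` of each random ordering
    order = torch.rand(batch, num_tokens).argsort(dim=1)
    mask = torch.zeros(batch, num_tokens, dtype=torch.bool)
    for i in range(batch):
        mask[i, order[i, :num_masked[i]]] = True
    return mask

print(random_token_mask(2, 8))
```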
-
Hi Tianhong, thank you for your inspiring work! While reading the paper, I had some questions regarding the term “MAR.” Aside from the difference mentioned in the paper—where the next set of tokens in…
-
1. (HMMR) Learning 3D Human Dynamics from Video (2019)
temporal encoder: **1D temporal** convolutional layers; **precompute** the image features on each frame, then predict for the current and ±∆t frames (see the sketch after this note).
c…
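A minimal sketch of such a temporal encoder, assuming precomputed per-frame feature vectors and a stack of 1D convolutions over time; layer sizes and names are illustrative, not taken from the HMMR code.

```python
import torch
import torch.nn as nn

class TemporalEncoder(nn.Module):
    """1D temporal convolutions over precomputed per-frame image features."""

    def __init__(self, feat_dim: int = 2048, hidden_dim: int = 1024, kernel_size: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size, padding=kernel_size // 2),
            nn.ReLU(inplace=True),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size, padding=kernel_size // 2),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: [batch, T, feat_dim], precomputed image features for T consecutive frames
        x = feats.transpose(1, 2)   # -> [batch, feat_dim, T] for Conv1d
        x = self.net(x)             # temporal context mixed across neighboring frames
        return x.transpose(1, 2)    # -> [batch, T, hidden_dim], one embedding per frame

# Each per-frame embedding can then feed a regressor for the current and ±∆t frame predictions.
feats = torch.randn(2, 20, 2048)
print(TemporalEncoder()(feats).shape)  # torch.Size([2, 20, 1024])
```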
-
We would like to have an implementation of the following paper:
[Image Compression with Product Quantized Masked Image Modeling](https://arxiv.org/abs/2212.07372)
Alaaeldin El-Nouby, Matthew J. Mu…