OliverRensu / ARM

This repository is the official implementation of our Autoregressive Pretraining with Mamba in Vision

Why does the code have the TODO "release this comment"? #5

Open Fengxiang23 opened 1 month ago

Fengxiang23 commented 1 month ago

Thanks for your awesome work. In ARM/Finetuning/models_mamba.py, line 288, there is a TODO in the code: "release this comment". Is the code not yet finished? Why can't the fine-tuning test code be reproduced? Have you open-sourced all of the fine-tuning parts?

OliverRensu commented 1 month ago

Hi, we inherited the TODO from Vim (https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py#L312), but it does not affect training or inference. Did you install the correct packages, i.e. causal-conv1d==1.1.2.post1 and mamba-ssm==1.1.1 from ~/Mamba/Vim/mamba-1p1p1?
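
As a quick sanity check (this snippet is only an illustration, not part of the repo), you can verify the installed versions like this:

```python
# Minimal sketch (not from the repo): check that the pinned versions are installed.
from importlib.metadata import version, PackageNotFoundError

for pkg, expected in [("causal-conv1d", "1.1.2.post1"), ("mamba-ssm", "1.1.1")]:
    try:
        installed = version(pkg)
        status = "OK" if installed == expected else f"mismatch, expected {expected}"
        print(f"{pkg} {installed}: {status}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed, expected {expected}")
```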

OliverRensu commented 1 month ago

We can also provide the fine-tuning log (fine-tuning with the code in this repo without any modification) if it helps you verify your fine-tuning process.

Fengxiang23 commented 1 month ago

Thank you for your patience. After comparing with the original code, I found that this TODO is indeed inherited. But while checking the code further, I found that ARM uses a Decoder, and the query in the Decoder is a learnable parameter constructed by self.ar_token = nn.Parameter(torch.zeros(1, 1, self.dec_embed_dim)). I observed that ARM uses the output of cross-attention with this query to compute the loss. My question is: how is this computation related to autoregressive training? How should we understand this implementation as autoregressive training? Please forgive me for still having such doubts after reading your article. I really hope to get your explanation. Thanks again.
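
For reference, here is a minimal sketch of how I currently read this pattern (only ar_token and dec_embed_dim come from your code; the other names and shapes are my own illustration, not the repo's implementation):

```python
# Minimal sketch of the pattern described above (not the repo's actual code):
# a learnable query token cross-attends to the encoder output, and the
# resulting features are projected to patch space for the reconstruction loss.
import torch
import torch.nn as nn

class TinyCrossAttnDecoder(nn.Module):
    def __init__(self, dec_embed_dim=512, num_heads=8, patch_dim=16 * 16 * 3):
        super().__init__()
        # learnable query, analogous to self.ar_token in the code quoted above
        self.ar_token = nn.Parameter(torch.zeros(1, 1, dec_embed_dim))
        self.cross_attn = nn.MultiheadAttention(dec_embed_dim, num_heads, batch_first=True)
        self.pred = nn.Linear(dec_embed_dim, patch_dim)

    def forward(self, enc_feats, num_targets):
        # enc_feats: (B, N, dec_embed_dim) encoder tokens projected to the decoder dim
        B = enc_feats.size(0)
        # one copy of the learnable query per target position to be predicted
        queries = self.ar_token.expand(B, num_targets, -1)
        out, _ = self.cross_attn(queries, enc_feats, enc_feats)  # cross-attention
        return self.pred(out)  # (B, num_targets, patch_dim) predictions for the loss
```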

Fengxiang23 commented 1 month ago

In fact, count += 1 is used repeatedly in your Decoder part. This is equivalent to reusing the previous query in each Decoder block. I don't understand whether such a computation brings additional benefits.
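
To make my question concrete, this is the pattern I am describing (again my own illustration, not your code): the query produced by one block is passed on as the query of the next block.

```python
# Minimal sketch of how I read the count += 1 pattern (my own illustration,
# not the repo's code): the refined query from one decoder block is reused
# as the query of the next block, so every block operates on the same query stream.
import torch
import torch.nn as nn

class TinyDecoderBlock(nn.Module):
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, query, enc_feats):
        # cross-attention with residual, then MLP with residual
        query = query + self.cross_attn(query, enc_feats, enc_feats)[0]
        return query + self.mlp(query)

def run_decoder(blocks, query, enc_feats):
    count = 0
    for blk in blocks:
        # the query output by the previous block becomes the next block's query
        query = blk(query, enc_feats)
        count += 1
    return query
```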