fudan-zvg / SETR

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
MIT License

Questions about "cls_token" and "pos_embed" in the code #51

Closed wscc123 closed 2 years ago

wscc123 commented 2 years ago

The "cls_token" and "pos_embed" are defined as all-zero matrices, what is the meaning? How is it applied in the model later? I am not doing this direction, just want to learn from your work, but also hope that you can help me answer!

"self.cls_token = nn.Parameter(torch.zeros(1, 1, self.embed_dim)) self.pos_embed = nn.Parameter(torch.zeros( 1, self.num_patches + 1, self.embed_dim)) "

sixiaozheng commented 2 years ago

The "cls_token" and "pos_embed" are initialized by Line 339 trunc_normal_(self.pos_embed, std=.02) and Line 340 trunc_normal_(self.cls_token, std=.02). https://github.com/fudan-zvg/SETR/blob/2f6d854bde6f69b6aaecdbf5d05696b585b65233/mmseg/models/backbones/vit.py#L339 https://github.com/fudan-zvg/SETR/blob/2f6d854bde6f69b6aaecdbf5d05696b585b65233/mmseg/models/backbones/vit.py#L340

wscc123 commented 2 years ago

The "cls_token" and "pos_embed" are initialized by Line 339 trunc_normal_(self.pos_embed, std=.02) and Line 340 trunc_normal_(self.cls_token, std=.02).

https://github.com/fudan-zvg/SETR/blob/2f6d854bde6f69b6aaecdbf5d05696b585b65233/mmseg/models/backbones/vit.py#L339

https://github.com/fudan-zvg/SETR/blob/2f6d854bde6f69b6aaecdbf5d05696b585b65233/mmseg/models/backbones/vit.py#L340

I can only see that after `cls_token` and `pos_embed` are initialized, they are combined with `x` and fed into the model. Is that the case? Are they tuned along with the rest of the model's parameters? I would like an in-depth understanding of the specific functions of the class token and the position embedding. I hope I can get your help, thanks a lot!

sixiaozheng commented 2 years ago

- `self.cls_token`: class token
- `self.pos_embed`: position embedding

Because `self.cls_token` and `self.pos_embed` are `nn.Parameter`, they are learnable parameters and are updated by the optimizer. In ViT, the output at `self.cls_token` represents the classification probability. SETR is based on ViT and uses the ViT code directly, but `self.cls_token` is not actually used. `self.pos_embed` is part of the Transformer and ViT designs, and ViT trains `self.pos_embed` as a learnable parameter. It is recommended to refer to the Transformer and ViT papers.
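To make the data flow concrete, here is a hedged sketch of the ViT-style embedding step being described (a toy module, not the repo's exact code):

```python
import torch
import torch.nn as nn

class ToyViTEmbedding(nn.Module):
    """Illustrative only: where cls_token and pos_embed enter a ViT-style forward pass."""

    def __init__(self, num_patches: int = 196, embed_dim: int = 768):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, num_patches, embed_dim) patch embeddings
        B = x.shape[0]
        cls = self.cls_token.expand(B, -1, -1)  # one class token per sample
        x = torch.cat((cls, x), dim=1)          # prepend: (B, num_patches + 1, embed_dim)
        x = x + self.pos_embed                  # add the learned position embedding
        return x  # in SETR, the decoder consumes only the patch tokens, not the class token
```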

wscc123 commented 2 years ago

> Because `self.cls_token` and `self.pos_embed` are `nn.Parameter`, they are learnable parameters and are updated by the optimizer. […] It is recommended to refer to the Transformer and ViT papers.

I understand, thanks a lot. Hope your research goes well!

liuxingyu123 commented 2 years ago

> Because `self.cls_token` and `self.pos_embed` are `nn.Parameter`, they are learnable parameters and are updated by the optimizer. […]

Hello, but I found that my `cls_token` and `pos_embed` parameters have not changed. Is the reason this?

```python
@property
def no_weight_decay(self):
    return {'pos_embed', 'cls_token'}
```

I haven't been able to find the problem for a long time. Thank you for your reply.

wscc123 commented 2 years ago

This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email in person. I will reply as soon as possible after my vacation ends.

VictorLlu commented 2 years ago

> Because `self.cls_token` and `self.pos_embed` are `nn.Parameter`, they are learnable parameters and are updated by the optimizer. […]
>
> Hello, but I found that my `cls_token` and `pos_embed` parameters have not changed. Is the reason the `no_weight_decay` property? […]

Please make sure that `requires_grad` of `cls_token` and `pos_embed` is set to `True` and that both appear in the `params` of your optimizer.
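Note that `no_weight_decay` only exempts those parameters from the weight-decay penalty; it does not freeze them. A hedged sketch of how such a set is typically consumed when building optimizer parameter groups (`build_param_groups` is a hypothetical helper, not the repo's training code):

```python
import torch

def build_param_groups(model, weight_decay=0.05):
    """Hypothetical helper: parameters whose names match model.no_weight_decay
    get weight_decay=0, but both groups still receive gradient updates."""
    skip = getattr(model, "no_weight_decay", None)
    if callable(skip):       # timm-style method
        skip = skip()
    elif skip is None:       # attribute absent
        skip = set()
    # otherwise it is a @property returning a set, as in the snippet quoted above
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue         # frozen parameters never reach the optimizer
        if any(key in name for key in skip):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# usage sketch:
# optimizer = torch.optim.AdamW(build_param_groups(model), lr=1e-4)
```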

liuxingyu123 commented 2 years ago

Thank you for your reply. `requires_grad` of `cls_token` and `pos_embed` is `True`, but the grad of `cls_token` is 0, and the grad of the position embedding entry corresponding to `cls_token` is also 0. Do you know why?

VictorLlu commented 2 years ago

> Thank you for your reply. `requires_grad` of `cls_token` and `pos_embed` is `True`, but the grad of `cls_token` is 0, and the grad of the position embedding entry corresponding to `cls_token` is also 0. Do you know why?

`cls_token` is not used in SETR at all. Please just ignore it.
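That also explains the zero gradients: a parameter that is sliced off before the loss receives an all-zero gradient. A minimal, self-contained check (illustrative names and shapes only):

```python
import torch
import torch.nn as nn

cls_token = nn.Parameter(torch.randn(1, 1, 8))  # stands in for self.cls_token
patches = torch.randn(2, 4, 8)                  # stands in for the patch tokens

tokens = torch.cat((cls_token.expand(2, -1, -1), patches), dim=1)
out = tokens[:, 1:]   # SETR-style: the class token is sliced off before the decoder
out.sum().backward()  # any loss built on `out` behaves the same way

print(cls_token.grad)  # all zeros: nothing downstream depends on the class token
```

The first row of `pos_embed` is in the same situation, which matches the zero gradient observed at the position corresponding to `cls_token`.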

liuxingyu123 commented 2 years ago

> Thank you for your reply. `requires_grad` of `cls_token` and `pos_embed` is `True`, but the grad of `cls_token` is 0 […]
>
> `cls_token` is not used in SETR at all. Please just ignore it.

OK, I understand. Thank you very much.