The "cls_token" and "pos_embed" are initialized by Line 339 trunc_normal_(self.pos_embed, std=.02)
and Line 340 trunc_normal_(self.cls_token, std=.02)
.
https://github.com/fudan-zvg/SETR/blob/2f6d854bde6f69b6aaecdbf5d05696b585b65233/mmseg/models/backbones/vit.py#L339
https://github.com/fudan-zvg/SETR/blob/2f6d854bde6f69b6aaecdbf5d05696b585b65233/mmseg/models/backbones/vit.py#L340
The "cls_token" and "pos_embed" are initialized by Line 339
trunc_normal_(self.pos_embed, std=.02)
and Line 340trunc_normal_(self.cls_token, std=.02)
.
I can only see that, after 'cls_token' and 'pos_embed' are initialized, they are combined with 'x' and fed into the model. Is that right? Are they tuned as parameters along with the rest of the model? I would like to understand in depth what the class token and the position embedding actually do. I hope you can help me, thanks a lot!
self.cls_token: class token; self.pos_embed: position embedding.
Because self.cls_token and self.pos_embed are nn.Parameter, they are learnable and are updated by the optimizer. In ViT, the output corresponding to self.cls_token is used as the classification representation. SETR is based on ViT, so the ViT code is reused directly, but self.cls_token is not actually used in SETR. self.pos_embed is part of the Transformer/ViT design, and ViT trains it as a learnable parameter. I recommend referring to the Transformer and ViT papers.
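For intuition, here is a minimal ViT-style sketch of how the two parameters enter the forward pass (my own illustration, not the exact SETR code): the class token is prepended to the patch tokens, then the position embedding is added element-wise, and the result goes through the transformer blocks.

```python
# Minimal ViT-style sketch (illustrative only, not the SETR forward pass).
import torch
import torch.nn as nn

embed_dim, num_patches, batch = 8, 4, 2
cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

x = torch.randn(batch, num_patches, embed_dim)  # patch embeddings of the input image
cls = cls_token.expand(batch, -1, -1)           # the shared class token, one copy per sample
x = torch.cat((cls, x), dim=1)                  # shape: (batch, num_patches + 1, embed_dim)
x = x + pos_embed                               # add the learnable position embedding
# x is then fed to the transformer encoder blocks
```

Because the backward pass reaches cls_token and pos_embed through these operations, the optimizer updates them together with the rest of the weights.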
I understand, thanks a lot. Hope your research goes well!
Hello, but I found that my cls_token and pos_embed parameters have not changed during training. Could the reason be this?

@property
def no_weight_decay(self):
    return {'pos_embed', 'cls_token'}

I have not been able to find the problem for a long time. Thank you for your reply.
Please make sure that requires_grad is True for both cls_token and pos_embed, and that both appear in the "params" of your optimizer.
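A small self-contained way to run that check (the ToyBackbone module below is a toy stand-in with the same two parameters, not the SETR model itself):

```python
# Toy sanity check: confirm requires_grad and that both parameters are
# registered in the optimizer's param groups.
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    def __init__(self, embed_dim=8, num_patches=4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

model = ToyBackbone()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

print(model.cls_token.requires_grad, model.pos_embed.requires_grad)  # expect: True True

opt_param_ids = {id(p) for group in optimizer.param_groups for p in group['params']}
print(id(model.cls_token) in opt_param_ids, id(model.pos_embed) in opt_param_ids)  # expect: True True
```

Also note that no_weight_decay, as it is usually handled in ViT-style code, only excludes those parameters from weight decay; it does not freeze them or stop gradients from flowing to them.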
Thank you for your reply. requires_grad is True for both cls_token and pos_embed, but the gradient of cls_token is 0, and the gradient of the position embedding entry corresponding to cls_token is also 0. Do you know why?
cls_token is not used in SETR at all. Please just ignore it.
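For the zero gradient specifically, here is a toy illustration (my own sketch, not the SETR code) of why a parameter whose contribution is discarded before the loss ends up with an all-zero gradient:

```python
# The class token is prepended and then dropped again before the loss, so the
# loss does not depend on it and autograd returns an all-zero gradient for it.
import torch
import torch.nn as nn

embed_dim, num_patches = 4, 3
cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
x = torch.randn(1, num_patches, embed_dim)

tokens = torch.cat((cls_token, x), dim=1)  # prepend the class token
out = tokens[:, 1:]                        # drop it, as a dense prediction head would
loss = out.sum()
loss.backward()

print(cls_token.grad)                      # all zeros
```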
OK, I understand, thank you very much.
The "cls_token" and "pos_embed" are defined as all-zero matrices, what is the meaning? How is it applied in the model later? I am not doing this direction, just want to learn from your work, but also hope that you can help me answer!
"self.cls_token = nn.Parameter(torch.zeros(1, 1, self.embed_dim)) self.pos_embed = nn.Parameter(torch.zeros( 1, self.num_patches + 1, self.embed_dim)) "