Following the previous work [1], we construct the Differentiable Decision Matrix with the Gumbel softmax. Through extensive experiments, we found that this network does not bring significant benefits but causes training instability, so we did not integrate it into our public code. This does not affect the reproducibility of the experimental results.
[1] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification, NeurIPS 2021 (https://github.com/raoyongming/DynamicViT/blob/master/models/dylvvit.py#L512)
We also provide our previous code:
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenSparse(nn.Module):
    def __init__(self, embed_dim=512, sparse_ratio=0.6, attention_weight=0.8):
        super().__init__()
        self.sparse_ratio = sparse_ratio
        self.attention_weight = attention_weight
        # Score network: predicts per-token keep/drop log-probabilities.
        self.score_net = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Linear(embed_dim, embed_dim // 4),
            nn.GELU(),
            nn.Linear(embed_dim // 4, 2),
            nn.LogSoftmax(dim=-1),
        )

    def forward(self, x, attention_x=None, attention_y=None, gumbel=False, keepdim=False):
        B_v, L_v, C = x.size()
        # Network score in log-probability form: (B_v, L_v, 2)
        score = self.score_net(x)
        # Fuse the two attention sources: (B_v, L_v)
        attention = (attention_x + attention_y) * 0.5
        # Map values from [-1, 1] to [0, 1] so they can be read as probabilities.
        attention = (1 + attention) / 2
        # Keep/drop probabilities from attention: (B_v, L_v, 2)
        attention_prob = torch.stack([attention, 1 - attention], dim=2)
        # Weighted combination of the network score and the attention prior: (B_v, L_v, 2)
        score = (1 - self.attention_weight) * score.exp() + self.attention_weight * attention_prob + 1e-8
        # gumbel_softmax expects log-probabilities, so convert back to log space.
        score = torch.log(score)
        # Gumbel-softmax trick
        if gumbel:
            # This step is numerically fragile and can produce NaN during training.
            if keepdim:
                # Hard (binary) keep decision: (B_v, L_v, 1)
                score_hard = F.gumbel_softmax(score, hard=True)[..., 0:1]
            else:
                # Hard (binary) keep decision: (B_v, L_v)
                score_hard = F.gumbel_softmax(score, hard=True)[..., 0]
            return score_hard, score[..., 0].exp()
        # Otherwise directly return the log-probabilities: (B_v, L_v, 2)
        return score
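For illustration, a minimal usage sketch of the module above (the batch size, sequence length, random inputs, and the final top-k step are assumptions added for this example, not part of the released code):

    B, L, D = 8, 36, 512
    x = torch.randn(B, L, D)                   # token features
    attention_x = torch.rand(B, L) * 2 - 1     # attention-derived scores in [-1, 1]
    attention_y = torch.rand(B, L) * 2 - 1

    sparse = TokenSparse(embed_dim=D, sparse_ratio=0.6)
    # Binary keep mask (B, L) and the soft keep probability (B, L).
    mask, keep_prob = sparse(x, attention_x, attention_y, gumbel=True)

    # To keep a fixed number of tokens, one could instead take the top-k of keep_prob:
    k = int(sparse.sparse_ratio * L)
    keep_idx = keep_prob.topk(k, dim=1).indices    # (B, k)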
OK, thank you for your reply! In the Score Estimation part of your public code, you chose score = attention_x + attention_y rather than score = (1 - self.attention_weight) * score.exp() + self.attention_weight * attention_prob + 1e-8. Is the reason that the latter did not bring significant benefits? Does this choice affect the experimental results? Looking forward to your reply, thank you.
Yes, that is a fair way to put it: the latter does not bring a significant performance gain (see the ablation study in the paper) and can also produce NaN during training, so using the former has little impact on the experimental results. We also provide the training logs, model weights, and hyperparameters, which reach the performance reported in the paper. https://github.com/CrossmodalGroup/LAPS?tab=readme-ov-file#performances
You can also try adding the Differentiable Decision Matrix with Gumbel softmax to your own research, but compared with directly using the attention-weight scores for selection, it may introduce some instability.
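For reference, that simpler attention-based selection can be sketched roughly as follows (the function name, tensor shapes, and the top-k gathering are assumptions for illustration, not the exact implementation in the repository):

    import torch

    def attention_based_select(x, attention_x, attention_y, sparse_ratio=0.6):
        # Rough sketch: score tokens directly by attention weights and keep the
        # top-scoring fraction. Names and shapes are assumptions, not the exact
        # code in the LAPS repository.
        # x:             (B, L, D) token features
        # attention_x/y: (B, L)    attention-derived scores
        score = attention_x + attention_y                      # (B, L)
        k = max(1, int(sparse_ratio * x.size(1)))
        keep_idx = score.topk(k, dim=1).indices                # (B, k)
        keep_idx_exp = keep_idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        x_keep = torch.gather(x, dim=1, index=keep_idx_exp)    # (B, k, D)
        return x_keep, keep_idx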
OK, thank you very much!
Hello, when using gumbel_softmax(), how do you guarantee that a predefined number of important patches is selected?
[1] https://github.com/raoyongming/DynamicViT/blob/master/losses.py#L48
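The ratio loss linked above keeps the expected number of kept tokens close to a predefined ratio. A simplified sketch of the idea (not the exact DynamicViT implementation; the function name and shapes are illustrative):

    import torch

    def keep_ratio_loss(keep_prob, target_ratio=0.6):
        # Push the mean per-sample keep probability toward the target ratio,
        # so that roughly target_ratio * L tokens survive on average.
        # keep_prob: (B, L) probability of keeping each token.
        mean_keep = keep_prob.mean(dim=1)                  # (B,)
        return ((mean_keep - target_ratio) ** 2).mean()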
OK, thank you very much for your explanation!
Same question here. It seems that after the score computation, each patch's score is not binary.
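For reference, F.gumbel_softmax with hard=True does return one-hot (i.e., binary) decisions per token via the straight-through estimator; only the intermediate score is soft. A minimal check (shapes assumed for illustration):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(2, 5, 2)               # (B, L, 2) keep/drop logits per token
    hard = F.gumbel_softmax(logits, hard=True)  # one-hot over the last dimension
    print(hard[..., 0])                         # binary keep mask, values in {0., 1.}
    print(hard.sum(dim=-1))                     # all ones: exactly one class selected
    # Gradients still flow through the underlying soft samples (straight-through).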