YifanXu74 / Evo-ViT

Official implementation of Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
MIT License

The code does not match the pipeline in your paper #3

Closed Andy1621 closed 2 years ago

Andy1621 commented 2 years ago

In the original paper, there is a special token named representative token, which is aggregated by the placeholder tokens. However, there is no corresponding implementation in your code.

In fact, you simply use argsort and select the top-k informative tokens, which is non-differentiable.

# topk for slow update
x = x_[:, :N_ + 1] # L438
# simply copy for fast update
x = torch.cat((x, x_[:, N_ + 1:]), dim=1) # L473

I'm curious about the performance of aggregating tokens and of the differentiable top-k used in other papers. Looking forward to your reply.
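For clarity, the argsort-based selection being questioned can be sketched as follows. This is a minimal, hypothetical rewrite of the quoted lines, not the repository's actual code; the function and variable names are invented, and the hard `gather` on sorted indices is what makes the selection non-differentiable with respect to the scores.

```python
import torch

def split_tokens_by_score(x, scores, k):
    """Partition tokens into top-k informative (slow path) and the
    remaining placeholder tokens (fast path) by sorting on a per-token
    score. Hypothetical sketch: the hard index selection below carries
    no gradient back to `scores`."""
    B, N, C = x.shape
    order = scores.argsort(dim=1, descending=True)       # (B, N) token ranking
    idx = order.unsqueeze(-1).expand(-1, -1, C)          # (B, N, C) gather indices
    x_sorted = x.gather(dim=1, index=idx)                # tokens sorted by score
    slow, fast = x_sorted[:, :k], x_sorted[:, k:]        # top-k vs. the rest
    return slow, fast

# toy usage
x = torch.randn(2, 8, 16)
scores = torch.rand(2, 8)
slow, fast = split_tokens_by_score(x, scores, k=3)
```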

YifanXu74 commented 2 years ago

Hi! In fact, the performance is good enough when just preserving the placeholder tokens in DeiT. Thus, following the principle of Occam's razor, we do not use representative tokens for further updating there. Further analyses are in Tab. 3 and the 'effectiveness of each module' paragraph. In addition, just preserving the placeholder tokens can be treated as the fastest updating manner. We do use representative tokens for fast updating in LeViT; please refer to the corresponding code. Hope it is helpful to you!
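The representative-token idea mentioned for the LeViT variant can be illustrated with a short sketch. This is an illustrative assumption of the general technique (collapsing the fast-path placeholder tokens into one summary token via a score-weighted average), not a copy of the repository's LeViT implementation; all names here are hypothetical.

```python
import torch

def aggregate_representative(fast_tokens, fast_scores):
    """Hypothetical sketch of a representative token: collapse the
    placeholder (fast-path) tokens into a single token using a
    softmax-score-weighted average, so the slow path still sees a
    differentiable summary of the pruned tokens."""
    w = torch.softmax(fast_scores, dim=1).unsqueeze(-1)  # (B, M, 1) weights
    rep = (w * fast_tokens).sum(dim=1, keepdim=True)     # (B, 1, C) summary token
    return rep

# toy usage: append the representative token to the slow tokens
slow = torch.randn(2, 3, 16)
fast = torch.randn(2, 5, 16)
fast_scores = torch.rand(2, 5)
x_slow_path = torch.cat((slow, aggregate_representative(fast, fast_scores)), dim=1)
```

Because the weighted average keeps gradients flowing into both the fast tokens and their scores, this path stays differentiable, in contrast to the hard top-k selection discussed above.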