Open sutiankang opened 1 year ago
In act_vision_transformer.py, pay attention to the [self.counter_token] state.
The code only (i) zeroes out the token's value and (ii) blocks its attention to other tokens, shielding its impact on t_l^k in Eqn. 2. But the token is NOT actually removed?
Hi, thanks for your excellent work. The paper mentions that the number of tokens changes across Transformer layers, but when I debugged the code I found that the number of tokens is the same in every layer. What is the reason for this?
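To illustrate why the token count can stay constant while tokens are still effectively "removed": halted tokens can be zeroed and masked out of attention, which is numerically equivalent to dropping them. This is a minimal NumPy sketch under that assumption, not the repo's actual code; the shapes, the `keep` mask, and the helper names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(q, k, v, keep):
    # keep: boolean mask over tokens; attention TO masked tokens is blocked
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(keep[None, :], scores, -1e9)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, d = 4, 8
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
v = rng.normal(size=(n, d))

keep = np.array([True, True, False, True])  # token 2 is halted

# Masked pass: (i) zero the halted token's value, (ii) block attention to it.
# The tensor still has n tokens, so the "number of tokens" looks unchanged.
out_masked = masked_attention(q, k, v * keep[:, None], keep)

# Reference pass: physically remove the halted token instead.
out_removed = masked_attention(q[keep], k[keep], v[keep],
                               np.ones(int(keep.sum()), dtype=bool))

# For the surviving tokens, both passes give the same result.
assert np.allclose(out_masked[keep], out_removed, atol=1e-6)
```

So the shape stays the same in every layer for efficient batching, while masking makes halted tokens contribute nothing downstream.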