This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).
Thanks for your great work!
I have a question about selecting tokens with maximum activation in the Part Selection Module.
In Eq. 6, is a_l^i the attention score computed separately between the class token and the other N tokens? If so, the dimension of a_l^i is N, right?
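To make the question concrete, here is a minimal NumPy sketch of the reading I have in mind (all shapes and names here are my assumptions, not the official implementation): a_l^i would be the class-token row of head i's attention map restricted to the N patch tokens, and the Part Selection Module would pick the argmax patch per head.

```python
import numpy as np

# Hypothetical shapes: B batches, K heads, N patch tokens (+1 class token).
B, K, N = 2, 4, 16

rng = np.random.default_rng(0)
# Simulated attention weights for one layer: (B, K, N+1, N+1), rows sum to 1.
att = rng.random((B, K, N + 1, N + 1))
att = att / att.sum(axis=-1, keepdims=True)

# My reading of a_l^i in Eq. 6: the class-token row of head i,
# restricted to the N patch tokens -> shape (B, K, N).
cls_to_patches = att[:, :, 0, 1:]

# Part selection: index of the maximally attended patch per head -> (B, K).
part_idx = cls_to_patches.argmax(axis=-1)
print(part_idx.shape)
```

Is this per-head, N-dimensional view of a_l^i correct, or does it also include the class token's attention to itself?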