Thanks for the very cool work! I was wondering... Is there a way to visualize Global Temporal Features of the Global MHRA in UniformerV2 like you did for UniformerV1?
I get that you reduce computational costs by doing cross-attention between the class token and the spatio-temporal tokens. Since this cross-attention happens on clones of the features, and since it outputs a new class token, how are the global temporal features fused back into the remaining features?
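To make sure I understand the mechanism, here is a minimal NumPy sketch of what I mean by class-token cross-attention (hypothetical names and shapes, single head, not the actual UniformerV2 code): one query from the class token attends over all L spatio-temporal tokens, so the cost is linear in L rather than quadratic as in full self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def class_token_cross_attention(cls_tok, tokens, Wq, Wk, Wv):
    """Single-head cross-attention: the class token is the only query,
    the spatio-temporal tokens supply keys and values.

    cls_tok: (d,)    class token
    tokens:  (L, d)  flattened spatio-temporal tokens (L = T*H*W)
    Wq/Wk/Wv: (d, d) projection matrices (hypothetical)
    """
    q = cls_tok @ Wq                        # (d,)  one query
    k = tokens @ Wk                         # (L, d)
    v = tokens @ Wv                         # (L, d)
    scores = (k @ q) / np.sqrt(q.shape[0])  # (L,)  one score per token
    attn = softmax(scores)                  # attention over all L tokens
    return attn @ v                         # (d,)  updated class token

rng = np.random.default_rng(0)
d, L = 8, 16
new_cls = class_token_cross_attention(
    rng.standard_normal(d), rng.standard_normal((L, d)),
    rng.standard_normal((d, d)), rng.standard_normal((d, d)),
    rng.standard_normal((d, d)))
```

In this sketch the output is only a new class token, which is exactly why I'm unsure how the per-token spatio-temporal features themselves pick up the global temporal information.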
What I would like to do is have access to the spatial features (without the temporal features) and access to the combined spatio-temporal features...