VITA-Group / GNT

[ICLR 2023] "Is Attention All NeRF Needs?" by Mukund Varma T*, Peihao Wang*, Xuxi Chen, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang
https://vita-group.github.io/GNT
MIT License

Implementation details of view transformer #12

Open Zhentao-Liu opened 1 year ago

Zhentao-Liu commented 1 year ago

In the provided code, attn = k - q[:,:,None,:] + pos is followed by attn = self.attn_fc(attn). However, in Fig. 2a and Algorithm 1 there is no self.attn_fc component. Could you give an explanation?

Zhentao-Liu commented 1 year ago

This part of the code is in transformer_network.py, class Attention2D.
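For readers following the thread, here is a minimal sketch (not the repository code) of the subtraction-style cross-view attention the quoted snippet belongs to. Module names, tensor shapes, and the single-head layout are illustrative assumptions rather than a copy of Attention2D:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtractionAttentionSketch(nn.Module):
    """Sketch of subtraction-based attention: logits come from an MLP over
    (k - q + pos) per source view, instead of a q.k^T dot product."""

    def __init__(self, dim):
        super().__init__()
        self.q_fc = nn.Linear(dim, dim)
        self.k_fc = nn.Linear(dim, dim)
        self.v_fc = nn.Linear(dim, dim)
        self.pos_fc = nn.Linear(dim, dim)   # relative view/ray geometry features
        self.attn_fc = nn.Linear(dim, dim)  # the f_a(.) mapping asked about above
        self.out_fc = nn.Linear(dim, dim)

    def forward(self, q, x, pos):
        # q:   (B, N, dim)        query features per sample point
        # x:   (B, N, V, dim)     per-source-view features
        # pos: (B, N, V, dim)     positional features
        q, k, v, pos = self.q_fc(q), self.k_fc(x), self.v_fc(x), self.pos_fc(pos)
        attn = k - q[:, :, None, :] + pos   # feature difference per view
        attn = self.attn_fc(attn)           # learned logits instead of q.k^T
        attn = F.softmax(attn, dim=-2)      # normalize across the view axis
        out = ((v + pos) * attn).sum(dim=-2)
        return self.out_fc(out)
```

Normalizing over the view axis (dim=-2) is what makes this a cross-view aggregation: each sample point pools its source-view features with learned per-channel weights rather than scalar dot-product scores.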

Zhentao-Liu commented 1 year ago

In Eq. 9, what do you mean by applying diag(.)?

MukundVarmaT commented 1 year ago

Hi @Zhentao-Liu!

Thank you for pointing it out! Yes, there is an error in our pseudo-code in Algorithm 1 (although f_a(.) was defined, we never used it). However, our implementation details in the text do discuss this (Appendix B, Memory-efficient Cross-View Attention).
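To summarize the discrepancy discussed here, the two forms can be written side by side. This is a paraphrase of the thread, not the paper's exact Eq. 9 or Algorithm 1; f_a stands for the learned attn_fc mapping and p for the positional features:

```latex
% Algorithm 1 (as written): standard dot-product attention.
% Implementation (cf. Appendix B): subtraction-based attention with learned logits.
\[
\text{Alg.\,1:}\quad \mathrm{softmax}\!\left(\frac{q k^{\top}}{\sqrt{d}}\right) v
\qquad\text{vs.}\qquad
\text{code:}\quad \sum_{\text{views}} \mathrm{softmax}\big(f_a(k - q + p)\big)\odot (v + p)
\]
```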