facebookresearch / ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

(Question) LayerNorm for queries and keys #25

Closed sanagno closed 1 year ago

sanagno commented 1 year ago

Is there a particular reason you removed the LayerNorm from the queries and the keys inside the Attention block?

This is the original implementation in timm https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/vision_transformer.py#L83.
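For reference, here is a minimal sketch of the pattern in question, in the style of recent timm's `Attention` (illustrative, not an exact copy of the linked code): `qk_norm` toggles per-head LayerNorms on the queries and keys, which otherwise collapse to `nn.Identity()`.

```python
import torch
import torch.nn as nn


class Attention(nn.Module):
    # Sketch of a timm-style ViT attention block. When qk_norm is False,
    # q_norm and k_norm are identities and the block behaves as if the
    # norms were removed entirely.
    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_norm=False,
                 norm_layer=nn.LayerNorm):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5

        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.q_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()
        self.k_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        q, k = self.q_norm(q), self.k_norm(k)  # the norms in question

        attn = (q * self.scale) @ k.transpose(-2, -1)
        attn = attn.softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(x)
```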

dbolya commented 1 year ago

No particular reason. Those layernorms aren't used in any of the models we support (they're always set to nn.Identity()), and they weren't in the version of timm we were developing with (0.4.12). Feel free to add them, though, if there's a model you want to use that has them.
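For anyone who wants to do that, a minimal sketch, assuming an attention forward that already routes q and k through `q_norm`/`k_norm` as in the block above (the helper name here is made up, not part of ToMe):

```python
import torch.nn as nn


def enable_qk_norm(attn, head_dim):
    # Hypothetical helper: swap the identity placeholders for real
    # per-head LayerNorms on an attention module whose forward already
    # applies self.q_norm / self.k_norm to q and k.
    attn.q_norm = nn.LayerNorm(head_dim)
    attn.k_norm = nn.LayerNorm(head_dim)


# Usage, assuming a timm-style ViT where each block holds an Attention
# module with a num_heads attribute and embedding dim `dim`:
# for blk in model.blocks:
#     enable_qk_norm(blk.attn, dim // blk.attn.num_heads)
```

After this, loading a checkpoint whose state dict includes the q/k norm weights should work as usual via load_state_dict.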

sanagno commented 1 year ago

Thanks a lot!