gabeur / mmt

Multi-Modal Transformer for Video Retrieval
http://thoth.inrialpes.fr/research/MMT/
Apache License 2.0
259 stars 41 forks source link

Wrong dimension for L2 normalization? #1

Closed dzabraev closed 4 years ago

dzabraev commented 4 years ago

When ReduceDim

https://github.com/gabeur/mmt/blob/0d848cd0811dbcbee06a62cf474636e68d728944/model/model.py#L715-L725

is applied to video expert embeddings it do F.normalize on shape (batch_size, num_tokens, embedding_dim), and reduce by default along dim=1 (num_tokens)

https://github.com/gabeur/mmt/blob/0d848cd0811dbcbee06a62cf474636e68d728944/model/model.py#L431-L434

But here it is applied to shape (batch_size, embedding_dim)

https://github.com/gabeur/mmt/blob/0d848cd0811dbcbee06a62cf474636e68d728944/model/model.py#L423-L426

So normalization along embedding_dim axis.

Is it mistake or not?

gabeur commented 4 years ago

Good catch, thank you! Indeed it was a mistake. I have pushed a fix. Fixing the bug seems to improve performance slightly, not significantly.