Insight about tiny U-net structure for K-embedding

FangShancheng / ABINet

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition


Open mandal4 opened 2 years ago

mandal4 commented 2 years ago

Thanks for the nice paper and the source code. I have some questions about the code that implements the VM.

1. What is the purpose of the U-Net structure in the K-embedding (self.k_encoder and self.k_decoder in class PositionAttention)?
2. What is the purpose of the projection applied to the positional encoding (self.project in class PositionAttention)?

Thx,

FangShancheng commented 2 years ago

1. The U-Net makes the key vectors more distinguishable from the value vectors, which improved results in our experiments.
2. The projection only handles the dimension transition; it has no other specific purpose.
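
For readers following the thread, here is a minimal PyTorch sketch of the idea discussed above: the key is the feature map passed through a small U-Net with skip connections, the value is the untouched feature map, and the query is a positional encoding passed through a linear projection. This is an illustration under assumptions, not the repository's code: the class name KeyUNetAttention, the helpers down/up/sinusoid, the number of layers, the channel widths and the strides are all hypothetical. Only the attribute names k_encoder, k_decoder and project are taken from the question.

```python
import math
import torch
import torch.nn as nn


def down(c_in, c_out, stride):
    # Hypothetical encoder block: strided conv -> BN -> ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))


def up(c_in, c_out, size):
    # Hypothetical decoder block: resize to a fixed size -> conv -> BN -> ReLU.
    return nn.Sequential(
        nn.Upsample(size=size, mode="nearest"),
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))


def sinusoid(length, dim):
    # Standard sinusoidal positional encoding, shape (length, dim); dim must be even.
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / dim))
    pe = torch.zeros(length, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe


class KeyUNetAttention(nn.Module):
    # Assumed sizes: feature map (channels, h, w); h divisible by 4, w by 8.
    def __init__(self, max_length=26, channels=512, mid=64, h=8, w=32):
        super().__init__()
        self.k_encoder = nn.ModuleList([
            down(channels, mid, (1, 2)),   # -> (h,   w/2)
            down(mid, mid, (2, 2)),        # -> (h/2, w/4)
            down(mid, mid, (2, 2)),        # -> (h/4, w/8)
        ])
        self.k_decoder = nn.ModuleList([
            up(mid, mid, (h // 2, w // 4)),
            up(mid, mid, (h, w // 2)),
            up(mid, channels, (h, w)),     # back to the input shape
        ])
        self.register_buffer("pos", sinusoid(max_length, channels))
        self.project = nn.Linear(channels, channels)

    def forward(self, x):
        n, e, h, w = x.shape
        # Key: U-Net over the features, adding encoder outputs as skips.
        k, skips = x, []
        for layer in self.k_encoder:
            k = layer(k)
            skips.append(k)
        for i in range(len(self.k_decoder) - 1):
            k = self.k_decoder[i](k) + skips[-(i + 2)]
        k = self.k_decoder[-1](k)                                 # (N, E, H, W)
        # Query: projected positional encoding, one row per character slot.
        q = self.project(self.pos).unsqueeze(0).expand(n, -1, -1)  # (N, T, E)
        # Scaled dot-product attention over the H*W spatial positions.
        scores = torch.bmm(q, k.flatten(2)) / math.sqrt(e)        # (N, T, H*W)
        attn = torch.softmax(scores, dim=-1)
        # Value: the original features, not the U-Net output.
        v = x.flatten(2).transpose(1, 2)                          # (N, H*W, E)
        return torch.bmm(attn, v), attn.view(n, -1, h, w)
```

In this sketch the key and the value start from the same tensor x, so without the U-Net they would be identical. The U-Net gives the key its own transformation, which is one way to read "more distinguishable from the value vector". The linear layer maps E to E, so it acts purely as a dimension-matching step on the query.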