FangShancheng / ABINet

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Insight about tiny U-net structure for K-embedding #67

Open mandal4 opened 2 years ago

mandal4 commented 2 years ago

Thanks for the nice paper and the source code. I have a couple of questions about the code for the vision model (VM).

  1. What is the purpose of the U-Net structure in the key (K) embedding (self.k_encoder and self.k_decoder in class PositionAttention)?
  2. What is the purpose of the projection applied to the positional encoding (self.project in class PositionAttention)?

Thx,

FangShancheng commented 2 years ago
  1. It makes the key vector more distinguishable from the value vector, which showed an improvement in our experiments.
  2. It only performs a dimension transformation; it has no other specific purpose.
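
For readers following the thread, the two answers can be illustrated with a minimal PyTorch sketch. This is not the repository's implementation: the class name `TinyPositionAttention`, the helpers `conv_block` and `up_block`, the layer counts, the channel sizes, and the learned `pos` parameter (standing in for a positional encoding) are all assumptions made for illustration. The point is only that the key passes through a small encoder-decoder with a skip connection while the value stays as the raw feature map, and that the query is a positional embedding passed through a plain linear projection.

```python
import torch
import torch.nn as nn


def conv_block(c_in, c_out, stride):
    # Hypothetical helper: Conv -> BN -> ReLU; a stride > 1 downsamples.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


def up_block(c_in, c_out, size):
    # Hypothetical helper: upsample to a fixed size, then Conv -> BN -> ReLU.
    return nn.Sequential(
        nn.Upsample(size=size, mode="nearest"),
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


class TinyPositionAttention(nn.Module):
    """Illustrative sketch only; sizes and depth are assumptions."""

    def __init__(self, max_length=26, channels=64, mid=16, h=8, w=32):
        super().__init__()
        # Small U-Net applied to the key only.
        self.k_encoder = nn.ModuleList([
            conv_block(channels, mid, stride=(1, 2)),  # -> (h, w/2)
            conv_block(mid, mid, stride=(2, 2)),       # -> (h/2, w/4)
        ])
        self.k_decoder = nn.ModuleList([
            up_block(mid, mid, size=(h, w // 2)),
            up_block(mid, channels, size=(h, w)),
        ])
        # Assumption: a learned per-position embedding replaces the
        # positional encoding here to keep the sketch short.
        self.pos = nn.Parameter(torch.randn(max_length, channels))
        # Plain linear map on the positional embedding (dimension handling).
        self.project = nn.Linear(channels, channels)

    def forward(self, x):
        n, e, h, w = x.shape
        k, v = x, x  # key and value start from the same feature map

        # Key: encoder-decoder with one skip connection.
        skips = []
        for layer in self.k_encoder:
            k = layer(k)
            skips.append(k)
        k = self.k_decoder[0](k) + skips[0]
        k = self.k_decoder[1](k)  # back to (n, e, h, w)

        # Query: projected positional embedding, shared across the batch.
        q = self.project(self.pos).unsqueeze(0).expand(n, -1, -1)  # (n, T, e)

        # Scaled dot-product attention over the h*w spatial positions.
        scores = torch.bmm(q, k.flatten(2)) / (e ** 0.5)  # (n, T, h*w)
        attn = torch.softmax(scores, dim=-1)
        v = v.flatten(2).transpose(1, 2)  # (n, h*w, e), untouched features
        out = torch.bmm(attn, v)          # (n, T, e)
        return out, attn.view(n, -1, h, w)
```

Calling `TinyPositionAttention()(torch.randn(2, 64, 8, 32))` returns one feature vector per character position plus a spatial attention map for each position. Because only `k` goes through the extra layers, the attention scores are computed from a representation that differs from the features being aggregated.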