LocAtE
Location-based Attention Exhaustion
Model architecture
- Bottleneck: reduces the computational cost of each block, letting the model afford deeper subnetworks
- Style-based: uses a dense feed-forward network (FFNN) with RootTanh activation
- SpectralNorm + BatchNorm: enforces a global Lipschitz constant, alleviating the gradient problems introduced by multiplying layer outputs (see the first sketch after this list)
- Self-Attention: prioritizes image regions (see the sketch after this list)
- Feature-Attention: learnable, location-based, feature-level attention (see the sketch after this list)
- Deep: every block (up/down) contains six convolutional layers (twelve if factorized), two attention layers counting as 2 and 3 layers, and one scale layer, giving 6 + 2 + 3 + 1 = 12 layers per block, or 12 + 2 + 3 + 1 = 18 when factorized
- RootTanh: a new, customizable activation function (see the sketch after this list)
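The README doesn't show how the two normalizations are combined; below is a minimal PyTorch sketch of the usual pattern, wrapping a convolution in `torch.nn.utils.spectral_norm` and following it with `BatchNorm2d`. The helper name `snorm_conv` and the kernel size are illustrative assumptions, not taken from the LocAtE source.

```python
import torch
from torch import nn

def snorm_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    # Spectral normalization bounds the layer's Lipschitz constant;
    # BatchNorm then re-centers and re-scales activations. Together they
    # tame the gradient blow-up caused by multiplying the outputs of
    # many stacked layers.
    return nn.Sequential(
        nn.utils.spectral_norm(nn.Conv2d(in_ch, out_ch, 3, padding=1)),
        nn.BatchNorm2d(out_ch),
    )

x = torch.randn(4, 64, 32, 32)
y = snorm_conv(64, 128)(x)
print(y.shape)  # torch.Size([4, 128, 32, 32])
```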
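For the self-attention bullet, here is a minimal SAGAN-style spatial self-attention sketch; LocAtE's actual implementation may differ, and the zero-initialized `gamma` gate is a convention borrowed from SAGAN rather than something this README confirms.

```python
import torch
from torch import nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over spatial positions (a sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        # Learnable gate, initialized to zero so the block starts as an
        # identity mapping and attention is blended in gradually.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, hw, c//8)
        k = self.key(x).flatten(2)                    # (b, c//8, hw)
        v = self.value(x).flatten(2)                  # (b, c, hw)
        # Each spatial position attends to every other position,
        # which lets the model prioritize whole image regions.
        attn = torch.softmax(q @ k, dim=-1)           # (b, hw, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x
```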
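The feature-attention bullet only states that the attention is learnable, location-based, and feature-level. One way to realize that description is a per-position, per-channel sigmoid gate, sketched below; the module name and its structure are assumptions, not the repo's design.

```python
import torch
from torch import nn

class FeatureAttention2d(nn.Module):
    """Hypothetical location-based feature-level attention: learns a
    per-position, per-channel gate from the input itself."""

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 conv keeps the gate location-dependent: every spatial
        # position gets its own weighting over the feature channels.
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)  # reweight features at every location
```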
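The README introduces RootTanh without giving a formula. The sketch below is one plausible reading of the name: a tanh shaped by a root term, with the exponent as the "customizable" knob. Both the formula and the default exponent are guesses, not the LocAtE definition.

```python
import torch
from torch import nn

class RootTanh(nn.Module):
    """Assumed form: tanh(x) scaled by a root of |x|. The formula and
    the default exponent are guesses, not taken from the LocAtE source."""

    def __init__(self, exponent: float = 0.5):
        super().__init__()
        self.exponent = exponent

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Tanh-shaped near zero, unbounded, growing like |x|**exponent.
        return torch.tanh(x) * (x.abs() + 1).pow(self.exponent)
```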