dhkim0225 opened this issue 3 years ago
PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture https://arxiv.org/pdf/2201.00978.pdf
ELSA: Enhanced Local Self-Attention for Vision Transformer https://arxiv.org/pdf/2112.12786v1.pdf
UNICORN (Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling)
DeepSpeed-MoE
15-second NeRF (Instant Neural Graphics Primitives with a Multiresolution Hash Encoding)
True Few-Shot Learning with Language Models https://arxiv.org/pdf/2105.11447.pdf
FLAN https://arxiv.org/pdf/2109.01652.pdf
T0 https://arxiv.org/pdf/2110.08207.pdf
ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization https://arxiv.org/abs/2201.06910
Transformer Quality in Linear Time https://arxiv.org/abs/2202.10447
Warm Starting CMA-ES for Hyperparameter Optimization https://arxiv.org/abs/2012.06932
Understanding Dimensional Collapse in Contrastive Self-Supervised Learning https://arxiv.org/pdf/2110.09348.pdf
On Warm-Starting Neural Network Training https://arxiv.org/pdf/1910.08475.pdf
The break-even point on optimization trajectories of deep neural networks. https://arxiv.org/abs/2002.09572
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization https://arxiv.org/pdf/2012.14193.pdf
On the Origin of Implicit Regularization in Stochastic Gradient Descent https://arxiv.org/pdf/2101.12176.pdf
Sharpness-Aware Minimization for Efficiently Improving Generalization https://arxiv.org/pdf/2010.01412.pdf
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks https://arxiv.org/abs/2102.11600
Towards Efficient and Scalable Sharpness-Aware Minimization (LookSAM) https://arxiv.org/pdf/2203.02714v1.pdf
A Loss Curvature Perspective on Training Instability in Deep Learning https://arxiv.org/pdf/2110.04369.pdf
Surrogate Gap Minimization Improves Sharpness-Aware Training https://arxiv.org/abs/2203.08065
When vision transformers outperform resnets without pretraining or strong data augmentations https://arxiv.org/abs/2106.01548
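The sharpness-aware minimization (SAM) papers above share a simple two-step update: ascend to a worst-case perturbation of the weights within a small L2 ball, then descend using the gradient taken at that perturbed point. A minimal NumPy sketch on a toy quadratic loss (the loss, `rho`, and `lr` are illustrative choices, not taken from any of the papers):

```python
import numpy as np

def loss(w):
    # Toy quadratic loss standing in for a real training objective.
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w

def sam_step(w, rho=0.05, lr=0.1):
    g = grad(w)
    # Ascent step: move to the (approximate) worst-case point
    # within an L2 ball of radius rho around w.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_sharp = grad(w + eps)   # gradient at the perturbed weights
    return w - lr * g_sharp   # descent step applied to the original weights

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
```

Each step costs two gradient evaluations instead of one; LookSAM (linked above) is precisely about amortizing that extra cost.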
prompt
Calibrate Before Use: Improving Few-Shot Performance of Language Models https://arxiv.org/abs/2102.09690
p-tuning https://arxiv.org/abs/2104.08691
Do Prompt-Based Models Really Understand the Meaning of their Prompts? https://arxiv.org/abs/2109.01247
An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models https://arxiv.org/pdf/2109.02772.pdf
FLAN https://arxiv.org/pdf/2109.01652.pdf
Text Style Transfer https://arxiv.org/abs/2109.03910
Generating prompts for NMT https://arxiv.org/abs/2110.05448
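Calibrate Before Use (linked above) corrects a language model's few-shot label bias by rescaling its output with the prediction it gives on a content-free input such as "N/A". A minimal sketch of that diagonal-calibration idea; the probability values here are made-up stand-ins for real LM outputs:

```python
import numpy as np

# Model's label distribution on a content-free prompt ("N/A" in place
# of real input text); made-up numbers for illustration.
p_cf = np.array([0.7, 0.3])
W = np.diag(1.0 / p_cf)   # diagonal calibration matrix

def calibrate(p):
    q = W @ p
    return q / q.sum()    # renormalize to a proper distribution

# A biased few-shot prediction on a real input: uncalibrated, label 0 wins;
# after calibration the surface bias toward label 0 is removed.
p = np.array([0.6, 0.4])
print(calibrate(p))
```

The calibration matrix is estimated once per prompt format, so the fix adds only a single extra forward pass.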
LM
BART https://arxiv.org/abs/1910.13461
Primer https://arxiv.org/abs/2109.08668
NormFormer https://arxiv.org/abs/2110.09456
HTLM https://arxiv.org/abs/2107.06955
KIE Pretraining
LayoutLM https://arxiv.org/abs/1912.13318
LayoutLMv2 https://arxiv.org/abs/2012.14740
StructuralLM https://arxiv.org/abs/2105.11210
MarkupLM https://arxiv.org/abs/2110.08518