dhkim0225 opened this issue 3 years ago
PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture https://arxiv.org/pdf/2201.00978.pdf
ELSA: Enhanced Local Self-Attention for Vision Transformer https://arxiv.org/pdf/2112.12786v1.pdf
UNICORN (Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling)
DeepSpeed-MoE
15-second NeRF (Instant Neural Graphics Primitives with a Multiresolution Hash Encoding)
True Few-Shot Learning with Language Models https://arxiv.org/pdf/2105.11447.pdf
FLAN https://arxiv.org/pdf/2109.01652.pdf
T0 https://arxiv.org/pdf/2110.08207.pdf
ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization https://arxiv.org/abs/2201.06910
Transformer Quality in Linear Time https://arxiv.org/abs/2202.10447
Warm Starting CMA-ES for Hyperparameter Optimization https://arxiv.org/abs/2012.06932
Understanding Dimensional Collapse in Contrastive Self-Supervised Learning https://arxiv.org/pdf/2110.09348.pdf
On Warm-Starting Neural Network Training https://arxiv.org/pdf/1910.08475.pdf
The break-even point on optimization trajectories of deep neural networks. https://arxiv.org/abs/2002.09572
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization https://arxiv.org/pdf/2012.14193.pdf
On the Origin of Implicit Regularization in Stochastic Gradient Descent https://arxiv.org/pdf/2101.12176.pdf
Sharpness-Aware Minimization for Efficiently Improving Generalization https://arxiv.org/pdf/2010.01412.pdf
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks https://arxiv.org/abs/2102.11600
Towards Efficient and Scalable Sharpness-Aware Minimization (LookSAM) https://arxiv.org/pdf/2203.02714v1.pdf
A Loss Curvature Perspective on Training Instability in Deep Learning https://arxiv.org/pdf/2110.04369.pdf
Surrogate Gap Minimization Improves Sharpness-Aware Training https://arxiv.org/abs/2203.08065
When vision transformers outperform resnets without pretraining or strong data augmentations https://arxiv.org/abs/2106.01548
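The sharpness-aware minimization (SAM) papers above share a simple two-step update: ascend to a worst-case perturbation of the weights within a small L2 ball, then descend using the gradient taken at that perturbed point. A minimal NumPy sketch on a toy quadratic loss (the loss, `rho`, and `lr` are illustrative choices, not taken from any of the papers):

```python
import numpy as np

def loss(w):
    # Toy quadratic loss standing in for a real training objective.
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w

def sam_step(w, rho=0.05, lr=0.1):
    g = grad(w)
    # Ascent step: move to the (approximate) worst-case point
    # within an L2 ball of radius rho around w.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_sharp = grad(w + eps)   # gradient at the perturbed weights
    return w - lr * g_sharp   # descent step applied to the original weights

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
```

Each step costs two gradient evaluations instead of one; LookSAM (linked above) is precisely about amortizing that extra cost.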
prompt
Calibrate Before Use: Improving Few-Shot Performance of Language Models https://arxiv.org/abs/2102.09690
p-tuning https://arxiv.org/abs/2104.08691
Do Prompt-Based Models Really Understand the Meaning of their Prompts? https://arxiv.org/abs/2109.01247
An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models https://arxiv.org/pdf/2109.02772.pdf
FLAN https://arxiv.org/pdf/2109.01652.pdf
Text Style Transfer https://arxiv.org/abs/2109.03910
Generating prompts for NMT https://arxiv.org/abs/2110.05448
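Calibrate Before Use (linked above) corrects a language model's few-shot label bias by rescaling its output with the prediction it gives on a content-free input such as "N/A". A minimal sketch of that diagonal-calibration idea; the probability values here are made-up stand-ins for real LM outputs:

```python
import numpy as np

# Model's label distribution on a content-free prompt ("N/A" in place
# of real input text); made-up numbers for illustration.
p_cf = np.array([0.7, 0.3])
W = np.diag(1.0 / p_cf)   # diagonal calibration matrix

def calibrate(p):
    q = W @ p
    return q / q.sum()    # renormalize to a proper distribution

# A biased few-shot prediction on a real input: uncalibrated, label 0 wins;
# after calibration the surface bias toward label 0 is removed.
p = np.array([0.6, 0.4])
print(calibrate(p))
```

The calibration matrix is estimated once per prompt format, so the fix adds only a single extra forward pass.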
LM
BART https://arxiv.org/abs/1910.13461
Primer https://arxiv.org/abs/2109.08668
NormFormer https://arxiv.org/abs/2110.09456
HTLM https://arxiv.org/abs/2107.06955
KIE Pretraining
LayoutLM https://arxiv.org/abs/1912.13318
LayoutLMv2 https://arxiv.org/abs/2012.14740
StructuralLM https://arxiv.org/abs/2105.11210
MarkupLM https://arxiv.org/abs/2110.08518