dhkim0225 / 1day_1paper

Read 1 paper every day (weekdays only)

TODO LIST #15

Open · dhkim0225 opened 3 years ago

dhkim0225 commented 3 years ago

prompt

Calibrate Before Use: Improving Few-Shot Performance of Language Models (https://arxiv.org/abs/2102.09690) (a minimal calibration sketch follows after this list)
p-tuning (https://arxiv.org/abs/2104.08691)
Do Prompt-Based Models Really Understand the Meaning of their Prompts? (https://arxiv.org/abs/2109.01247)
An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models (https://arxiv.org/pdf/2109.02772.pdf)
FLAN (https://arxiv.org/pdf/2109.01652.pdf)
Text Style Transfer (https://arxiv.org/abs/2109.03910)
NMT by generating prompts (https://arxiv.org/abs/2110.05448)
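The calibration idea from the first entry is compact enough to sketch: estimate the model's per-label bias from a content-free input (e.g. "N/A"), then divide it out of the test-time probabilities before picking a label. A minimal numpy sketch, assuming you already have label probabilities from the LM; the toy numbers are made up:

```python
import numpy as np

def calibrate(p_cf, p_test):
    """Contextual calibration in the spirit of Zhao et al. 2021.

    p_cf:   label probabilities for a content-free input (e.g. "N/A"), shape (num_labels,)
    p_test: label probabilities for the real test input, shape (num_labels,)
    Returns calibrated label probabilities.
    """
    W = np.diag(1.0 / p_cf)     # diag(p_cf)^-1, the calibration matrix
    q = W @ p_test              # remove the content-free bias
    return q / q.sum()          # renormalize to a distribution

# toy example: the model is biased toward label 0
p_cf = np.array([0.7, 0.3])     # probabilities on the "N/A" input
p_test = np.array([0.6, 0.4])   # raw probabilities on a real input
print(calibrate(p_test=p_test, p_cf=p_cf))  # bias removed -> label 1 now wins
```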

LM

BART (https://arxiv.org/abs/1910.13461)
Primer (https://arxiv.org/abs/2109.08668)
NormFormer (https://arxiv.org/abs/2110.09456)
HTLM (https://arxiv.org/abs/2107.06955)

KIE Pretraining

LayoutLM (https://arxiv.org/abs/1912.13318)
LayoutLMv2 (https://arxiv.org/abs/2012.14740)
StructuralLM (https://arxiv.org/abs/2105.11210)
MarkupLM (https://arxiv.org/abs/2110.08518)
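The LayoutLM-family models above (MarkupLM swaps bounding boxes for XPath embeddings) share the same basic input trick: add learned embeddings of each token's bounding-box coordinates, normalized to a 0–1000 grid, to the ordinary token embedding before the Transformer. A minimal PyTorch sketch in that spirit; the layer sizes and the shared x/y tables are assumptions, not the exact LayoutLM code:

```python
import torch
import torch.nn as nn

class LayoutEmbedding(nn.Module):
    """Token + 2D-position embeddings, LayoutLM-style input layer (illustrative)."""

    def __init__(self, vocab_size=30522, hidden=768, max_coord=1001):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.x_emb = nn.Embedding(max_coord, hidden)   # shared table for x0 and x1
        self.y_emb = nn.Embedding(max_coord, hidden)   # shared table for y0 and y1

    def forward(self, token_ids, boxes):
        # token_ids: (B, T); boxes: (B, T, 4) with (x0, y0, x1, y1) in [0, 1000]
        x0, y0, x1, y1 = boxes.unbind(-1)
        return (self.tok(token_ids)
                + self.x_emb(x0) + self.y_emb(y0)
                + self.x_emb(x1) + self.y_emb(y1))

emb = LayoutEmbedding()
ids = torch.randint(0, 30522, (2, 16))
boxes = torch.randint(0, 1001, (2, 16, 4))
print(emb(ids, boxes).shape)   # torch.Size([2, 16, 768])
```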

dhkim0225 commented 2 years ago

PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture https://arxiv.org/pdf/2201.00978.pdf

ELSA: Enhanced Local Self-Attention for Vision Transformer https://arxiv.org/pdf/2112.12786v1.pdf

UNICORN (Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling)

DeepSpeed-MoE

15-second NeRF (Instant Neural Graphics Primitives with a Multiresolution Hash Encoding)

True Few-Shot Learning with Language Models https://arxiv.org/pdf/2105.11447.pdf

FLAN https://arxiv.org/pdf/2109.01652.pdf

T0 https://arxiv.org/pdf/2110.08207.pdf

ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization https://arxiv.org/abs/2201.06910

Transformer Quality in Linear Time https://arxiv.org/abs/2202.10447

Hyperparam search

Warm Starting CMA-ES for Hyperparameter Optimization https://arxiv.org/abs/2012.06932
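For context on what is being warm-started: CMA-ES maintains a Gaussian search distribution over the hyperparameters and updates it through an ask/tell loop, and WS-CMA-ES initializes that distribution from trials on a related task. A minimal cold-start loop with the `cma` package; the objective below is a hypothetical stand-in for a real train-and-evaluate run:

```python
import cma  # pip install cma

def objective(x):
    # hypothetical stand-in: decode x into hyperparameters (e.g. log10 lr,
    # log10 weight decay), train briefly, return validation loss;
    # a smooth toy surrogate is used here so the example runs instantly
    return (x[0] + 3) ** 2 + (x[1] + 4) ** 2

# x0 / sigma0 are where WS-CMA-ES would inject knowledge from prior tasks;
# here it is a plain cold start in log10 space.
es = cma.CMAEvolutionStrategy([-2.0, -3.0], 1.0)
while not es.stop():
    candidates = es.ask()                               # sample a population
    es.tell(candidates, [objective(c) for c in candidates])
es.result_pretty()                                      # best point found
```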

SSL

Understanding Dimensional Collapse in Contrastive Self-Supervised Learning https://arxiv.org/pdf/2110.09348.pdf
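The paper's core diagnostic is easy to reproduce: look at the eigenvalue spectrum of the embedding covariance matrix; dimensional collapse shows up as a long tail of near-zero eigenvalues. A minimal numpy sketch, assuming `embeddings` is an (N, d) array from an encoder; the toy data below is deliberately low-rank:

```python
import numpy as np

def embedding_spectrum(embeddings):
    """Sorted eigenvalue spectrum of the embedding covariance matrix.

    Many near-zero eigenvalues indicate dimensional collapse.
    embeddings: array of shape (N, d)
    """
    z = embeddings - embeddings.mean(axis=0, keepdims=True)   # center
    cov = (z.T @ z) / (len(z) - 1)                             # (d, d) covariance
    return np.linalg.eigvalsh(cov)[::-1]                       # descending order

# toy check: embeddings confined to an 8-dimensional subspace of d=128
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 128))
spec = embedding_spectrum(z)
print((spec > 1e-6).sum(), "of", len(spec), "directions carry variance")
```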

OCR

STKM https://openaccess.thecvf.com/content/CVPR2021/papers/Wan_Self-Attention_Based_Text_Knowledge_Mining_for_Text_Detection_CVPR_2021_paper.pdf

Early phase of training?

On Warm-Starting Neural Network Training https://arxiv.org/pdf/1910.08475.pdf

The break-even point on optimization trajectories of deep neural networks. https://arxiv.org/abs/2002.09572

Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization https://arxiv.org/pdf/2012.14193.pdf

On the Origin of Implicit Regularization in Stochastic Gradient Descent https://arxiv.org/pdf/2101.12176.pdf

loss landscape

Sharpness-Aware Minimization for Efficiently Improving Generalization https://arxiv.org/pdf/2010.01412.pdf

ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks https://arxiv.org/abs/2102.11600

Towards Efficient and Scalable Sharpness-Aware Minimization (LookSAM) https://arxiv.org/pdf/2203.02714v1.pdf

A Loss Curvature Perspective on Training Instability in Deep Learning https://arxiv.org/pdf/2110.04369.pdf

Surrogate Gap Minimization Improves Sharpness-Aware Training https://arxiv.org/abs/2203.08065
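The papers in this group all start from the same two-step SAM update: perturb the weights to the approximate worst case inside an L2 ball of radius rho, take the gradient there, then apply it at the original weights (ASAM rescales the perturbation, LookSAM reuses it across steps, GSAM adds a surrogate-gap term). A minimal PyTorch sketch of one such step, assuming `model`, `loss_fn`, and a batch `(x, y)` already exist; an illustration, not any of the papers' exact code:

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One sharpness-aware update in the spirit of Foret et al., 2021."""
    # 1) gradient at the current weights w
    loss = loss_fn(model(x), y)
    loss.backward()

    # 2) move to the approximate worst case: w + rho * g / ||g||
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)                       # perturb weights in place
            perturbations.append((p, e))

    # 3) gradient at the perturbed weights, then undo the perturbation
    model.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)                       # restore original weights

    # 4) descend with the sharpness-aware gradient
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```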

What is the role of augmentation?

When vision transformers outperform resnets without pretraining or strong data augmentations https://arxiv.org/abs/2106.01548

ICLR 2022 Orals

https://openreview.net/group?id=ICLR.cc/2022/Conference