Taekyoon / my_paper_review

MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers #12

Closed · Taekyoon closed this 4 years ago

Taekyoon commented 4 years ago

Background worth knowing beforehand

Abstract

Points worth examining in the paper

Main Contents

Knowledge Distillation

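For reference, here is a minimal sketch of the standard soft-label distillation objective (temperature-scaled KL divergence between the teacher's and the student's output distributions). The function name and the default temperature are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def soft_label_kd_loss(student_logits: torch.Tensor,
                       teacher_logits: torch.Tensor,
                       temperature: float = 2.0) -> torch.Tensor:
    """Classic soft-label distillation: temperature-scaled KL divergence
    between the teacher's and the student's output distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```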

Key Ideas

Self-Attention Distribution Transfer

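A minimal sketch of the attention-distribution transfer term: KL divergence between the teacher's and the student's last-layer self-attention distributions, averaged over heads and query positions. The tensor names and shapes are my own assumptions; both inputs are assumed to already be softmax-normalized attention maps of shape [batch, heads, seq_len, seq_len].

```python
import torch

def attention_transfer_loss(student_attn: torch.Tensor,
                            teacher_attn: torch.Tensor) -> torch.Tensor:
    """KL(teacher attention || student attention) on the last layer,
    averaged over attention heads and query positions.

    Expected shape for both tensors: [batch, heads, seq_len, seq_len],
    each row already a softmax distribution over key positions."""
    eps = 1e-12  # numerical floor before taking logs
    kl = (teacher_attn *
          (teacher_attn.clamp_min(eps).log() - student_attn.clamp_min(eps).log())
          ).sum(dim=-1)                      # KL per (batch, head, query position)
    return kl.mean()
```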

Self-Attention Value-Relation Transfer

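A companion sketch for the value-relation term: build a relation matrix from each head's value vectors via scaled dot-product plus softmax, then match teacher and student relations with KL divergence. Because the relation matrix is seq_len x seq_len per head, the student's hidden size does not have to match the teacher's. Shapes and names are assumptions.

```python
import math
import torch

def value_relation(values: torch.Tensor) -> torch.Tensor:
    """Value relation of one layer: softmax(V V^T / sqrt(head_dim)).
    Input shape: [batch, heads, seq_len, head_dim]."""
    head_dim = values.size(-1)
    scores = values @ values.transpose(-1, -2) / math.sqrt(head_dim)
    return scores.softmax(dim=-1)            # [batch, heads, seq_len, seq_len]

def value_relation_loss(student_values: torch.Tensor,
                        teacher_values: torch.Tensor) -> torch.Tensor:
    """KL(teacher value relation || student value relation),
    averaged over heads and query positions."""
    eps = 1e-12
    vr_s = value_relation(student_values)
    vr_t = value_relation(teacher_values)
    kl = (vr_t * (vr_t.clamp_min(eps).log() - vr_s.clamp_min(eps).log())).sum(dim=-1)
    return kl.mean()
```

In the paper, the overall distillation objective is the sum of the attention-distribution term above and this value-relation term, computed only on the teacher's last Transformer layer.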

Teacher Assistant
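The teacher-assistant idea is to bridge a large teacher-student size gap in two hops: distill the teacher into an intermediate-size assistant first, then distill the assistant into the final student. Below is a rough sketch of that two-stage pipeline; all model, loader, and loss names are hypothetical, and batches are assumed to be dicts of model inputs.

```python
import torch

def distill(teacher, student, loader, loss_fn, optimizer, num_steps):
    """One generic distillation stage: the student mimics the teacher's
    self-attention statistics (via loss_fn) on unlabeled text batches."""
    teacher.eval()
    step = 0
    for batch in loader:                      # batch: dict of model inputs (assumption)
        with torch.no_grad():
            teacher_out = teacher(**batch)
        student_out = student(**batch)
        loss = loss_fn(student_out, teacher_out)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= num_steps:
            break

# Stage 1: large teacher -> intermediate-size assistant (hypothetical names).
# distill(bert_base_teacher, assistant_model, corpus_loader, minilm_loss, opt_a, steps)
# Stage 2: assistant -> small student.
# distill(assistant_model, small_student, corpus_loader, minilm_loss, opt_b, steps)
```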

Comparison with Previous Work

Experiments

Distillation Setup

Downstream Task Results

Ablation Studies

Discussion (the story isn't finished yet...)

Generation tasks

Personal Opinion