SSUHan / PaparReviews


[18.11.30] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding #6

Open jason9693 opened 5 years ago

jason9693 commented 5 years ago

BERT: Bidirectional Encoder Representations from Transformers.

Accepted Conference Name & Year : arXiv preprint, 2018 (later NAACL 2019)
1st Author Name & Institute : Jacob Devlin, Google AI Language

Keywords
Bidirectional Transformer
Transfer Learning
Masked LM (MLM)

Contribution
Sets a new state of the art (SOTA) on eleven NLP tasks
GLUE benchmark : 80.4% (+7.6%p absolute)
MultiNLI Accuracy : 86.7% (+5.6%p absolute)
SQuAD v1.1 Test F1 : 93.2 (+1.5 points absolute)

Proposed Architecture
GPT : unidirectional (left -> right)
ELMo : separate (left -> right) and (right -> left) models, then concat
BERT : bidirectional
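
The difference between the three can be pictured as "which positions can each token see". Below is a minimal sketch of that idea using plain visibility matrices (1 = visible). The function names are made up for illustration, and ELMo actually uses LSTMs rather than attention, so its two masks are only an analogy for the context each direction sees.

```python
def causal_mask(n):
    """Left-to-right visibility (GPT-style): position i sees positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def reverse_causal_mask(n):
    """Right-to-left visibility: position i sees positions i..n-1."""
    return [[1 if j >= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Full visibility (BERT-style): every position sees every position."""
    return [[1] * n for _ in range(n)]


n = 3
gpt_view = causal_mask(n)
# ELMo analogy: two one-directional views that are computed separately
# and only combined (concatenated) afterwards.
elmo_views = (causal_mask(n), reverse_causal_mask(n))
bert_view = bidirectional_mask(n)

for name, mask in [("GPT", gpt_view), ("BERT", bert_view)]:
    print(name)
    for row in mask:
        print(" ", row)
```

In the BERT row every layer conditions on both left and right context at once, whereas the ELMo analogy never mixes the two directions inside a single view.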


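Token embeddings : WordPiece embeddings (see the additional material below)
Segment embeddings : per-sentence embeddings (a single-sentence input uses the same value for every token)
Position embeddings : embeddings indexed by token position (a maximum sequence length is chosen, e.g. 512)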
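
A rough sketch of how the three embeddings combine: each token gets a token id, a segment id, and a position id, and the three looked-up vectors are summed element-wise. The toy vocabulary, the embedding size of 4, and the helper names below are assumptions for illustration only; the real model uses learned tables and a much larger hidden size.

```python
import random


def build_inputs(tokens_a, tokens_b=None):
    """Build [CLS] A [SEP] (B [SEP]) with matching segment and position ids."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


def make_table(rows, dim, rng):
    """Random stand-in for a learned embedding table."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed(token_ids, segment_ids, position_ids, tok_table, seg_table, pos_table):
    """Input representation = token + segment + position embedding, per token."""
    return [
        [a + b + c for a, b, c in zip(tok_table[t], seg_table[s], pos_table[p])]
        for t, s, p in zip(token_ids, segment_ids, position_ids)
    ]


rng = random.Random(0)
vocab = {"[CLS]": 0, "[SEP]": 1, "my": 2, "dog": 3, "is": 4, "cute": 5,
         "he": 6, "likes": 7, "play": 8, "##ing": 9}
DIM, MAX_LEN = 4, 512  # MAX_LEN mirrors the "e.g. 512" maximum length above

tokens, segment_ids, position_ids = build_inputs(
    ["my", "dog", "is", "cute"], ["he", "likes", "play", "##ing"]
)
token_ids = [vocab[t] for t in tokens]
vectors = embed(
    token_ids, segment_ids, position_ids,
    make_table(len(vocab), DIM, rng),  # token table
    make_table(2, DIM, rng),           # segment table (sentence A / B)
    make_table(MAX_LEN, DIM, rng),     # position table
)
print(tokens)
print(segment_ids)
```

With a single sentence (`tokens_b=None`) every segment id is 0, which matches the note that a single sentence uses the same segment value throughout.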


Pre-training: 1) Masked LM (MLM)

2) Next Sentence Prediction
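
The two pre-training tasks can be sketched as simple data-preparation steps. In the paper, about 15% of WordPiece tokens are chosen for prediction; of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. For next sentence prediction, sentence B is the true next sentence half of the time and a random sentence otherwise. The code below is a simplified sketch under assumptions: it picks each token independently with probability 0.15 rather than selecting an exact 15%, it draws the random sentence from the same toy list rather than from a corpus, and all function names are hypothetical.

```python
import random

MASK = "[MASK]"
SPECIAL = ("[CLS]", "[SEP]")


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Return (corrupted tokens, labels); labels are None where nothing is predicted."""
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPECIAL or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels


def make_nsp_example(sentences, idx, rng):
    """Pair sentence idx with its true successor (IsNext) or a random one (NotNext)."""
    first = sentences[idx]
    if rng.random() < 0.5:
        return first, sentences[idx + 1], "IsNext"
    candidates = [s for j, s in enumerate(sentences) if j not in (idx, idx + 1)]
    return first, rng.choice(candidates), "NotNext"


rng = random.Random(42)
vocab = ["my", "dog", "is", "cute", "he", "likes", "play", "##ing"]
tokens = ["[CLS]", "my", "dog", "is", "cute", "[SEP]"]
print(mask_tokens(tokens, vocab, rng))

sentences = ["the man went to the store", "he bought milk",
             "penguins are flightless", "it rained all day"]
print(make_nsp_example(sentences, 0, rng))
```

`make_nsp_example` assumes at least three sentences and `idx` below the last index, so that both a true successor and a different random sentence exist.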

Fine-tuning


Dataset

GLUE
SQuAD v1.1
Named Entity Recognition (NER)
SWAG

Valuable Related Works
Word Piece Model for Korean
ELMo

SSUHan commented 5 years ago

Questions

Is it certain that BERT only does word embedding?
If it is word embedding, which method was used as the baseline for sentence embedding?

jason9693 commented 5 years ago

@SSUHan To be precise, it does not embed words; it embeds the 'word pieces' obtained by tokenizing with the 'Word Piece Model'. I have not yet found a baseline for sentence embedding.
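
To make the word vs. word piece distinction concrete, here is a minimal sketch of greedy longest-match-first WordPiece tokenization of a single word, where non-initial pieces carry a `##` prefix. The tiny vocabulary and the function name are assumptions for illustration; a real vocabulary is learned from data and has tens of thousands of entries.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, scanning left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a piece that continues a word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # try a shorter candidate
        if piece is None:
            return [unk]  # no piece matches: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"play", "##ing", "un", "##afford", "##able"}
print(wordpiece_tokenize("playing", toy_vocab))       # ['play', '##ing']
print(wordpiece_tokenize("unaffordable", toy_vocab))  # ['un', '##afford', '##able']
print(wordpiece_tokenize("xyz", toy_vocab))           # ['[UNK]']
```

Each resulting piece, not each whole word, is what gets looked up in the token embedding table.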