gdscewha-3rd / Study-PaperReview

🤖 AI paper-reading study group

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding #31

Closed m0oon0 closed 2 years ago

m0oon0 commented 2 years ago

Pre-training BERT

BERT is pre-trained using two unsupervised tasks:

  1. Masked LM

    • Mask 15% of the input tokens at random, then predict those masked tokens.
    • The [MASK] token never appears during fine-tuning, which creates a mismatch between pre-training and fine-tuning. To reduce it, among the selected tokens, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged (see the masking sketch after this list).
  2. Next Sentence Prediction

    • Relationships between two sentences are not directly captured by language modeling.
    • BERT learns sentence relationships by training on a next sentence prediction task.
    • For each training pair, 50% of the time the second sentence is the actual next sentence and 50% of the time it is a random sentence from the corpus (see the pair-construction sketch after this list).
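
The masking procedure above fits in a few lines. The sketch below is a minimal illustration of the 80/10/10 rule under stated assumptions, not the authors' code; `mask_tokens`, `mask_id`, and `vocab_size` are illustrative names.

```python
import random

def mask_tokens(tokens, mask_id, vocab_size, mask_prob=0.15):
    """Minimal sketch of BERT-style MLM masking (illustrative, not the paper's code)."""
    inputs, labels = list(tokens), [-100] * len(tokens)   # -100 = position ignored by the loss
    for i, tok in enumerate(tokens):
        if random.random() >= mask_prob:
            continue                                       # ~85% of tokens are not selected
        labels[i] = tok                                    # model must predict the original token here
        r = random.random()
        if r < 0.8:
            inputs[i] = mask_id                            # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = random.randrange(vocab_size)       # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```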
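
Likewise, a rough sketch of how NSP training pairs could be constructed, assuming `corpus` is a list of documents, each a list of sentences (function and variable names are illustrative):

```python
import random

def make_nsp_example(corpus):
    """Minimal sketch of NSP pair construction (illustrative, not the paper's code)."""
    doc = random.choice([d for d in corpus if len(d) > 1])
    idx = random.randrange(len(doc) - 1)
    sent_a = doc[idx]
    if random.random() < 0.5:
        sent_b, is_next = doc[idx + 1], True       # 50%: the actual next sentence (IsNext)
    else:
        rand_doc = random.choice(corpus)           # 50%: a random sentence from the corpus (NotNext)
        sent_b, is_next = random.choice(rand_doc), False
    return sent_a, sent_b, is_next                 # fed to BERT as [CLS] sent_a [SEP] sent_b [SEP]
```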
binable43 commented 2 years ago

BERT ์„ฑ๋Šฅ์ด ์ง„์งœ ๋†’์€ ํŽธ์ด๋ผ๊ณ  ํ•˜์ฃ !! ํ•œ๊ตญ์–ด ๊ธฐ๋ฐ˜์˜ KoBERT ๋ชจ๋ธ๋„ ์žˆ๋Š”๋ฐ ์ž์—ฐ์–ด์ฒ˜๋ฆฌ ๊ด€์‹ฌ ์žˆ๋Š” ๋ถ„๋“ค ์ด๊ฒƒ๋„ ์ฐพ์•„๋ณด์‹œ๋ฉด ์ข‹์„ ๊ฒƒ ๊ฐ™๋„ค์š”~ BERT์— ๋Œ€ํ•ด ์ž˜ ์ •๋ฆฌํ•ด์ฃผ์…”์„œ ์ž˜ ์ฝ๊ณ  ๊ฐ‘๋‹ˆ๋‹ค :)

chaerin314 commented 2 years ago

I've been fine-tuning GraphCodeBERT lately, so it's nice to see something I recently studied come up haha. Great read!

Si-jeong commented 2 years ago

์ž˜ ์ฝ์—ˆ์Šต๋‹ˆ๋‹ค :) ๐Ÿ‘

minha62 commented 2 years ago

์•„์ง ์ž์—ฐ์–ด์ฒ˜๋ฆฌ๋ฅผ ์ œ๋Œ€๋กœ ๊ณต๋ถ€ํ•ด๋ณธ ์ ์ด ์—†๋Š”๋ฐ ๋‹ค์Œ์— BERT๋ถ€ํ„ฐ ์ฐพ์•„๋ด์•ผ๊ฒ ์–ด์š”! ์ข‹์€ ๋‚ด์šฉ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค:)