RoBERTa: a replication study of BERT. BERT was significantly undertrained. => careful study of hyperparameter tuning and training set size. (1) training longer, with bigger batches, over more data (2) removing the NSP loss (3) training on longer sequences (4) dynamically changing the masking pattern
Result: can match or exceed the performance of every model published after BERT. => SOTA on GLUE, RACE, and SQuAD. => highlights the importance of previously overlooked design choices!
Static vs Dynamic Masking: Original BERT: static masking, performed once during data preprocessing, so every epoch sees the same mask. BERT's workaround: training data duplicated 10 times so each sequence is masked in 10 different ways over the 40 epochs of training. Dynamic masking: generate a new masking pattern every time a sequence is fed to the model, avoiding repeated masks entirely.
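A minimal sketch of the dynamic-masking idea (a hypothetical helper, not the fairseq implementation): a fresh mask is sampled each time a sequence is fed to the model, using BERT's 80/10/10 replacement rule.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "mat", "dog"]  # toy vocabulary for illustration

def dynamic_mask(tokens, mask_prob=0.15, seed=None):
    """Sample a fresh masking pattern for one pass over the sequence."""
    rng = random.Random(seed)
    out = list(tokens)
    for i in range(len(out)):
        if rng.random() < mask_prob:
            r = rng.random()
            if r < 0.8:               # 80%: replace with [MASK]
                out[i] = MASK
            elif r < 0.9:             # 10%: replace with a random token
                out[i] = rng.choice(VOCAB)
            # remaining 10%: keep the original token unchanged
    return out

# The same sentence gets a different mask on every pass:
sent = ["the", "cat", "sat", "on", "the", "mat"]
epoch1 = dynamic_mask(sent, seed=1)
epoch2 = dynamic_mask(sent, seed=2)
```

With static masking, `epoch1` and `epoch2` would be forced to be identical unless the data were duplicated up front.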
Model Input Format and Next Sentence Prediction: SENTENCE-PAIR+NSP: pairs of natural sentences, total length shorter than 512 tokens; the batch size is increased to match the token count of the other settings. FULL-SENTENCES: full sentences sampled contiguously so that the total length is at most 512 tokens; inputs may cross document boundaries, with an extra separator token added between documents. DOC-SENTENCES: similar to FULL-SENTENCES, but inputs may not cross document boundaries; inputs sampled near the end of a document can be shorter than 512 tokens, so the batch size is dynamically increased.
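The FULL-SENTENCES packing can be sketched as follows (an illustrative helper, not the paper's actual code; `SEP` and the function name are assumptions):

```python
SEP = "</s>"  # assumed separator symbol between documents

def pack_full_sentences(documents, max_len=512):
    """documents: list of docs, each a list of sentences (token lists).
    Pack sentences contiguously into inputs of at most max_len tokens;
    inputs may cross document boundaries, separated by SEP."""
    inputs, current = [], []
    for doc in documents:
        for sent in doc:
            if current and len(current) + len(sent) > max_len:
                inputs.append(current)      # start a new input
                current = []
            current.extend(sent)
        if current:                         # document boundary reached
            if len(current) + 1 > max_len:
                inputs.append(current)
                current = []
            else:
                current.append(SEP)         # separator between documents
    if current:
        inputs.append(current)
    return inputs

# Toy example with a small max_len to show the packing behavior:
docs = [[["a"] * 3, ["b"] * 3], [["c"] * 2]]
packed = pack_full_sentences(docs, max_len=5)
```

DOC-SENTENCES would instead flush `current` at every document boundary, producing shorter inputs and a dynamically larger batch.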
Training with large batches: training with large mini-batches improves optimization speed and end-task performance at equivalent computational cost (fewer steps with a proportionally larger batch).
Text Encoding: byte-level Byte-Pair Encoding (BPE) over bytes instead of unicode characters. => a 50K-unit vocabulary that can encode any input text without additional preprocessing or tokenization. Byte-level BPE achieves slightly worse end-task performance on some tasks, but the advantages of a universal encoding scheme outweigh the minor degradation.
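The core idea behind byte-level BPE can be illustrated in a few lines (a sketch of the base alphabet only, not GPT-2's actual tokenizer): operating on UTF-8 bytes gives a base vocabulary of just 256 symbols, yet any unicode text can be represented with no unknown tokens and no preprocessing.

```python
def to_byte_units(text):
    """Decompose text into its UTF-8 bytes, the base units that
    byte-level BPE merges into larger subword units."""
    return list(text.encode("utf-8"))

# Multi-byte characters simply become several base units:
print(to_byte_units("héllo"))   # é is two bytes: [104, 195, 169, 108, 108, 111]
# Any script stays within the 256-symbol base alphabet:
assert all(0 <= b < 256 for b in to_byte_units("任意の文字"))
```

A character-level BPE would instead need unknown-token handling for any character outside its learned vocabulary.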
RoBERTa: Robustly optimized BERT approach = dynamic masking + FULL-SENTENCES without NSP + large mini-batches + a larger byte-level BPE vocabulary.
two important factors (overlooked in prior work): (1) data used for pretraining. (2) number of training passes. ex) XLNet: trained on nearly 10 times more data, with a batch size 8 times larger for half as many optimization steps, thus seeing 4 times more sequences than BERT.
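The arithmetic behind the XLNet comparison, using BERT's original settings (batch size 256, 1M steps):

```python
# A batch 8x larger run for half as many steps sees 8 * 0.5 = 4x the sequences.
bert_batch, bert_steps = 256, 1_000_000        # original BERT pretraining setup
xlnet_batch = 8 * bert_batch                   # 2048
xlnet_steps = bert_steps // 2                  # 500K

ratio = (xlnet_batch * xlnet_steps) / (bert_batch * bert_steps)
assert ratio == 4.0   # 4x more sequences seen in pretraining
```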
RoBERTa Conclusions: Training the model longer, with bigger batches, over more data. Removing the NSP objective. Training on longer sequences. Dynamically changing the masking pattern applied to the training data.
SOTA result
importance of design decisions. BERT's masked language model pretraining objective remains competitive with recently proposed alternatives.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
https://arxiv.org/pdf/1907.11692.pdf