gdscewha-3rd / Study-PaperReview

🤖 AI paper-reading study group

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding #31

Closed m0oon0 closed 2 years ago

m0oon0 commented 2 years ago

Pre-training BERT

BERT is pre-trained using two unsupervised tasks:

  1. Masked LM

    • Mask 15% of the input tokens at random, then predict those masked tokens.
    • The [MASK] token never appears during fine-tuning, which creates a mismatch between pre-training and fine-tuning. To reduce it, among the selected tokens, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged (see the masking sketch after this list).
  2. Next Sentence Prediction

    • Relationships between two sentences are not directly captured by language modeling.
    • BERT learns sentence relationships by training on a next sentence prediction task.
    • For each training pair, 50% of the time the second sentence is the actual next sentence and 50% of the time it is a random sentence from the corpus (see the pair-construction sketch after this list).
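
The masking procedure above fits in a few lines. The sketch below is a minimal illustration of the 80/10/10 rule under stated assumptions, not the authors' code; `mask_tokens`, `mask_id`, and `vocab_size` are illustrative names.

```python
import random

def mask_tokens(tokens, mask_id, vocab_size, mask_prob=0.15):
    """Minimal sketch of BERT-style MLM masking (illustrative, not the paper's code)."""
    inputs, labels = list(tokens), [-100] * len(tokens)   # -100 = position ignored by the loss
    for i, tok in enumerate(tokens):
        if random.random() >= mask_prob:
            continue                                       # ~85% of tokens are not selected
        labels[i] = tok                                    # model must predict the original token here
        r = random.random()
        if r < 0.8:
            inputs[i] = mask_id                            # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = random.randrange(vocab_size)       # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```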
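
Likewise, a rough sketch of how NSP training pairs could be constructed, assuming `corpus` is a list of documents, each a list of sentences (function and variable names are illustrative):

```python
import random

def make_nsp_example(corpus):
    """Minimal sketch of NSP pair construction (illustrative, not the paper's code)."""
    doc = random.choice([d for d in corpus if len(d) > 1])
    idx = random.randrange(len(doc) - 1)
    sent_a = doc[idx]
    if random.random() < 0.5:
        sent_b, is_next = doc[idx + 1], True       # 50%: the actual next sentence (IsNext)
    else:
        rand_doc = random.choice(corpus)           # 50%: a random sentence from the corpus (NotNext)
        sent_b, is_next = random.choice(rand_doc), False
    return sent_a, sent_b, is_next                 # fed to BERT as [CLS] sent_a [SEP] sent_b [SEP]
```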
binable43 commented 2 years ago

BERT ์„ฑ๋Šฅ์ด ์ง„์งœ ๋†’์€ ํŽธ์ด๋ผ๊ณ  ํ•˜์ฃ !! ํ•œ๊ตญ์–ด ๊ธฐ๋ฐ˜์˜ KoBERT ๋ชจ๋ธ๋„ ์žˆ๋Š”๋ฐ ์ž์—ฐ์–ด์ฒ˜๋ฆฌ ๊ด€์‹ฌ ์žˆ๋Š” ๋ถ„๋“ค ์ด๊ฒƒ๋„ ์ฐพ์•„๋ณด์‹œ๋ฉด ์ข‹์„ ๊ฒƒ ๊ฐ™๋„ค์š”~ BERT์— ๋Œ€ํ•ด ์ž˜ ์ •๋ฆฌํ•ด์ฃผ์…”์„œ ์ž˜ ์ฝ๊ณ  ๊ฐ‘๋‹ˆ๋‹ค :)

chaerin314 commented 2 years ago

I've been fine-tuning GraphCodeBERT lately, so it's nice to see something I recently studied come up haha. Great read!

Si-jeong commented 2 years ago

์ž˜ ์ฝ์—ˆ์Šต๋‹ˆ๋‹ค :) ๐Ÿ‘

minha62 commented 2 years ago

์•„์ง ์ž์—ฐ์–ด์ฒ˜๋ฆฌ๋ฅผ ์ œ๋Œ€๋กœ ๊ณต๋ถ€ํ•ด๋ณธ ์ ์ด ์—†๋Š”๋ฐ ๋‹ค์Œ์— BERT๋ถ€ํ„ฐ ์ฐพ์•„๋ด์•ผ๊ฒ ์–ด์š”! ์ข‹์€ ๋‚ด์šฉ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค:)