dhlee347 / pytorchic-bert

Pytorch Implementation of Google BERT
Apache License 2.0

Only 80% real masks, 10% random vocabs in n-gram MLM #19

Closed graykode closed 4 years ago

graykode commented 4 years ago

In ALBERT (Lan et al.), there is no detail about the 80% masking.

But for n-gram masking (Joshi et al., 2019), they do describe the 80/10/10 split:

As in BERT, we also mask 15% of the tokens in total: replacing 80% of the masked tokens with [MASK], 10% with random tokens and 10% with the original tokens. However, we perform this replacement at the span level and not for each token individually; i.e. all the tokens in a span are replaced with [MASK] or sampled tokens.
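
For reference, a minimal sketch (not from this repo) of how the span-level decision differs from BERT's per-token decision: the 80/10/10 draw happens once per span, and every token in the span gets the same treatment. `MASK_ID` and `VOCAB_SIZE` are placeholder values that depend on the tokenizer actually used.

```python
import random

MASK_ID = 103          # hypothetical [MASK] token id; depends on the vocab
VOCAB_SIZE = 30522     # hypothetical vocab size

def mask_span(token_ids, span_start, span_end, rng=random):
    """Apply the 80/10/10 decision once per span (not per token).

    All tokens in [span_start, span_end) receive the same treatment:
    80% -> [MASK], 10% -> random tokens, 10% -> kept unchanged.
    """
    p = rng.random()
    masked = list(token_ids)
    for i in range(span_start, span_end):
        if p < 0.8:                      # whole span replaced with [MASK]
            masked[i] = MASK_ID
        elif p < 0.9:                    # whole span replaced with random tokens
            masked[i] = rng.randrange(VOCAB_SIZE)
        # else: whole span kept as the original tokens
    return masked

# Example: mask a 3-token span starting at position 2
tokens = [2023, 2003, 1037, 7099, 6251, 1012]
print(mask_span(tokens, 2, 5))
```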