codertimo / BERT-pytorch

Google AI 2018 BERT pytorch implementation
Apache License 2.0

chooses 15% of token #56

Open makcedward opened 5 years ago

makcedward commented 5 years ago

The paper states:

Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy.

I read this as: 15% of the tokens will be chosen for sure.

In https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, every single token independently has a 15% chance of going through the follow-up procedure. Is that consistent with choosing 15% of the tokens?
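To make the difference concrete, here is a minimal sketch of the two selection strategies. It is not the repository's code: `select_per_token` and `select_fixed_fraction` are hypothetical names, and the "fixed fraction" version (round 15% of the length, at least one token) is only one possible reading of the paper's sentence.

```python
import random


def select_per_token(tokens, prob=0.15, rng=random):
    """Per-token coin flip: each position is chosen independently
    with probability `prob`, so the number chosen varies per call."""
    return [i for i in range(len(tokens)) if rng.random() < prob]


def select_fixed_fraction(tokens, frac=0.15, rng=random):
    """Fixed count (assumed reading of the paper): choose
    round(frac * len(tokens)) distinct positions, at least one."""
    n = len(tokens)
    if n == 0:
        return []
    k = min(n, max(1, round(n * frac)))
    return sorted(rng.sample(range(n), k))


if __name__ == "__main__":
    tokens = [f"tok{i}" for i in range(20)]  # placeholder tokens
    rng = random.Random(0)
    # The per-token counts fluctuate from call to call (and can be 0).
    print([len(select_per_token(tokens, rng=rng)) for _ in range(10)])
    # The fixed-fraction count is the same on every call.
    print([len(select_fixed_fraction(tokens, rng=rng)) for _ in range(10)])
```

With the per-token approach the share of chosen tokens is 15% only in expectation, and a short sequence can end up with no chosen token at all. The fixed-fraction approach always chooses the same number of tokens for a given sequence length.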

codertimo commented 5 years ago

Sorry for the late response, I think you are right. I'll fix it ASAP