Closed by leon-cas 5 years ago
@leon-cas Because the unmasked tokens shouldn't be trained on by the optimizer, and the 85% rate comes from the paper. Positions labeled 0 are masked out of the loss, so nothing is backpropagated through them.
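To illustrate the point about label 0, here is a minimal pure-Python sketch of a loss that skips positions labeled 0. It assumes the training loss is set up to ignore that label (in PyTorch this is what the `ignore_index` argument of `nn.NLLLoss` does); the names `masked_nll` and `IGNORE_LABEL` are hypothetical and not taken from the repository.

```python
import math

IGNORE_LABEL = 0  # assumption: label 0 marks positions excluded from the loss


def masked_nll(log_probs, labels, ignore_label=IGNORE_LABEL):
    """Average negative log-likelihood over positions whose label is not ignored.

    log_probs: one list of log-probabilities over the vocabulary per position.
    labels: target token ids, with ignore_label at the unmasked positions.
    """
    losses = [-lp[y] for lp, y in zip(log_probs, labels) if y != ignore_label]
    # Design choice of this sketch: return 0.0 when no position is supervised.
    return sum(losses) / len(losses) if losses else 0.0


# Three positions, vocabulary of size 3; only the middle position was masked.
log_probs = [
    [math.log(0.2), math.log(0.5), math.log(0.3)],
    [math.log(0.1), math.log(0.1), math.log(0.8)],
    [math.log(0.6), math.log(0.3), math.log(0.1)],
]
labels = [0, 2, 0]

# Only the middle position contributes, so the loss equals -log(0.8).
loss = masked_nll(log_probs, labels)
```

Because the first and last positions never enter the sum, changing their predicted probabilities would not change the loss, which is why no gradient reaches them.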
So you mean only 15% of all the data is used to train the MLM?
Yes, and it's noted in the paper too.
It means each word in a sentence is masked out with 15% probability and MLM is trained to predict the masked words. Please read the paper carefully.
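The per-word masking described above can be sketched roughly as follows. This is a simplified illustration, not the repository's `random_word`: `mask_tokens` and `MASK_ID` are hypothetical names, real token ids are assumed to be non-zero, and the paper's further 80/10/10 split of the selected words (replace with the mask token, replace with a random word, keep unchanged) is left out.

```python
import random

MASK_ID = 4        # hypothetical id of the mask token
IGNORE_LABEL = 0   # label for words that were not selected for prediction


def mask_tokens(token_ids, mask_prob=0.15, rng=random):
    """Select each token independently with probability mask_prob.

    Returns (inputs, labels): selected positions get MASK_ID as input and the
    original token as label; all other positions keep their token and get
    IGNORE_LABEL so that they can be skipped by the loss.
    """
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE_LABEL)
    return inputs, labels


# Example with a seeded generator so the run is repeatable.
tokens = list(range(10, 30))
inputs, labels = mask_tokens(tokens, rng=random.Random(0))
```

On average about 15% of the positions end up with a non-zero label, and those are the only positions the MLM objective is trained on.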
@jiqiujia @codertimo thanks, guys.
In dataset.py, function 'random_word', line 90, why is the output_label for the 85% of the data (no masking) set to 0?
output_label.append(0)