dhlee347 / pytorchic-bert

Pytorch Implementation of Google BERT
Apache License 2.0

Masked subword prediction problem #24

Open akakakakakaa opened 4 years ago

akakakakakaa commented 4 years ago

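In the get_loss function in pretrain, loss_lm is reduced with mean().

Because of this, the zero entries of loss_lm (the positions where masked_weights is 0) are averaged in as if they were correct predictions with zero loss, so the reported loss is too small.

So I think we need to replace the mean with numerator / denominator, as the TensorFlow implementation does: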

loss_lm = (loss_lm * masked_weights.float()).mean()

to

loss_lm_numerator = (loss_lm * masked_weights.float()).sum()
loss_lm_denominator = masked_weights.sum() + 1e-5
loss_lm = loss_lm_numerator / loss_lm_denominator
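To make the difference concrete, here is a minimal sketch in plain Python rather than PyTorch, so it runs without any dependencies. The loss values and weights are made up for illustration and are not taken from the repository; only the two reductions mirror the lines above.

```python
# Hypothetical per-position LM losses for one sequence with 5 prediction slots.
# Only the first 2 slots hold real masked tokens; the other 3 are padding.
per_token_loss = [2.0, 4.0, 3.0, 1.0, 5.0]  # made-up cross-entropy values
masked_weights = [1, 1, 0, 0, 0]            # 1 = real masked token, 0 = padding

# Zero out the loss at padded positions, as loss_lm * masked_weights does.
weighted = [loss * w for loss, w in zip(per_token_loss, masked_weights)]

# mean(): divides by all 5 slots, so padding dilutes the loss.
loss_mean = sum(weighted) / len(weighted)

# numerator / denominator: divides by the number of real masked tokens only.
# The 1e-5 guards against division by zero when nothing is masked.
loss_normalized = sum(weighted) / (sum(masked_weights) + 1e-5)

print(loss_mean)        # average over all slots, padding included
print(loss_normalized)  # average over the 2 real tokens, close to 3.0
```

In this sketch the two results differ by the ratio of total slots to real masked tokens, and that ratio changes from batch to batch whenever the number of masked tokens changes.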

Is this correct?