dhlee347 / pytorchic-bert

Pytorch Implementation of Google BERT
Apache License 2.0

What value does the token-level MLM loss usually reach when BERT pre-training stops converging? #34

Closed MingLunHan closed 2 years ago

MingLunHan commented 2 years ago

I am using a Chinese corpus to pre-train a BERT model with this project.

I find that my loss almost stops decreasing once it reaches about 4.0. I have never trained an English BERT. Is there any training log available for the English BERT? I just want to know the final token-level MLM loss for English BERT pre-training. Thanks in advance.
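For context, the token-level MLM loss referred to here is typically the mean cross-entropy computed over only the masked positions. The sketch below is not taken from this repository; the function and argument names are illustrative. It shows one common way to compute that loss, and why a value around 4.0 corresponds to a per-token perplexity of roughly exp(4.0) ≈ 55.

```python
# Minimal sketch (illustrative, not from this repo) of the token-level MLM loss:
# mean cross-entropy over only the masked token positions.
import torch
import torch.nn.functional as F

def token_level_mlm_loss(logits, target_ids, mask_positions):
    """
    logits:         (batch, seq_len, vocab_size) model output scores
    target_ids:     (batch, seq_len) original (unmasked) token ids
    mask_positions: (batch, seq_len) bool tensor, True where tokens were masked
    """
    # Keep only the masked positions; unmasked tokens do not contribute.
    masked_logits = logits[mask_positions]        # (n_masked, vocab_size)
    masked_targets = target_ids[mask_positions]   # (n_masked,)
    # Mean cross-entropy over the masked tokens.
    return F.cross_entropy(masked_logits, masked_targets)

# A loss of ~4.0 means the model's per-token perplexity on masked tokens
# is roughly torch.exp(torch.tensor(4.0)) ≈ 55.
```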