I'm using a Chinese corpus to pre-train a BERT model with this project.
I find that the loss almost stops decreasing once it reaches about 4.0. I have never trained an English BERT. Are there any training logs available for English BERT? I just want to know the final token-level MLM loss from English BERT pre-training. Thanks in advance.
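For context on what the number means: the token-level MLM loss is just cross-entropy averaged over the masked positions (unmasked tokens are excluded, conventionally via an ignore label of -100). A minimal stdlib sketch, assuming a vocabulary size of 21128 (the `bert-base-chinese` vocab), shows that a loss of 4.0 corresponds to an effective "perplexity" of exp(4.0) ≈ 55 candidate tokens per mask:

```python
import math

def mlm_token_loss(logits, labels, ignore_index=-100):
    """Mean cross-entropy over masked positions only.

    logits: one list of per-vocab scores per token position.
    labels: target ids; ignore_index marks unmasked positions,
            which do not contribute to the MLM loss.
    """
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue
        # log-softmax of the target logit, computed stably
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    return total / count if count else 0.0

# A model with uniform logits over a vocab of size V scores ln(V):
uniform = [[0.0] * 21128]  # 21128 = bert-base-chinese vocab size
print(mlm_token_loss(uniform, [0]))  # ln(21128), roughly 9.96
```

So a plateau at 4.0 already means the model has narrowed each mask from ~21k candidates down to ~55; whether that is "done" depends on the corpus and schedule, not on a universal target value.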