Closed zhuzitong closed 1 year ago
Hi Zhuzitong,
Yes, that is common. It correlates strongly with your vocabulary size and the size of the pre-training dataset.
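For a sense of scale, here is a hedged illustration (plain Python; the only number used is the vocabulary size reported later in this issue): with a 37214-token vocabulary, a uniform random guess would score roughly 0.0027% MLM accuracy, so even low single-digit accuracy early in pretraining is far above chance.

```python
# Chance-level MLM accuracy for uniform random guessing over the vocabulary.
# The vocabulary size is the one reported in this issue; nothing else is assumed.
vocab_size = 37214
chance_accuracy = 1 / vocab_size
print(f"chance accuracy: {chance_accuracy:.6%}")  # chance accuracy: 0.002687%
```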
The key metric for pretrained models like Med-BERT is the performance boost they lead to on different downstream tasks.
So I'd monitor the pretraining, and once the MLM loss starts to plateau, test checkpoints frequently on downstream tasks.
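As a side note on measuring MLM accuracy during that monitoring, here is a minimal sketch (NumPy; the helper name, shapes, and example values are my own, not from the Med-BERT code). It assumes the common convention that only masked positions carry real labels and everything else is set to -100, so accuracy is computed over masked tokens only.

```python
import numpy as np

def mlm_accuracy(logits, labels, ignore_index=-100):
    # logits: (batch, seq_len, vocab); labels: (batch, seq_len)
    # Unmasked positions carry ignore_index and are excluded from the accuracy.
    preds = logits.argmax(axis=-1)
    mask = labels != ignore_index
    return ((preds == labels) & mask).sum() / mask.sum()

# Toy check: 3 masked tokens, 2 predicted correctly, 1 unmasked token ignored.
logits = np.full((1, 4, 10), -10.0)
logits[0, 0, 3] = 10.0; logits[0, 1, 5] = 10.0; logits[0, 2, 7] = 10.0
labels = np.array([[3, 5, 2, -100]])
print(round(mlm_accuracy(logits, labels), 3))  # 0.667
```

Averaging over all positions instead of masked ones would inflate the number, since the model trivially "predicts" unmasked tokens.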
Please let me know if you have any further questions.
Hi there,
I was attempting to implement Med-BERT using real data collected from our local hospital system, but ran into an issue: the accuracy of the Masked Language Modeling (MLM) task during pre-training is very low. Have you and your team encountered this problem? Here are the details:
Here is the dataset's information:
vocabulary size: 37214
data size: 25000
Here is the config file:
Here is the model:
Here is the result of the pre-training's training part:
Here is the result of the pre-training's validation part:
Here are some results printed from the MLM task: