brightmart / bert_language_understanding

Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN

the pre-trained MLM performance #6

Closed yyht closed 6 years ago

yyht commented 6 years ago

Hi, I tried to use your bert_cnn_model to train on my corpus: 900k sentences and 300k words, with an average sentence length of 30 after tokenization. But the model seems to get stuck in a local minimum; the accuracy on the validation set just fluctuates after the first 5 epochs.

[screenshot attached]

brightmart commented 6 years ago

Hi. That is too little corpus for the pre-training stage. I think you need millions of sentences, at least one million. It's easy to get raw data for the pre-training stage, as long as each line contains a document or sentence(s).

It's also common practice to use a large corpus to train word embeddings; the same applies to pre-training a language model.

Let me know the result after using a lot more data to pre-train the masked language model.
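To make the "each line contains a document or sentence(s)" format concrete, here is a minimal sketch (not from the repo) of how raw documents could be written into such a pre-training corpus file; the function name and cleaning rule are assumptions for illustration only.

```python
import re


def write_pretrain_corpus(documents, out_path):
    """Write one cleaned document per line, the format described above.

    documents: an iterable of raw text strings (e.g. crawled articles).
    out_path:  path of the plain-text corpus file used for masked-LM pre-training.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for doc in documents:
            # Collapse internal newlines/whitespace so each document stays on one line.
            line = re.sub(r"\s+", " ", doc).strip()
            if line:
                f.write(line + "\n")
```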

geogreff commented 6 years ago

Hi, I tried your bert_model rather than bert_cnn_model. bert_model could reach about a 75% F1 score on the language model task. But when using the pretrained bert_model to fine-tune on the classification task, it didn't work: the F1 score was still about 10% after several epochs. Is there something wrong with bert_model?
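For context on what "using the pretrained bert_model to fine-tune" involves, below is a minimal TensorFlow 1.x sketch of restoring pretrained weights into the fine-tuning graph while skipping the task-specific head. This is an illustration under assumptions, not the repo's actual loading code; the `classification_head` scope name is hypothetical.

```python
import tensorflow as tf


def restore_pretrained(sess, checkpoint_dir, exclude_scopes=("classification_head",)):
    """Restore all pretrained variables except the task-specific head.

    If the encoder weights are not restored (e.g. a variable-name mismatch),
    fine-tuning effectively starts from random initialization, which can leave
    the F1 score near chance level.
    """
    variables = [v for v in tf.global_variables()
                 if not any(v.name.startswith(scope) for scope in exclude_scopes)]
    saver = tf.train.Saver(var_list=variables)
    ckpt = tf.train.latest_checkpoint(checkpoint_dir)
    if ckpt is None:
        raise FileNotFoundError("no checkpoint found in %s" % checkpoint_dir)
    saver.restore(sess, ckpt)
```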