google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Apache License 2.0

Fine-tuned models have relatively bad performance when using my own base-level pretrained model #111

Open 652994331 opened 3 years ago

652994331 commented 3 years ago

Hi guys, thank you for your ELECTRA models. Recently I used my own data to continue pretraining a base-level ELECTRA model (this one: https://github.com/ymcui/Chinese-ELECTRA). This pretrained model is a Chinese ELECTRA, so it has a different vocab.txt (the same as the BERT-base Chinese vocab, 21128 lines). What I did was:

First, I used build_pretraining_dataset.py (with the 21128-line vocab) to generate the tfrecords.

Second, I added init_checkpoint support following this PR: https://github.com/google-research/electra/pull/74.

Third, I continued pretraining my own base-level Chinese ELECTRA model from the Chinese-ELECTRA checkpoint. The hyperparameters I used were lr 2e-4, 1000000 training steps, base model size. The command line was like this: python3 run_pretraining.py --data-dir pretrain_chinese_model/ --model-name my_model --init_checkpoint pretrain_chinese_model/models/Chinese-Electra
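To make that concrete, here is roughly what the data preparation and pretraining invocations looked like. This is only a sketch of my setup: the directory layout and model name are placeholders, and passing init_checkpoint through --hparams assumes the patch from PR #74 exposes it as a pretraining config field.

```
# Step 1: build pretraining tfrecords with the 21128-line Chinese vocab
# (the same vocab.txt shipped with BERT-base Chinese / Chinese-ELECTRA).
python3 build_pretraining_dataset.py \
  --corpus-dir pretrain_chinese_model/corpus \
  --vocab-file pretrain_chinese_model/vocab.txt \
  --output-dir pretrain_chinese_model/pretrain_tfrecords \
  --max-seq-length 128 \
  --num-processes 8

# Steps 2-3: continue pretraining from the Chinese-ELECTRA checkpoint.
# vocab_size is overridden to 21128 to match the Chinese vocab
# (the upstream default in configure_pretraining.py is 30522).
python3 run_pretraining.py \
  --data-dir pretrain_chinese_model/ \
  --model-name my_model \
  --hparams '{"model_size": "base", "vocab_size": 21128, "learning_rate": 2e-4, "num_train_steps": 1000000, "init_checkpoint": "pretrain_chinese_model/models/Chinese-Electra"}'
```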

The loss after 100000 steps was around 3.4. However, when I used the 100000-step pretrained model to fine-tune a classification model, the performance was much worse than with the original Chinese-ELECTRA. I was wondering why even 100000 steps of continued pretraining from Chinese-ELECTRA could give such bad performance. Did I make any mistakes?
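For reference, the fine-tuning run looked roughly like the following. The task name "chinese_clf" is a hypothetical label for my own classification dataset (it is not one of the tasks shipped with the repo, so it has to be registered in finetune/task_builder.py first), and vocab_size is again set to 21128 so the fine-tuning config matches the Chinese vocab:

```
# Fine-tune the continued-pretraining checkpoint on a classification task.
# "chinese_clf" is a placeholder task name for my own dataset.
python3 run_finetuning.py \
  --data-dir pretrain_chinese_model/ \
  --model-name my_model \
  --hparams '{"model_size": "base", "task_names": ["chinese_clf"], "vocab_size": 21128}'
```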