google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Chinese Pretraining #259

Open Sara-HY opened 5 years ago

Sara-HY commented 5 years ago

Hi, I am pretraining a Chinese BERT model, but the loss plateaus around 7 and will not drop further, and I am not sure what the key problem is.

guoyaohua commented 5 years ago

Brother, may I ask: can BERT-Base run on two GPUs with 8 GB of memory each? The authors say it needs a single GPU with more than 12 GB of memory, but can using two GPUs effectively expand the available memory, even if only one GPU does the computation?

breakjiang commented 5 years ago

> Hi, I am pretraining a Chinese BERT model, but the loss plateaus around 7 and will not drop further, and I am not sure what the key problem is.

more num_train_steps

Sara-HY commented 5 years ago

> Hi, I am pretraining a Chinese BERT model, but the loss plateaus around 7 and will not drop further, and I am not sure what the key problem is.
>
> more num_train_steps

How many num_train_steps have you tried?

jockeyyan commented 5 years ago

Depends on how many GPUs you have; as Google says: "Pre-training is fairly expensive (four days on 4 to 16 Cloud TPUs)."

breakjiang commented 5 years ago

> Hi, I am pretraining a Chinese BERT model, but the loss plateaus around 7 and will not drop further, and I am not sure what the key problem is.
>
> more num_train_steps
>
> How many num_train_steps have you tried?

500000

Sara-HY commented 5 years ago

> Hi, I am pretraining a Chinese BERT model, but the loss plateaus around 7 and will not drop further, and I am not sure what the key problem is.
>
> more num_train_steps
>
> How many num_train_steps have you tried?
>
> 500000

Could you please tell me your parameters for pre-training? Also, how many instances did you create for pre-training, and how many GPUs do you use for training?

Sara-HY commented 5 years ago

> Depends on how many GPUs you have; as Google says: "Pre-training is fairly expensive (four days on 4 to 16 Cloud TPUs)."

What you are saying is that it is time-consuming, but what I want to know is how many epochs I should train the model for so that the loss keeps dropping.

breakjiang commented 5 years ago

> Hi, I am pretraining a Chinese BERT model, but the loss plateaus around 7 and will not drop further, and I am not sure what the key problem is.
>
> more num_train_steps
>
> How many num_train_steps have you tried?
>
> 500000
>
> Could you please tell me your parameters for pre-training? Also, how many instances did you create for pre-training, and how many GPUs do you use for training?

1 million instances; 1 GPU with 8 GB of memory; 0.26 million vocab size; model parameters L-4-H-288-A-12; batch size 32.

If you have more GPU memory, you can set a larger hidden_size.
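
For reference, a setup like the one described above (L-4-H-288-A-12, batch size 32, 500k steps, ~0.26M vocab) maps onto the repository's own scripts roughly as follows. This is only a sketch: the paths, the intermediate_size/dropout/initializer values, and the learning-rate/warmup choices are assumptions, not breakjiang's exact settings.

```bash
# Hypothetical tiny config matching "L-4-H-288-A-12":
# 4 layers, hidden size 288, 12 attention heads, ~0.26M-entry vocab.
cat > small_bert_config.json <<'EOF'
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 288,
  "initializer_range": 0.02,
  "intermediate_size": 1152,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 4,
  "type_vocab_size": 2,
  "vocab_size": 260000
}
EOF

# Pre-training from scratch on a single 8 GB GPU; no --init_checkpoint,
# because a custom 260k vocab is incompatible with Google's released checkpoints.
python run_pretraining.py \
  --input_file=/tmp/tf_examples.tfrecord \
  --output_dir=/tmp/pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=small_bert_config.json \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=500000 \
  --num_warmup_steps=10000 \
  --learning_rate=1e-4
```

Note that at hidden size 288 a 260k-entry vocabulary puts roughly 75M parameters in the embedding table versus only a few million in the four transformer layers, which is essentially the point stevewyl makes further down about word-level dictionaries inflating the model.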

Sara-HY commented 5 years ago

> Hi, I am pretraining a Chinese BERT model, but the loss plateaus around 7 and will not drop further, and I am not sure what the key problem is.
>
> more num_train_steps
>
> How many num_train_steps have you tried?
>
> 500000
>
> Could you please tell me your parameters for pre-training? Also, how many instances did you create for pre-training, and how many GPUs do you use for training?
>
> 1 million instances; 1 GPU with 8 GB of memory; 0.26 million vocab size; model parameters L-4-H-288-A-12; batch size 32.
>
> If you have more GPU memory, you can set a larger hidden_size.

Thank you so much for your reply. Maybe I got something wrong in the pre-processing of the raw data when creating the instances. How did you handle that part? Did you use the wiki data directly? What other preprocessing did you do?
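
On the preprocessing question, for reference: the repository's create_pretraining_data.py expects plain text with one sentence per line and blank lines between documents, so a wiki dump needs markup stripping and sentence splitting before instances can be created. A sketch of the invocation, following the README's example, with placeholder paths and whatever vocab file you actually use:

```bash
# Input: plain text, one sentence per line, documents separated by blank lines.
python create_pretraining_data.py \
  --input_file=/tmp/zh_corpus.txt \
  --output_file=/tmp/tf_examples.tfrecord \
  --vocab_file=/tmp/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5
```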

qiu-nian commented 5 years ago

> Hi, I am pretraining a Chinese BERT model, but the loss plateaus around 7 and will not drop further, and I am not sure what the key problem is.
>
> more num_train_steps
>
> How many num_train_steps have you tried?
>
> 500000
>
> Could you please tell me your parameters for pre-training? Also, how many instances did you create for pre-training, and how many GPUs do you use for training?
>
> 1 million instances; 1 GPU with 8 GB of memory; 0.26 million vocab size; model parameters L-4-H-288-A-12; batch size 32.
>
> If you have more GPU memory, you can set a larger hidden_size.

@breakjiang Hello, I am also training on a Chinese corpus, and afterwards I want to extract word vectors from my own corpus.

(1) For data preprocessing, is the vocab.txt you used the one provided by Google, or did you generate it from your own corpus? I see your vocab size is 0.26 million and I am not sure how you handled that. I did my own word segmentation and built a dictionary from it, but GPU memory blew up during pre-training, and I wonder whether that is because vocab.txt was not handled properly.
(2) If you used your own vocab.txt, did you pre-train on top of the pre-trained Chinese model provided by Google? When I ran it I hit a mismatch in vocabulary size; the vocab size in Google's pre-trained model is 21122, which seems to be fixed, and I do not know how you handled that.
(3) In "model parameter L-4-H-288-A-12", does the 4 mean 4 transformer layers and 288 the number of hidden units? Is that the right way to understand it?

I hope to get your answer and guidance!
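
On the vocabulary-size mismatch in (2): vocab_size in bert_config.json has to equal the number of entries in vocab.txt, and both have to match the embedding table of any checkpoint you initialize from, so a self-built vocabulary cannot be combined directly with Google's Chinese checkpoint. A quick sanity check (paths are placeholders):

```bash
# These two numbers must agree with each other and with the checkpoint.
wc -l chinese_L-12_H-768_A-12/vocab.txt
grep vocab_size chinese_L-12_H-768_A-12/bert_config.json
```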

stevewyl commented 5 years ago

@qiunian711 Why use your own word dictionary? Chinese BERT is character-based, and using a word dictionary may greatly increase the model size. I just continued pre-training the BERT model on my own corpus for only one epoch, and the F1 score of the fine-tuned model improved by 1%.
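
For reference, "continue pre-training" as described here is the README's run_pretraining.py recipe with --init_checkpoint pointing at Google's released Chinese model; the instances must be built with that model's own vocab.txt. The paths, step count and learning rate below are placeholders, not stevewyl's exact settings:

```bash
BERT_BASE_DIR=/path/to/chinese_L-12_H-768_A-12

# Continue masked-LM / next-sentence pre-training from the released Chinese checkpoint.
python run_pretraining.py \
  --input_file=/tmp/zh_examples.tfrecord \
  --output_dir=/tmp/zh_pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=100000 \
  --num_warmup_steps=10000 \
  --learning_rate=2e-5
```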

qiu-nian commented 5 years ago

@stevewyl Because my ultimate goal is to extract vectors for words, like word2vec, rather than vectors for characters, I want to train directly on my own dictionary. The main reason is that if a Chinese word is split into individual characters, its meaning can deviate a lot, so I do not dare to train directly on characters. Do you have any good suggestions for this?
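
One possible compromise, sketched here as an assumption rather than something discussed above: keep the character-based Chinese model, dump per-character contextual vectors with the repository's extract_features.py, and then average the vectors of the characters that span each word from your own segmentation. Paths are placeholders:

```bash
BERT_BASE_DIR=/path/to/chinese_L-12_H-768_A-12

# Writes one JSON line per input sentence with a vector per token (character);
# word vectors can then be formed by pooling the characters inside each word.
python extract_features.py \
  --input_file=/tmp/sentences.txt \
  --output_file=/tmp/features.jsonl \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --layers=-1 \
  --max_seq_length=128 \
  --batch_size=8
```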

yijingzeng commented 5 years ago

@Sara-HY How do you pretrain the Chinese model? Do you use the chinese_L-12_H-768_A-12 checkpoint? How much training data do you use for pretraining?

biuleung commented 5 years ago

Have you run into these errors while fine-tuning a model you pre-trained yourself?

2 root error(s) found.
(0) Not found: Key output_bias not found in checkpoint
    [[node save/RestoreV2 (defined at run_classifier.py:953) ]]
    [[save/RestoreV2/_301]]
(1) Not found: Key output_bias not found in checkpoint
    [[node save/RestoreV2 (defined at run_classifier.py:953) ]]
0 successful operations. 0 derived errors ignored.
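
One common cause of this error (stated here as an assumption, not a confirmed diagnosis): output_dir already contains a checkpoint saved without the classification layer, for example a pre-training checkpoint, so the Estimator tries to restore output_bias from a file that never had it. Pointing --init_checkpoint at the pre-trained model and using a fresh, empty --output_dir is worth trying; a sketch with placeholder paths and task settings:

```bash
BERT_BASE_DIR=/path/to/your_pretrained_model   # contains bert_model.ckpt.*

# Fine-tune with a clean output directory so no incompatible checkpoint is restored.
python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=/path/to/task_data \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/tmp/finetune_output_fresh/
```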

khaerulumam42 commented 5 years ago

For anyone who needs free TPU access: I was granted access through the TFRC program and got 100 TPU v2s for free. I still have plenty of TPUs available; maybe some of you want to use them, and I hope it helps. My access only lasts until 24 July.

FengXuas commented 5 years ago

For Chinese pre-training, is it necessary to word-segment the Chinese corpus?