Thank you for your great code. I'm a student and a beginner in data analysis. I want to run your code, but I have some questions. It may be a silly question, but could you give me some details about the files used by this command?
python pretrain.py \
    --train_cfg config/pretrain.json \
    --model_cfg config/bert_base.json \
    --data_file $DATA_FILE \
    --vocab $BERT_PRETRAIN/vocab.txt \
    --save_dir $SAVE_DIR \
    --max_len 512 \
    --max_pred 20 \
    --mask_prob 0.15
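For reference, here is a minimal Python sketch of what --max_pred and --mask_prob are assumed to control, following the masked-LM rule described in the BERT paper. About mask_prob of the tokens are selected (at least one, capped at max_pred). Each selected token becomes [MASK] 80% of the time, is replaced by a random vocabulary token 10% of the time, and is left unchanged 10% of the time. The helper names num_predictions and mask_tokens are hypothetical, skipping [CLS]/[SEP] is an assumption, and this is not necessarily how pretrain.py implements it.

```python
import random
from typing import List, Optional, Tuple

# Assumption: special tokens are never chosen for prediction.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def num_predictions(seq_len: int, max_pred: int = 20, mask_prob: float = 0.15) -> int:
    """Hypothetical helper: mask ~mask_prob of the tokens, at least 1, at most max_pred."""
    return min(max_pred, max(1, int(round(seq_len * mask_prob))))


def mask_tokens(
    tokens: List[str],
    vocab_words: List[str],
    max_pred: int = 20,
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[int], List[str]]:
    """Return (masked tokens, masked positions, original tokens at those positions)."""
    rng = rng or random.Random()
    candidates = [i for i, t in enumerate(tokens) if t not in SPECIAL_TOKENS]
    n_pred = min(num_predictions(len(tokens), max_pred, mask_prob), len(candidates))
    positions = sorted(rng.sample(candidates, n_pred))
    labels = [tokens[p] for p in positions]  # targets the model must predict

    masked = list(tokens)
    for p in positions:
        r = rng.random()
        if r < 0.8:
            masked[p] = "[MASK]"                 # 80%: replace with [MASK]
        elif r < 0.9:
            masked[p] = rng.choice(vocab_words)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return masked, positions, labels
```

Under this rule, a 512-token sequence with mask_prob 0.15 would select about 77 tokens, so max_pred 20 caps it at 20 predictions.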
We need $DATA_FILE as the training set, but what is vocab.txt? I can get a vocab.txt file from Google's GitHub repository. Should I just use it, or can I customize it? (I want to build a BERT with fewer parameters than BERT-Base.) Also, is the output file model_steps_xxxx.pt compatible with the BERT in Google's GitHub repository?
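A vocab.txt in Google's BERT format is assumed here to be a plain-text file with one WordPiece token per line, where a token's id is its 0-based line number. The sketch below loads such a file and estimates the parameter count of a BERT encoder from its size settings, which shows where the parameters come from (the token embedding alone is vocab_size × hidden). The names load_vocab, BertSize and bert_param_count are hypothetical, the count covers only the embeddings, encoder layers and pooler (no pre-training heads), and the field names are not necessarily the keys used in config/bert_base.json.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


def load_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Map each token to its id, assuming one token per line (id = line index)."""
    return {line.strip(): index for index, line in enumerate(lines)}


@dataclass
class BertSize:
    # Hypothetical size settings; defaults are the BERT-Base values.
    vocab_size: int = 30522
    hidden: int = 768
    layers: int = 12
    ff: int = 3072        # feed-forward inner size
    max_len: int = 512    # number of position embeddings
    n_segments: int = 2


def bert_param_count(cfg: BertSize, include_pooler: bool = True) -> int:
    h, f = cfg.hidden, cfg.ff
    # Token + position + segment embeddings, plus one LayerNorm (weight and bias).
    embeddings = (cfg.vocab_size + cfg.max_len + cfg.n_segments) * h + 2 * h
    attention = 4 * (h * h + h)       # Q, K, V and output projections with biases
    ffn = h * f + f + f * h + h       # two dense layers with biases
    layer_norms = 2 * 2 * h           # two LayerNorms per layer
    per_layer = attention + ffn + layer_norms
    pooler = h * h + h if include_pooler else 0
    return embeddings + cfg.layers * per_layer + pooler
```

With the BERT-Base defaults this gives 109,482,240, the commonly cited "110M" figure. A real file would be read with something like `with open(path, encoding="utf-8") as f: vocab = load_vocab(f)`, and shrinking hidden, layers, ff or vocab_size reduces the count.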
Sorry, I'm not an expert, so my questions may be silly. Thank you.