dbiir / UER-py

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
https://github.com/dbiir/UER-py/wiki
Apache License 2.0

pretrain hangs and makes no progress #196

Closed shilida closed 3 years ago

shilida commented 3 years ago

It has been stuck at this output the whole time:

Using distributed mode for training.
Worker 0 is training ...
Worker 1 is training ...

shilida commented 3 years ago

GPU utilization is 0%.

hhou435 commented 3 years ago

Could you please share the command you ran?

shilida commented 3 years ago

python pretrain.py --dataset_path corpora/dataset.pt \
                   --vocab_path model/robert/vocab.txt \
                   --config_path model/robert/bert-base-chinese/config.json \
                   --pretrained_model_path model/robert/bert-base-chinese/pytorch_model_uer.bin \
                   --output_model_path model/robert/bert-base-chinese/pytorch_model_law.bin \
                   --world_size 2 --gpu_ranks 0 1 \
                   --total_steps 5000 --save_checkpoint_steps 1000 --batch_size 32 \
                   --embedding word_pos_seg --encoder transformer --target bert

hhou435 commented 3 years ago

Could you also share your preprocess command? And is the corpus you are using in bert format?

shilida commented 3 years ago

--corpus_path corpora/law_bert.txt --vocab_path robert/vocab.txt --dataset_path corpora/dataset.pt --processes_num 8 --target bert

Format of law_bert: one sentence per line.

hhou435 commented 3 years ago

For a one-sentence-per-line corpus, use the mlm target. The bert target requires a corpus in bert format; see the *_bert.txt examples under corpora.
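
For concreteness, a minimal sketch of the two layouts (the sentences below are invented for illustration; see the *_bert.txt files under corpora for real examples). A one-sentence-per-line corpus, which matches the mlm target:

```
This is the first sentence.
This is the second sentence.
This is the third sentence.
```

versus bert format, where each line is still one sentence but documents are separated by a blank line:

```
Document one, first sentence.
Document one, second sentence.

Document two, first sentence.
Document two, second sentence.
```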

shilida commented 3 years ago

[image] This is the format I generated it in. What does a bert-format corpus look like? I'm new to this field, thanks a lot!!!

hhou435 commented 3 years ago

You can refer to the file corpora/CLUECorpusSmall_5000_lines_bert.txt.
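
Alternatively, if the corpus is to stay one sentence per line, here is a sketch of rebuilding the dataset with the mlm target instead (paths copied from the commands earlier in this thread; note that the --target passed to preprocess.py and pretrain.py must match):

```bash
# Re-run preprocessing with --target mlm for a one-sentence-per-line corpus
python preprocess.py --corpus_path corpora/law_bert.txt --vocab_path robert/vocab.txt \
                     --dataset_path corpora/dataset.pt --processes_num 8 --target mlm
```

Then pass --target mlm to pretrain.py as well when training on the rebuilt dataset.pt.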

shilida commented 3 years ago

It was a problem with the txt file format; it's solved now, thank you!