dbiir / UER-py

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
https://github.com/dbiir/UER-py/wiki
Apache License 2.0

Segmentation fault when using sentencepiece #278

Open hhou435 opened 2 years ago

hhou435 commented 2 years ago

Commands used for word-based pretraining with sentencepiece:

python3 preprocess.py --corpus_path corpora/book_review.txt \
                      --spm_model_path models/cluecorpussmall_spm.model \
                      --dataset_path book_review_word_sentencepiece_dataset.pt \
                      --processes_num 8 --seq_length 128 --dynamic_masking \
                      --data_processor mlm

python3 pretrain.py --dataset_path book_review_word_sentencepiece_dataset.pt \
                    --spm_model_path models/cluecorpussmall_spm.model \
                    --output_model_path models/book_review_word_sentencepiece_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 5000 --save_checkpoint_steps 2500 --report_steps 500 \
                    --learning_rate 1e-4 --batch_size 64 \
                    --tie_weights

Running these commands reports the following error: Segmentation fault
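
A bare "Segmentation fault" does not show which call crashed. One way to narrow it down is to rerun the failing command with Python's built-in faulthandler enabled, so that a Python traceback is written to stderr when the process receives SIGSEGV. The sketch below is not from the issue: the helper names `run_with_faulthandler` and `died_from_segfault` are hypothetical, and it assumes a POSIX system. It sets the standard `PYTHONFAULTHANDLER` environment variable, which child processes inherit.

```python
import os
import signal
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run `python <script_args>` with PYTHONFAULTHANDLER=1 set.

    The variable is passed through the environment, so processes
    started by the script inherit it and also dump a Python
    traceback to stderr on SIGSEGV.
    """
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.run(
        [sys.executable, *script_args],
        env=env,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def died_from_segfault(returncode):
    """On POSIX, a child killed by a signal has returncode == -signum."""
    return returncode == -signal.SIGSEGV
```

For example, `run_with_faulthandler(["preprocess.py", "--corpus_path", "corpora/book_review.txt", ...])` with the remaining arguments from the commands above would return the exit code and the captured stderr, which can then be searched for the "Fatal Python error: Segmentation fault" traceback. If the crash happens in a worker process rather than the parent, the parent's exit code may not indicate SIGSEGV, so the stderr text is the more reliable signal.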