dbiir / UER-py

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
https://github.com/dbiir/UER-py/wiki
Apache License 2.0
3.01k stars 525 forks

What is the label data format for run_classifier_multi_label.py? #320

Closed jyshen99 closed 2 years ago

zhezhaoa commented 2 years ago

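An example is given here: https://github.com/dbiir/UER-py/wiki/下游任务微调

UER-py supports fine-tuning and inference for multi-label classification tasks, in which a single sample may carry several labels. The example uses the English Toxic Comment Classification Challenge dataset. Fine-tuning and inference with BERT on the Toxic Comment Classification Challenge dataset (in the format UER supports):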

python3 finetune/run_classifier_multi_label.py --pretrained_model_path models/bert_base_en_uncased_model.bin \
                                               --vocab_path models/google_uncased_en_vocab.txt \
                                               --config_path models/bert/base_config.json \
                                               --train_path datasets/toxic_comment/train.tsv \
                                               --dev_path datasets/toxic_comment/dev.tsv \
                                               --epochs_num 3 --batch_size 64 --seq_length 128

python3 inference/run_classifier_multi_label_infer.py --load_model_path models/finetuned_model.bin \
                                                      --vocab_path models/google_uncased_en_vocab.txt \
                                                      --config_path models/bert/base_config.json \
                                                      --test_path datasets/toxic_comment/dev.tsv \
                                                      --prediction_path datasets/toxic_comment/prediction.tsv \
                                                      --seq_length 128 --labels_num 7
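
The commands above do not spell out the file layout itself, so here is a rough sketch of how such a file could be parsed. It rests on assumptions that should be checked against the dataset linked from the wiki: that the TSV starts with a header row containing `label` and `text_a` columns, and that the `label` field holds comma-separated integer class indices. The helper `read_multi_label_tsv` and the sample rows are hypothetical and are not part of UER-py.

```python
import csv
import io

# Hypothetical sample in the ASSUMED layout: a header row, then one sample per
# line, with the label field holding comma-separated class indices.
SAMPLE_TSV = (
    "label\ttext_a\n"
    "0,2\tfirst example comment\n"
    "5\tsecond example comment\n"
)


def read_multi_label_tsv(handle, labels_num):
    """Return a list of (multi_hot_labels, text) pairs from a TSV file object."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # Start from an all-zero vector and switch on each listed class index.
        target = [0] * labels_num
        for idx in row["label"].split(","):
            target[int(idx)] = 1
        rows.append((target, row["text_a"]))
    return rows


# labels_num=7 mirrors the --labels_num 7 flag in the inference command.
rows = read_multi_label_tsv(io.StringIO(SAMPLE_TSV), labels_num=7)
for target, text in rows:
    print(target, text)
```

Under these assumptions, every index in the `label` field must be smaller than the value passed as `--labels_num`, otherwise the multi-hot vector cannot represent it.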

jyshen99 commented 2 years ago

Thank you very much!!