Lisennlp / TinyBert

A simple, easy-to-use TinyBert: a pretrained language model obtained by knowledge distillation from BERT

Why does general distillation also use task data? Shouldn't it use general data? #11

Open Vincent-Ww opened 2 years ago

Vincent-Ww commented 2 years ago
CUDA_VISIBLE_DEVICES=2,3 python general_distill.py   \
                          --teacher_model /nas/pretrain-bert/pretrain-pytorch/chinese_wwm_ext_pytorch/ \
                          --student_model student_model/  \
                          --train_file_path  /nas/lishengping/datas/tiny_task_data/train.txt \
                          --do_lower_case \
                          --train_batch_size 20 \
                          --output_dir ./output_dir  \
                          --learning_rate 5e-5  \
                          --num_train_epochs  3  \
                          --eval_step  5000  \
                          --max_seq_len  128  \
                          --gradient_accumulation_steps  1  3>&2 2>&1 1>&3 | tee logs/tiny_bert.log

Regarding the fourth line (--train_file_path): why does general distillation also use task_data?

Lisennlp commented 2 years ago

Yes, you're right. This script is only meant to show the relevant arguments and confirm that the command runs end to end with them; the data it points to is not necessarily the correct data.
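
For reference, a minimal sketch of the same invocation pointed at a general-domain pre-training corpus instead of task data. The path /nas/lishengping/datas/general_corpus/train.txt is a hypothetical placeholder for such a corpus; every other argument is copied unchanged from the command above.

CUDA_VISIBLE_DEVICES=2,3 python general_distill.py   \
                          --teacher_model /nas/pretrain-bert/pretrain-pytorch/chinese_wwm_ext_pytorch/ \
                          --student_model student_model/  \
                          --train_file_path  /nas/lishengping/datas/general_corpus/train.txt \
                          --do_lower_case \
                          --train_batch_size 20 \
                          --output_dir ./output_dir  \
                          --learning_rate 5e-5  \
                          --num_train_epochs  3  \
                          --eval_step  5000  \
                          --max_seq_len  128  \
                          --gradient_accumulation_steps  1  3>&2 2>&1 1>&3 | tee logs/tiny_bert.log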
