Lisennlp / TinyBert

A simple, easy-to-use TinyBert: a pretrained language model obtained by knowledge distillation from BERT

Why does general distillation also use task data? Shouldn't it use general data? #11

Open Vincent-Ww opened 2 years ago

Vincent-Ww commented 2 years ago
CUDA_VISIBLE_DEVICES=2,3 python general_distill.py   \
                          --teacher_model /nas/pretrain-bert/pretrain-pytorch/chinese_wwm_ext_pytorch/ \
                          --student_model student_model/  \
                          --train_file_path  /nas/lishengping/datas/tiny_task_data/train.txt \
                          --do_lower_case \
                          --train_batch_size 20 \
                          --output_dir ./output_dir  \
                          --learning_rate 5e-5  \
                          --num_train_epochs  3  \
                          --eval_step  5000  \
                          --max_seq_len  128  \
                          --gradient_accumulation_steps  1  3>&2 2>&1 1>&3 | tee logs/tiny_bert.log

Regarding the fourth line (--train_file_path): why does general distillation also use task_data?

Lisennlp commented 2 years ago

Yes, you're right. This script is only meant to show the relevant arguments and confirm that the command runs end to end with them; the data it points to is not necessarily the correct data.
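
For reference, a minimal sketch of the same invocation pointed at a general-domain pre-training corpus instead of task data. The path /nas/lishengping/datas/general_corpus/train.txt is a hypothetical placeholder for such a corpus; every other argument is copied unchanged from the command above.

CUDA_VISIBLE_DEVICES=2,3 python general_distill.py   \
                          --teacher_model /nas/pretrain-bert/pretrain-pytorch/chinese_wwm_ext_pytorch/ \
                          --student_model student_model/  \
                          --train_file_path  /nas/lishengping/datas/general_corpus/train.txt \
                          --do_lower_case \
                          --train_batch_size 20 \
                          --output_dir ./output_dir  \
                          --learning_rate 5e-5  \
                          --num_train_epochs  3  \
                          --eval_step  5000  \
                          --max_seq_len  128  \
                          --gradient_accumulation_steps  1  3>&2 2>&1 1>&3 | tee logs/tiny_bert.log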
