SimiaoZuo / MoEBERT

This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

Should the model for the target task be fine-tuned from BERT or from MoEBERT? #1

Closed LisaWang0306 closed 2 years ago

LisaWang0306 commented 2 years ago

In the README, you mention that:

"Before running any distillation code, a pre-trained BERT model should be fine-tuned on the target task. Path to the fine-tuned model should be passed to --model_name_or_path."

Can I fine-tune bert-base-uncased and then run the distillation code with the MoE options, or is a pre-trained MoEBERT model necessary? Thanks very much!

SimiaoZuo commented 2 years ago

For example, suppose we are working on the MNLI dataset. We first fine-tune a pre-trained BERT model (e.g., bert-base-uncased) on MNLI, and this fine-tuned model serves as the teacher.
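A minimal sketch of that first step, assuming the standard Hugging Face run_glue.py GLUE example; the hyperparameters and the output path ckpt/bert-base-mnli are illustrative placeholders, not the repo's official settings:

```bash
# Step 1 (teacher): fine-tune bert-base-uncased on the target task (MNLI here).
# Flags follow the standard Hugging Face run_glue.py example; adjust as needed.
python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --task_name mnli \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ckpt/bert-base-mnli

# Step 2 (MoEBERT distillation): point --model_name_or_path at the fine-tuned
# checkpoint and add the MoE/distillation options from the repo's example scripts.
#   --model_name_or_path ckpt/bert-base-mnli
```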

LisaWang0306 commented 2 years ago

Thanks!

CaffreyR commented 2 years ago

Hi @LisaWang0306, may I ask how to obtain the fine-tuned model you mentioned, starting from bert-base-uncased? Many thanks!