SimiaoZuo / MoEBERT

This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
Apache License 2.0
97 stars 13 forks source link

MoEBERT

This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

Installation

Instructions

MoEBERT targets task-specific distillation. Before running any distillation code, a pre-trained BERT model should be fine-tuned on the target task. Path to the fine-tuned model should be passed to --model_name_or_path.

Importance Score Computation

Knowledge Distillation