amazon-science / wqa-cerberus

[EMNLP 2022 (Long, Findings)] CERBERUS: a multi-head student model that distills knowledge from an ensemble of teacher models
https://www.amazon.science/publications/ensemble-transformer-for-efficient-and-accurate-ranking-tasks-an-application-to-question-answering-systems

How to train the CERBERUS model #1

Open YFCodeDream opened 11 months ago

YFCodeDream commented 11 months ago

Thanks for this excellent work! I would like to know how to train CERBERUS, because I want to make some structural improvements based on this model. It seems that the README only provides the code for testing CERBERUS. Could you release the code for training CERBERUS? Looking forward to your reply!

yoshitomo-matsubara commented 11 months ago

Hello @YFCodeDream

Thank you for your interest in our work. For training, we used an internal code framework that is not specific to this project but is built for more general, internal use cases, so unfortunately we are not allowed to release that code.

For AS2 tasks, you can start with HF Transformers' example script for GLUE, since the model is a *ForSequenceClassification model, even though the tasks in our paper are ranking (not classification).
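For context, AS2 with a *ForSequenceClassification model typically means scoring each (question, candidate-answer) pair with the classifier and ranking the candidates by that score. A minimal, dependency-free sketch of the ranking step (here `score_pair` is a hypothetical stand-in for a real model forward pass, not part of the repo):

```python
# Rank answer candidates for a question by a per-pair score, as in
# answer-sentence selection (AS2). `score_pair` is a hypothetical
# placeholder for a model forward pass (e.g., the positive-class
# score from a *ForSequenceClassification model).

def score_pair(question: str, candidate: str) -> float:
    # Toy heuristic standing in for a real model: token overlap.
    q_tokens = set(question.lower().split())
    c_tokens = set(candidate.lower().split())
    return len(q_tokens & c_tokens) / max(len(q_tokens), 1)

def rank_candidates(question: str, candidates: list[str]) -> list[str]:
    # Higher score first; ties keep original order (stable sort).
    return sorted(candidates, key=lambda c: score_pair(question, c), reverse=True)

question = "who wrote hamlet"
candidates = [
    "Hamlet is a tragedy.",
    "William Shakespeare wrote Hamlet around 1600.",
    "Denmark is the setting.",
]
ranked = rank_candidates(question, candidates)
```

With a trained classifier in place of `score_pair`, the same sorting step turns the GLUE-style classification setup into an AS2 ranker.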

If you want to train CERBERUS models in our manner (e.g., a 3-headed CERBERUS learning from 3 teachers), I suggest pre-computing the teachers' logit values per sample and including them as part of your dataset, so that you can skip inference with the teacher models at training time. The loss function and hyperparameters are also described in the paper.
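The workflow above (compute teacher logits once offline, store them with each example, then train each student head against its teacher) can be sketched as follows. The per-head MSE here is only an illustrative placeholder; the actual loss function and hyperparameters are the ones described in the paper:

```python
# Sketch of distilling a 3-headed student from precomputed teacher
# logits. The teachers run once, offline; their logits are stored
# alongside each example, so no teacher inference happens at
# training time. The per-head MSE is an illustrative placeholder,
# not the loss from the paper.

# Offline step: each example carries its text plus one logit pair
# (negative-class, positive-class) per teacher.
dataset = [
    {
        "question": "who wrote hamlet",
        "candidate": "William Shakespeare wrote Hamlet around 1600.",
        "teacher_logits": [[-1.2, 2.3], [-0.8, 1.9], [-1.5, 2.7]],  # 3 teachers
    },
]

def per_head_mse(student_logits, teacher_logits):
    """Mean squared error between one head's logits and its teacher's."""
    return sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits)) / len(student_logits)

def distillation_loss(student_heads, teacher_heads):
    """Average the per-head losses: head i learns from teacher i."""
    losses = [per_head_mse(s, t) for s, t in zip(student_heads, teacher_heads)]
    return sum(losses) / len(losses)

example = dataset[0]
# Hypothetical student forward-pass output: one logit pair per head.
student_heads = [[-1.0, 2.0], [-1.0, 2.0], [-1.0, 2.0]]
loss = distillation_loss(student_heads, example["teacher_logits"])
```

In a real setup the `student_heads` would come from the multi-head model's forward pass, and `distillation_loss` would be backpropagated with your framework of choice.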