amazon-science / wqa-cerberus

[EMNLP 2022 (Long, Findings)] CERBERUS: a multi-head student model that distills knowledge from an ensemble of teacher models
https://www.amazon.science/publications/ensemble-transformer-for-efficient-and-accurate-ranking-tasks-an-application-to-question-answering-systems

How to train the CERBERUS model #1

Open YFCodeDream opened 11 months ago

YFCodeDream commented 11 months ago

Thanks for this excellent work! I would like to know how to train CERBERUS, because I want to make some structural improvements based on this model. It seems that the README only provides the code for testing CERBERUS. Could you release the code for training CERBERUS? Looking forward to your reply!

yoshitomo-matsubara commented 11 months ago

Hello @YFCodeDream

Thank you for your interest in our work. For training, we used an internal code framework that is not specific to this project but is built for more general, internal use cases, so unfortunately we are not allowed to release that code.

For AS2 tasks, you can start with HF Transformers' example script for GLUE, since the model is a *ForSequenceClassification model, even though the tasks in our paper are ranking (not classification).
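For context, AS2 with a *ForSequenceClassification model typically means scoring each (question, candidate-answer) pair with the classifier and ranking the candidates by that score. A minimal, dependency-free sketch of the ranking step (here `score_pair` is a hypothetical stand-in for a real model forward pass, not part of the repo):

```python
# Rank answer candidates for a question by a per-pair score, as in
# answer-sentence selection (AS2). `score_pair` is a hypothetical
# placeholder for a model forward pass (e.g., the positive-class
# score from a *ForSequenceClassification model).

def score_pair(question: str, candidate: str) -> float:
    # Toy heuristic standing in for a real model: token overlap.
    q_tokens = set(question.lower().split())
    c_tokens = set(candidate.lower().split())
    return len(q_tokens & c_tokens) / max(len(q_tokens), 1)

def rank_candidates(question: str, candidates: list[str]) -> list[str]:
    # Higher score first; ties keep original order (stable sort).
    return sorted(candidates, key=lambda c: score_pair(question, c), reverse=True)

question = "who wrote hamlet"
candidates = [
    "Hamlet is a tragedy.",
    "William Shakespeare wrote Hamlet around 1600.",
    "Denmark is the setting.",
]
ranked = rank_candidates(question, candidates)
```

With a trained classifier in place of `score_pair`, the same sorting step turns the GLUE-style classification setup into an AS2 ranker.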

If you want to train CERBERUS models in our manner (e.g., a 3-headed CERBERUS learning from 3 teachers), I suggest pre-computing the teachers' logit values per sample and including them as part of your dataset, so that you can skip inference with the teacher models at training time. The loss function and hyperparameters are also described in the paper.
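The workflow above (compute teacher logits once offline, store them with each example, then train each student head against its teacher) can be sketched as follows. The per-head MSE here is only an illustrative placeholder; the actual loss function and hyperparameters are the ones described in the paper:

```python
# Sketch of distilling a 3-headed student from precomputed teacher
# logits. The teachers run once, offline; their logits are stored
# alongside each example, so no teacher inference happens at
# training time. The per-head MSE is an illustrative placeholder,
# not the loss from the paper.

# Offline step: each example carries its text plus one logit pair
# (negative-class, positive-class) per teacher.
dataset = [
    {
        "question": "who wrote hamlet",
        "candidate": "William Shakespeare wrote Hamlet around 1600.",
        "teacher_logits": [[-1.2, 2.3], [-0.8, 1.9], [-1.5, 2.7]],  # 3 teachers
    },
]

def per_head_mse(student_logits, teacher_logits):
    """Mean squared error between one head's logits and its teacher's."""
    return sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits)) / len(student_logits)

def distillation_loss(student_heads, teacher_heads):
    """Average the per-head losses: head i learns from teacher i."""
    losses = [per_head_mse(s, t) for s, t in zip(student_heads, teacher_heads)]
    return sum(losses) / len(losses)

example = dataset[0]
# Hypothetical student forward-pass output: one logit pair per head.
student_heads = [[-1.0, 2.0], [-1.0, 2.0], [-1.0, 2.0]]
loss = distillation_loss(student_heads, example["teacher_logits"])
```

In a real setup the `student_heads` would come from the multi-head model's forward pass, and `distillation_loss` would be backpropagated with your framework of choice.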