amazon-science / wqa-cerberus

[EMNLP 2022 (Long, Findings)] CERBERUS: a multi-head student model that distills knowledge from an ensemble of teacher models
https://www.amazon.science/publications/ensemble-transformer-for-efficient-and-accurate-ranking-tasks-an-application-to-question-answering-systems
answer-sentence-selection knowledge-distillation nlp question-answering transformer

Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems

This is the official CERBERUS model code repository for our long paper in Findings of EMNLP 2022, "Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems".

Citation

[Paper] [Amazon Science] [Preprint]

@inproceedings{matsubara2022ensemble,
  title={{Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems}},
  author={Matsubara, Yoshitomo and Soldaini, Luca and Lind, Eric and Moschitti, Alessandro},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2022},
  pages={7259--7272},
  year={2022}
}

Implementation

Our CERBERUS implementation is based on transformers.ElectraForSequenceClassification and was tested with the following two pretrained configurations:

ASNQ: CERBERUS 11B-3B1

This CERBERUS model consists of 11 shared encoder body layers and 3 ranking heads (1 layer each), each head learned from one of 3 teacher AS2 models fine-tuned on the ASNQ dataset: ALBERT-XXLarge, ELECTRA-Large, and RoBERTa-Large.
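To make the 11B-3B1 layout concrete, below is a minimal conceptual sketch of a shared-trunk, multi-head ranker built from vanilla transformers modules. It is not the repository's CerberusModel: the class name CerberusSketch, the head initialization, and the logit averaging are illustrative assumptions based only on the description above.

import copy

import torch
from transformers import ElectraModel

class CerberusSketch(torch.nn.Module):
    # Conceptual layout only: a shared ELECTRA trunk feeding several
    # small per-head encoders, following the description above.
    def __init__(self, num_body_layers=11, num_heads=3):
        super().__init__()
        electra = ElectraModel.from_pretrained('google/electra-base-discriminator')
        # Shared body: embeddings plus the first num_body_layers encoder layers
        self.embeddings = electra.embeddings
        self.body = electra.encoder.layer[:num_body_layers]
        # Each ranking head: 1 private encoder layer plus a binary classifier
        # (in CERBERUS, each head is distilled from a different teacher model;
        # here all heads start from the same remaining ELECTRA layer)
        self.head_layers = torch.nn.ModuleList(
            copy.deepcopy(electra.encoder.layer[num_body_layers])
            for _ in range(num_heads))
        self.classifiers = torch.nn.ModuleList(
            torch.nn.Linear(electra.config.hidden_size, 2)
            for _ in range(num_heads))

    def forward(self, input_ids, token_type_ids=None):
        # Attention-mask plumbing is omitted for brevity (unpadded input)
        h = self.embeddings(input_ids=input_ids, token_type_ids=token_type_ids)
        for layer in self.body:
            h = layer(h)[0]
        # Score with every head from the [CLS] position, then average the
        # per-head logits as one simple way to combine the ensemble
        logits = [clf(layer(h)[0][:, 0])
                  for layer, clf in zip(self.head_layers, self.classifiers)]
        return torch.stack(logits).mean(dim=0)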

Download and unzip cerberus11-3_albert_electra_roberta_asnq.zip, then run:

from transformers import AutoTokenizer
from cerberus import CerberusModel

# The ASNQ-only model reuses the ELECTRA-Base tokenizer
tokenizer = AutoTokenizer.from_pretrained('google/electra-base-discriminator')
start_ckpt_file_path = './cerberus11-3_albert_electra_roberta_asnq/cerberus_model.pt'
# head_configs=None: the ranking heads are restored from the checkpoint;
# 11 is the number of shared encoder body layers
model = CerberusModel(None, 11, start_ckpt_file_path=start_ckpt_file_path)
model.eval()
# Encode one (question, answer candidate) pair for scoring
input_dict = tokenizer([('question', 'answer sentence')],
                       return_tensors='pt',
                       max_length=128,
                       truncation=True)
output = model(**input_dict)
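To rank several candidate answers for one question, the same tokenizer call accepts a batch of (question, candidate) pairs. The snippet below is a sketch under one assumption that is not verified against this repository: that the model output exposes per-pair classification logits via output.logits, with the positive class in the last column.

import torch

question = 'who wrote the declaration of independence'
candidates = [
    'Thomas Jefferson was the principal author of the Declaration of Independence.',
    'The Declaration of Independence was signed in 1776.',
]
batch = tokenizer([(question, c) for c in candidates],
                  return_tensors='pt', max_length=128,
                  truncation=True, padding=True)
with torch.no_grad():
    output = model(**batch)
# Assumption: output.logits holds one (negative, positive) logit pair per candidate
scores = torch.softmax(output.logits, dim=-1)[:, 1]
for score, candidate in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f'{score:.3f}  {candidate}')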

WikiQA: CERBERUS 11B-3B1

This CERBERUS model consists of 11 shared encoder body layers and 3 ranking heads (1 layer each), each head learned from one of 3 teacher AS2 models fine-tuned first on the ASNQ dataset and then on the WikiQA dataset: ALBERT-XXLarge, ELECTRA-Large, and RoBERTa-Large.

Download and unzip cerberus11-3_albert_electra_roberta_asnq_wikiqa.zip and asnq-electra-base-discriminator, then run:

from transformers import AutoTokenizer
from cerberus import CerberusModel

asnq_ckpt_dir_path = './asnq-electra-base-discriminator'
tokenizer = AutoTokenizer.from_pretrained(asnq_ckpt_dir_path)

# One config per ranking head: each head's base model and classifier are
# initialized from the ASNQ-fine-tuned ELECTRA-Base checkpoint
head_configs = [
    {'model': {'pretrained_model_name_or_path': asnq_ckpt_dir_path},
     'base_model': 'electra', 'classifier': 'classifier'},
    {'model': {'pretrained_model_name_or_path': asnq_ckpt_dir_path},
     'base_model': 'electra', 'classifier': 'classifier'},
    {'model': {'pretrained_model_name_or_path': asnq_ckpt_dir_path},
     'base_model': 'electra', 'classifier': 'classifier'}
]

start_ckpt_file_path = './cerberus11-3_albert_electra_roberta_asnq_wikiqa/cerberus_model.pt'
# 11 shared encoder body layers; head weights are loaded from the checkpoint
model = CerberusModel(head_configs, 11, start_ckpt_file_path=start_ckpt_file_path)
model.eval()
# Encode one (question, answer candidate) pair for scoring
input_dict = tokenizer([('question', 'answer sentence')],
                       return_tensors='pt',
                       max_length=128,
                       truncation=True)
output = model(**input_dict)
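WikiQA results for answer sentence selection are conventionally reported with MAP and MRR over the per-question rankings produced by scores like the one above. The helper below is an independent sketch of MRR computation on already-scored candidates, not code from this repository.

def mean_reciprocal_rank(questions):
    """questions: one list of (score, is_correct) pairs per question."""
    total = 0.0
    for candidates in questions:
        ranked = sorted(candidates, key=lambda x: -x[0])
        for rank, (_, is_correct) in enumerate(ranked, start=1):
            if is_correct:
                total += 1.0 / rank  # reciprocal rank of the first correct answer
                break
    return total / len(questions)

# Example: two questions, each with scored candidates
print(mean_reciprocal_rank([
    [(0.9, False), (0.7, True)],   # correct answer at rank 2 -> RR = 0.5
    [(0.8, True), (0.3, False)],   # correct answer at rank 1 -> RR = 1.0
]))  # -> 0.75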

Security

See CONTRIBUTING for more information.

License

This library is licensed under the CC-BY-NC-4.0 License.