deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Implementing distillation loss functions from TinyBERT #1873

Closed · MichelBartels closed this issue 2 years ago

MichelBartels commented 2 years ago

Is your feature request related to a problem? Please describe. A basic version of model distillation was implemented with #1758. However, there is still room for improvement. The TinyBERT paper (https://arxiv.org/pdf/1909.10351.pdf) details an approach for fine-tuning an already pretrained small language model by distilling knowledge from a larger teacher.
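For reference, the two intermediate-layer losses from the TinyBERT paper (an MSE over attention matrices and an MSE over hidden states, with a learned projection mapping the student's hidden size to the teacher's) could be sketched roughly as below. This is only an illustrative PyTorch sketch, not the Haystack implementation; the class/function names and tensor shapes are assumptions made here for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HiddenStateLoss(nn.Module):
    """TinyBERT-style hidden-state distillation loss (illustrative sketch)."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Learned projection W_h so student hidden states (d') can be compared
        # with teacher hidden states (d), as in the paper's hidden-state loss.
        self.proj = nn.Linear(student_dim, teacher_dim, bias=False)

    def forward(self, student_hidden: torch.Tensor, teacher_hidden: torch.Tensor) -> torch.Tensor:
        # student_hidden: (batch, seq_len, d'), teacher_hidden: (batch, seq_len, d)
        return F.mse_loss(self.proj(student_hidden), teacher_hidden)


def attention_loss(student_attn: torch.Tensor, teacher_attn: torch.Tensor) -> torch.Tensor:
    # student_attn / teacher_attn: (batch, heads, seq_len, seq_len) attention scores;
    # TinyBERT matches the unnormalised scores rather than the softmax output.
    return F.mse_loss(student_attn, teacher_attn)
```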

Describe the solution you'd like The distillation loss functions in the TinyBERT paper should be usable when distilling a model in Haystack using the distil_from method.
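The paper's prediction-layer loss, a soft cross-entropy between the teacher's and the student's logits softened by a temperature, might look roughly like the sketch below. The function name and arguments are illustrative assumptions and are not tied to the distil_from signature.

```python
import torch
import torch.nn.functional as F


def prediction_layer_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          temperature: float = 1.0) -> torch.Tensor:
    # Soft cross-entropy: -sum_i softmax(z_T / t)_i * log_softmax(z_S / t)_i,
    # averaged over the batch.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```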

Describe alternatives you've considered
- https://arxiv.org/pdf/1910.08381.pdf: seems to depend too heavily on expensive retraining and appears too task-specific.
- https://arxiv.org/pdf/2002.10957.pdf, https://arxiv.org/pdf/1910.01108.pdf: seem to focus only on pretraining.

Additional context This is the first of two issues for implementing fine-tuning as described in the TinyBERT paper. This issue focuses on the loss functions; the second focuses on data augmentation.

julian-risch commented 2 years ago

Closed by https://github.com/deepset-ai/haystack/pull/1879