elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.73k stars 24.68k forks

Support NLP summarization models #87548

Open davidkyle opened 2 years ago

davidkyle commented 2 years ago

Description

Add support for text summarization models such as sshleifer/distilbart-cnn-12-6 (Hugging Face).

Summarization is a task which produces a synopsis or abridgement of a longer piece of text.

elasticmachine commented 2 years ago

Pinging @elastic/ml-core (Team:ML)

joshdevins commented 1 year ago

Some models are very slow at CPU inference (T5 models, for example), which was a major limiting factor to their use in the past. We also did not previously support the tokenization (BPE) used by the smaller generative models, but that is now implemented for RoBERTa models. The DistilBART models are faster but not as good at summarization, so any support we do add might be limited. Still, given that we could now add some basic model support, can we revisit this? It seems we can implement the task type now that we support some of the tokenizers and pre-trained models it needs.

bbsandeep commented 1 year ago

Why is this not being prioritized? Summarization is a critical step in most of the machine learning and text analysis work that we do. Elastic's lack of support for it is a stumbling block, and we are having to look at custom solutions to achieve this.

Can someone from Elastic educate us on why they think this is not a needed capability?