UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
15.31k stars, 2.48k forks

Add FAVOR+ / Performer attention support #715

Open Jbollenbacher opened 3 years ago

Jbollenbacher commented 3 years ago

Hi all,

The new Performer model may enable us to embed longer documents using the same SBERT method. HuggingFace is already implementing this new model.

When the Performer model becomes available on HuggingFace, will the SBERT team contribute pretrained model(s) suitable for embedding longer documents? This would be an enormous contribution to the wider NLP community, and would likely add significantly to future citations of the SBERT paper and follow-up papers.

Thanks, Jbollenbacher

nreimers commented 3 years ago

Hi @Jbollenbacher, yes, I am also looking forward to it.

However, two issues remain:

Nonetheless, Performer can also be interesting for paragraphs (a few hundred words), as it can be more efficient than BERT & Co.
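
As a rough illustration of the efficiency point, here is a back-of-the-envelope cost model rather than a benchmark. It counts approximate multiply-adds for one attention head, assuming standard attention costs about 2·L²·d (for QKᵀ and the weighted sum over V) and a FAVOR+-style random-feature attention costs about 4·L·m·d (two feature maps plus two matrix products). The function names and the choice of m = 128 random features are hypothetical, and constants, softmax, and normalisation terms are ignored, so the crossover point should be read as indicative only.

```python
def softmax_attention_cost(seq_len: int, head_dim: int) -> int:
    """Approximate multiply-adds for one head of standard attention.

    Q @ K^T costs L*L*d and the attention-weighted sum over V costs L*L*d.
    """
    return 2 * seq_len * seq_len * head_dim


def favor_attention_cost(seq_len: int, head_dim: int, num_features: int) -> int:
    """Approximate multiply-adds for one head of random-feature attention.

    Assumed breakdown (a simplification, not the library's exact kernel):
    feature maps for Q and K (2 * L*d*m), K'^T @ V (L*m*d), Q' @ (K'^T V) (L*m*d).
    """
    return 4 * seq_len * num_features * head_dim


def speedup(seq_len: int, head_dim: int, num_features: int) -> float:
    """Ratio of standard to random-feature cost; values above 1 favour the latter."""
    return softmax_attention_cost(seq_len, head_dim) / favor_attention_cost(
        seq_len, head_dim, num_features
    )


if __name__ == "__main__":
    head_dim = 64        # typical BERT-base head size
    num_features = 128   # hypothetical number of random features
    for seq_len in (256, 512, 4096):
        ratio = speedup(seq_len, head_dim, num_features)
        print(f"L={seq_len}: standard/random-feature cost ratio = {ratio:.1f}")
```

Under these assumptions the ratio simplifies to L / (2m), so the gain at paragraph length depends heavily on how many random features are used, while for long documents it grows linearly with sequence length.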

elliottash commented 3 years ago

For the long documents, one idea would be to use legal opinions, for example from courtlistener.com.