Add FAVOR+ / Performer attention support

Jbollenbacher commented 3 years ago

Hi all,

The new Performer model may enable us to embed longer documents using the same SBERT method. HuggingFace is already implementing this new model.

When the Performer model becomes available on HuggingFace, will the SBERT team contribute pretrained model(s) that are appropriate for embedding longer documents? This would be an enormous contribution to the wider NLP community, and would contribute significantly to future citations of the SBERT paper and follow-up papers.

Thanks, Jbollenbacher

nreimers commented 3 years ago

Hi @Jbollenbacher, yes, I am also looking forward to it.

However, two issues remain:

1. Good training data for longer documents is much harder to get. One option would be to try the MS MARCO document retrieval dataset.
2. Embedding text into a fixed-size dense vector only makes sense up to a certain length. The vectors have 768 dimensions, so this is ~1500 bytes (float16) or ~3000 bytes (float32). Compressing 50 KB / 100 KB or even more into just 768 dimensions (1500/3000 bytes) is not possible without losing a lot of information. The larger the document, the more information you will lose. Hence, dense embeddings for longer text can provide only high-level information on the text, but will not be suitable for drawing fine-grained distinctions or for fine-grained search.
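
To make the size argument concrete, here is a minimal sketch of the arithmetic. The helper names (`embedding_size_bytes`, `compression_ratio`) are made up for illustration and are not part of sentence-transformers; the 768 dimensions and the 50 KB / 100 KB document sizes are the figures from the comment above, with 1 KB assumed to be 1000 bytes.

```python
# Rough storage arithmetic for a fixed-size dense embedding.
# All names here are hypothetical helpers, not a library API.

def embedding_size_bytes(dims: int, bytes_per_value: int) -> int:
    """Bytes needed to store one dense vector with `dims` values."""
    return dims * bytes_per_value


def compression_ratio(doc_bytes: int, emb_bytes: int) -> float:
    """How many document bytes each embedding byte has to summarize."""
    return doc_bytes / emb_bytes


DIMS = 768  # embedding dimensionality mentioned in the thread

fp16_bytes = embedding_size_bytes(DIMS, 2)  # float16: 2 bytes per value
fp32_bytes = embedding_size_bytes(DIMS, 4)  # float32: 4 bytes per value

# Assumed document sizes: 50 KB and 100 KB of raw text (1 KB = 1000 bytes).
for doc_bytes in (50_000, 100_000):
    for name, emb_bytes in (("float16", fp16_bytes), ("float32", fp32_bytes)):
        ratio = compression_ratio(doc_bytes, emb_bytes)
        print(f"{doc_bytes} B doc -> {emb_bytes} B {name} vector: ~{ratio:.0f}x")
```

The exact sizes are 1536 and 3072 bytes, which is where the rounded ~1500 / ~3000 figures come from. The ratio only compares raw byte counts, so it is a rough indication of how much has to be squeezed into the vector rather than a measure of information loss.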

Nonetheless, Performer can also be interesting for paragraphs (a few hundred words), as it can be more efficient than BERT & Co.

elliottash commented 3 years ago

For long documents, one idea would be to use legal opinions, for example from courtlistener.com.

UKPLab / sentence-transformers

Add FAVOR+ / Performer attention support #715