NEOS-AI / Neosearch

AI-based search engine done right
Apache License 2.0

Bi-Encoder vs Cross-Encoder #12

Open YeonwooSung opened 1 year ago

YeonwooSung commented 1 year ago

Cross-encoders usually achieve higher accuracy, but they do not scale well.

sentence-transformers supports both approaches.

Bi-Encoders (see Computing Sentence Embeddings) are used whenever you need a sentence embedding in a vector space for efficient comparison. Applications are, for example, Information Retrieval / Semantic Search or Clustering. Cross-Encoders would be the wrong choice for these applications: clustering 10,000 sentences with Cross-Encoders would require computing similarity scores for about 50 million sentence combinations, which takes about 65 hours. With a Bi-Encoder, you compute the embedding for each sentence, which takes only 5 seconds. You can then perform the clustering.

sentence-transformers; cross-encoder vs bi-encoder
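The "50 million combinations" figure above follows directly from the pair count: a Cross-Encoder must score every unordered pair, so the cost is quadratic in the number of sentences, while a Bi-Encoder needs only one forward pass per sentence. A quick sketch of the arithmetic:

```python
# Cost comparison for clustering n sentences:
# a Cross-Encoder scores every unordered pair (quadratic),
# a Bi-Encoder embeds each sentence once (linear).
def cross_encoder_pairs(n: int) -> int:
    """Number of unordered sentence pairs a Cross-Encoder must score."""
    return n * (n - 1) // 2

n = 10_000
print(cross_encoder_pairs(n))  # 49995000, i.e. ~50 million forward passes
print(n)                       # 10000 forward passes for a Bi-Encoder
```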

YeonwooSung commented 1 year ago

Using Cross-Encoders as a reranker in multistage vector search

In search, or semantic matching of sentences, we can see this tradeoff in Bi-Encoder models compared with Cross-Encoder models. Bi-Encoder models are fast, but less accurate, while Cross-Encoders are more accurate, but slow. Luckily, we can combine them in a search pipeline to benefit from both models!
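The combined pipeline can be sketched as a two-stage retrieve-and-rerank flow. This is a minimal sketch with toy data: the `cosine` retrieval over precomputed vectors stands in for a sentence-transformers Bi-Encoder (`model.encode`), and the hypothetical `cross_encoder_score` helper stands in for a real Cross-Encoder (`model.predict`); only the pipeline shape is the point.

```python
import numpy as np

# Toy corpus; in practice doc_embs would come from a Bi-Encoder's encode().
doc_texts = ["doc a", "doc b", "doc c", "doc d"]
doc_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]])
query_text = "doc a"
query_emb = np.array([1.0, 0.0])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stage 1: Bi-Encoder retrieval -- fast vector comparison over ALL docs.
scores = [cosine(query_emb, e) for e in doc_embs]
top_k = sorted(range(len(doc_texts)), key=lambda i: scores[i], reverse=True)[:2]

# Stage 2: Cross-Encoder rerank -- slower, more accurate scoring,
# but applied only to the small top-k candidate set.
def cross_encoder_score(query: str, doc: str) -> float:
    # Hypothetical scorer (word overlap) standing in for CrossEncoder.predict.
    return len(set(query.split()) & set(doc.split()))

reranked = sorted(
    top_k, key=lambda i: cross_encoder_score(query_text, doc_texts[i]), reverse=True
)
print([doc_texts[i] for i in reranked])
```

The key design point is that the expensive model never sees the full corpus: stage 1 cuts the candidate set down to k documents, so the Cross-Encoder's quadratic cost is paid only on k pairs per query.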