Tongjilibo / bert4vector

Vector computation, storage, retrieval, and similarity calculation
Apache License 2.0
7 stars 0 forks

Why is English retrieval accuracy low? #1

Open zzkrzxx opened 2 weeks ago

zzkrzxx commented 2 weeks ago

English retrieval

I chose BAAI/bge-large-en-v1.5 as the embedding model and followed examples/faiss_search.py to run English retrieval, but the results are very poor. What might be causing this?

Tongjilibo commented 1 week ago

Does it work correctly if you skip faiss and use BertSimilarity directly?

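from bert4vector.core import BertSimilarity

# Load the embedding model from a local checkpoint directory
model = BertSimilarity('/data/pretrain_ckpt/embedding/BAAI--bge-base-en-v1.5')

# Add sentences to the default corpus in two batches
model.add_corpus(['hello', 'nice to meet you'])
model.add_corpus(['thank you very much', 'i love you'])
model.summary()  # print the corpus name, size, and a few samples
# Retrieve the 2 corpus sentences most similar to the query
print(model.search('hi', topk=2))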

Here is the output:

+------------------------------------------------+
| name    | size | few_samples                   |
+------------------------------------------------+
| default | 4    | ['hello', 'nice to meet you'] |
+------------------------------------------------+
{'hi': [{'text': 'hello', 'corpus_id': 0, 'score': 0.954134464263916}, {'text': 'nice to meet you', 'corpus_id': 1, 'score': 0.769316554069519}]}
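
If the direct BertSimilarity result looks reasonable but the faiss result does not, one possible cause (not confirmed in this thread) is that the index ranks by raw inner product over embeddings that were not L2-normalized. The following is a minimal pure-Python sketch of that effect using made-up 2-D vectors instead of real embeddings. The helper names `cosine`, `inner_product`, and `rank` are hypothetical and are not part of bert4vector or faiss.

```python
import math

def inner_product(a, b):
    # Raw dot product: sensitive to vector length
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product divided by both vector norms: length-independent
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

def rank(query_vec, corpus_vecs, score_fn, topk=2):
    # Brute-force search: score every corpus vector, highest score first
    scored = [(idx, score_fn(query_vec, vec)) for idx, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [idx for idx, _ in scored[:topk]]

# Toy data (an assumption for illustration): vector 0 points almost the same
# way as the query, vector 1 is much longer but points in a different direction.
corpus_vecs = [[1.0, 0.0], [3.0, 3.0]]
query_vec = [1.0, 0.1]

by_cosine = rank(query_vec, corpus_vecs, cosine)
by_inner_product = rank(query_vec, corpus_vecs, inner_product)
print("cosine ranking:       ", by_cosine)
print("inner-product ranking:", by_inner_product)
```

On this toy data the two rankings disagree, because the longer vector wins under inner product despite pointing in a different direction. Comparing a brute-force cosine ranking like this against the faiss top-k for the same embeddings is one way to tell whether the index setup or the embeddings themselves are at fault.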