Closed lwj2001 closed 4 months ago
Hello, your problem seems to be caused by a mismatch between the dimension of your embedding model and the dimension of the vectors in the FAISS index.
Since simple_pipeline.py uses our pre-built index (built with e5 as the embedding model),
queries from other embedding models may not match it. If you need to use your own embedding model, you need to build a new index.
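To illustrate the mismatch described above, here is a minimal sketch (not FlashRAG's actual code): the pre-built e5 index stores 768-d vectors, while bge-small produces 384-d embeddings, so a simple dimension check fails before any search is attempted.

```python
import numpy as np

def dims_match(query_emb: np.ndarray, index_dim: int) -> bool:
    """Return True if the query embedding fits the index's vector dimension."""
    return query_emb.shape[-1] == index_dim

e5_index_dim = 768                    # hidden size of e5 (the pre-built index)
bge_small_query = np.zeros((1, 384))  # hidden size of bge-small
compatible = dims_match(bge_small_query, e5_index_dim)  # False: rebuild needed
```

A FAISS index built from e5 vectors would raise a similar (but more cryptic) error at search time; checking the dimension up front makes the failure obvious.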
I have checked your previous issue and found that the retrieval model you used is bge-small with a hidden size of 384, which does not match the hidden size of e5 (768). Therefore, the index of e5 cannot be used.
Great, I have now changed the retrieval model to bce-embedding-base_v1 and the reranker model to bce-reranker-base_v1, both of which have a hidden size of 768. Do I still need to create a new index?
Also, I have run into a new bug:
```
Traceback (most recent call last):
  File "/Data1/home/fanziqi/func_eval/FlashRAG/examples/quick_start/simple_pipeline.py", line 35, in <module>
    output_dataset = pipeline.run(test_data, do_eval=True)
  File "/Data1/home/fanziqi/func_eval/FlashRAG/flashrag/pipeline/pipeline.py", line 80, in run
    retrieval_results = self.retriever.batch_search(input_query)
  File "/Data1/home/fanziqi/func_eval/FlashRAG/flashrag/retriever/retriever.py", line 60, in wrapper
    results, scores = func(self, query_list, num, True)
  File "/Data1/home/fanziqi/func_eval/FlashRAG/flashrag/retriever/retriever.py", line 92, in wrapper
    results, scores = self.reranker.rerank(query_list, results)
  File "/Data1/home/fanziqi/.conda/envs/flashrag/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Data1/home/fanziqi/func_eval/FlashRAG/flashrag/retriever/reranker.py", line 45, in rerank
    assert topk < min([len(docs) for docs in doc_list]), "The number of doc returned by the retriever is less than the topk."
AssertionError: The number of doc returned by the retriever is less than the topk.
```
But I have set 'retrieval_topk': 1 in the config.
simple_pipeline.py is only meant to verify that the overall process runs normally, and the index we provide is just a toy index. If you want to use it to produce experimental results, you need to build your own index; until then, you can temporarily use the ready-made one.
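Building your own index boils down to encoding every document with the same embedding model you will use at query time, then storing the resulting matrix. A minimal numpy-only sketch of the idea (the `encode` function here is a hypothetical stand-in for your embedding model, e.g. bce-embedding-base_v1; a real setup would store the vectors in a FAISS index instead of a raw matrix):

```python
import numpy as np

def encode(texts, dim=768):
    # Deterministic fake embeddings for illustration only; replace with
    # your actual embedding model so index and query vectors match.
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(texts), dim)).astype("float32")

# "Build the index": encode the whole corpus once, offline.
corpus = ["doc one", "doc two", "doc three"]
doc_embs = encode(corpus)

def search(query: str, doc_embs: np.ndarray, topk: int = 2):
    # Inner-product scoring, as a flat FAISS IP index would do.
    q = encode([query])[0]
    scores = doc_embs @ q
    order = np.argsort(-scores)[:topk]
    return order.tolist(), scores[order].tolist()

ids, scores = search("doc one", doc_embs)
```

The key point for the dimension issue above: because the same `encode` produces both the corpus and query vectors, their dimensions match by construction.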
For your new bug, you should set retrieval_topk larger than rerank_topk in the config. retrieval_topk controls the number of documents returned by the retriever, while rerank_topk controls the number of documents kept after reranking.
I understand and thank you very much for your prompt reply!