Closed kingafy closed 2 years ago
Hello @kingafy, unfortunately I can't reproduce your error with the information you provided. For me, the following snippet runs to the end without errors:
from haystack.utils import clean_wiki_text, print_answers, convert_files_to_dicts, fetch_archive_from_http, launch_es
from haystack import Pipeline
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import ElasticsearchRetriever, DensePassageRetriever, FARMReader, JoinDocuments
doc_dir = "data/article_txt_got"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
got_dicts = convert_files_to_dicts(
    dir_path=doc_dir,
    clean_func=clean_wiki_text,
    split_paragraphs=True
)
launch_es()
document_store = ElasticsearchDocumentStore()
document_store.delete_documents()
document_store.write_documents(got_dicts)
es_retriever = ElasticsearchRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
dpr_retriever = DensePassageRetriever(document_store)
document_store.update_embeddings(dpr_retriever, update_existing_embeddings=False)
p_ensemble = Pipeline()
p_ensemble.add_node(component=es_retriever, name="ESRetriever", inputs=["Query"])
p_ensemble.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["Query"])
p_ensemble.add_node(component=JoinDocuments(join_mode="concatenate"), name="JoinResults", inputs=["ESRetriever", "DPRRetriever"])
p_ensemble.add_node(component=reader, name="Reader", inputs=["JoinResults"])
p_ensemble.draw("pipeline_ensemble.png")
query = "Who is the father of Arya Stark?"
res = p_ensemble.run(
    query=query,
    params={"ESRetriever": {"top_k": 2}, "DPRRetriever": {"top_k": 2}},
)
print("\nQuery: ", query)
print("Answers:")
print_answers(res, details="minimum")
Do you have the same issue executing this script? If so, please share:
Can't I refer to two document stores this way?
document_store_es = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
es_retriever = ElasticsearchRetriever(document_store=document_store_es)
document_store_dpr = FAISSDocumentStore.load("my_faiss_index_eli5.faiss")
I'm sorry @kingafy, but I don't understand your question. Sure, you can have two document stores if your pipeline is set up correctly. What error are you getting? You need to provide a complete code snippet, otherwise I can't help you :confused:
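To illustrate the point about two document stores, here is a minimal, library-free sketch. The classes `Doc`, `ToyStore` and `ToyRetriever` and the function `join_concatenate` are hypothetical stand-ins, not Haystack APIs, and the documents and scores are made-up example data. The only idea carried over is that each retriever holds a reference to its own store and a join step merges the two result lists, here keeping one entry per document id.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    content: str
    score: float


class ToyStore:
    """Hypothetical stand-in for a document store; holds pre-scored docs."""

    def __init__(self, docs):
        self.docs = list(docs)


class ToyRetriever:
    """Hypothetical retriever bound to exactly one store."""

    def __init__(self, document_store):
        self.document_store = document_store

    def retrieve(self, query, top_k):
        # A real retriever would score documents against the query;
        # this toy ignores the query and returns the best pre-scored docs.
        ranked = sorted(self.document_store.docs, key=lambda d: d.score, reverse=True)
        return ranked[:top_k]


def join_concatenate(*result_lists):
    """Merge result lists, keeping one entry per document id."""
    merged = {}
    for results in result_lists:
        for doc in results:
            merged[doc.id] = doc  # a later list overwrites an earlier duplicate
    return sorted(merged.values(), key=lambda d: d.score, reverse=True)


# Two independent stores with one overlapping document (id "2").
store_a = ToyStore([Doc("1", "Eddard Stark", 0.9), Doc("2", "Arya Stark", 0.7)])
store_b = ToyStore([Doc("2", "Arya Stark", 0.8), Doc("3", "Winterfell", 0.4)])

retriever_a = ToyRetriever(document_store=store_a)
retriever_b = ToyRetriever(document_store=store_b)

query = "Who is the father of Arya Stark?"
joined = join_concatenate(
    retriever_a.retrieve(query, top_k=2),
    retriever_b.retrieve(query, top_k=2),
)
print([doc.id for doc in joined])
```

Nothing here requires the two stores to be of the same kind; what matters is that every retriever is constructed with the store it is supposed to query.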
Hey @kingafy, did you manage to solve your problem? Looking at the error trace from your initial comment, it seems that there is a mismatch in dimensionality (the assertion `assert d == self.d` fails). What model are you using for embedding your documents and what value did you set for `embedding_dim` when initializing your `FAISSDocumentStore`?
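The failing assertion compares the length of the query vector with the dimension the index was built with. The sketch below reproduces that kind of check without FAISS. `ToyIndex` and `check_query_dim` are hypothetical names used for illustration only, and the dimensions 128 and 768 are example values, not values taken from this issue.

```python
class ToyIndex:
    """Hypothetical stand-in for a vector index with a fixed dimension d."""

    def __init__(self, d):
        self.d = d
        self.vectors = []

    def add(self, vector):
        assert len(vector) == self.d
        self.vectors.append(vector)

    def search(self, query_emb, top_k):
        d = len(query_emb)
        # Same kind of check as the assertion shown in the traceback.
        assert d == self.d
        # Rank stored vectors by dot product with the query.
        ranked = sorted(
            range(len(self.vectors)),
            key=lambda i: sum(q * v for q, v in zip(query_emb, self.vectors[i])),
            reverse=True,
        )
        return ranked[:top_k]


def check_query_dim(index, query_emb):
    """Raise a readable error instead of a bare AssertionError."""
    if len(query_emb) != index.d:
        raise ValueError(
            f"query embedding has {len(query_emb)} dimensions, "
            f"but the index expects {index.d}"
        )


index = ToyIndex(d=128)   # index built for 128-dimensional vectors (example value)
query_emb = [0.0] * 768   # encoder emitting 768-dimensional vectors (example value)

try:
    check_query_dim(index, query_emb)
except ValueError as err:
    message = str(err)
    print(message)
```

If the two numbers differ, the usual options are to rebuild the index with the encoder's output size or to use an encoder whose output size matches the existing index.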
@kingafy, as you haven't answered in a while, I assume you found a solution, so I'm closing this issue. Feel free to reopen it at any time if the problem still exists.
While trying the pipeline with the code below, I am getting an error:
from haystack.pipelines import JoinDocuments
p_ensemble = Pipeline()
p_ensemble.add_node(component=es_retriever, name="ESRetriever", inputs=["Query"])
p_ensemble.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["Query"])
p_ensemble.add_node(component=JoinDocuments(join_mode="concatenate"), name="JoinResults", inputs=["ESRetriever", "DPRRetriever"])
p_ensemble.add_node(component=reader, name="Reader", inputs=["JoinResults"])
p_ensemble.draw("pipeline_ensemble.png")
# Run pipeline
res = p_ensemble.run(
    query="Who is the father of Arya Stark?",
    params={"DPRRetriever": {"top_k": 2}, "ESRetriever": {"top_k": 2}}
)
Error:
Traceback (most recent call last):
  File "c:\my_projects\haystackrnr\haystack-new\haystack\haystack\pipelines\base.py", line 335, in run
    node_output, stream_id = self.graph.nodes[node_id]["component"]._dispatch_run(node_input)
  File "c:\my_projects\haystackrnr\haystack-new\haystack\haystack\nodes\base.py", line 135, in _dispatch_run
    output, stream = self.run(run_inputs, *run_params)
  File "c:\my_projects\haystackrnr\haystack-new\haystack\haystack\nodes\retriever\base.py", line 233, in run
    output, stream = run_query_timed(query=query, filters=filters, top_k=top_k, index=index, headers=headers)
  File "c:\my_projects\haystackrnr\haystack-new\haystack\haystack\nodes\retriever\base.py", line 77, in wrapper
    ret = fn(args, **kwargs)
  File "c:\my_projects\haystackrnr\haystack-new\haystack\haystack\nodes\retriever\base.py", line 250, in run_query
    documents = self.retrieve(query=query, filters=filters, top_k=top_k, index=index, headers=headers)
  File "c:\my_projects\haystackrnr\haystack-new\haystack\haystack\nodes\retriever\dense.py", line 215, in retrieve
    documents = self.document_store.query_by_embedding(query_emb=query_emb[0], top_k=top_k, filters=filters, index=index, headers=headers)
  File "c:\my_projects\haystackrnr\haystack-new\haystack\haystack\document_stores\faiss.py", line 533, in query_by_embedding
    score_matrix, vector_id_matrix = self.faiss_indexes[index].search(query_emb, top_k)
  File "C:\Users\anshuman.a.mahapatra\Anaconda3\envs\haystack_new\lib\site-packages\faiss\__init__.py", line 341, in replacement_search
    assert d == self.d
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "pipeline.py", line 58, in
    params={"DPRRetriever": {"top_k": 2}, "ESRetriever": {"top_k": 2}}