deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
16.73k stars 1.83k forks

Dense Passage Retrieval Fails to retrieve from elasticsearch. RequestError(400, 'search_phase_execution_exception', 'runtime error') #944

Closed DanielGlickmanTAU closed 3 years ago

DanielGlickmanTAU commented 3 years ago

After training DPR and updating the passage embeddings (I am using Elasticsearch), mostly following tutorials 6 and 9, I get the following error when trying to retrieve: RequestError(400, 'search_phase_execution_exception', 'runtime error')

Snippet from my code:

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="",
                                            index="document", return_embedding=True)
retriever = DensePassageRetriever.load(
    load_dir=load_dir,
    document_store=document_store,
    max_seq_len_query=64,
    max_seq_len_passage=256,
    embed_title=True)

retriever.retrieve(query)  # <-- fails here
lewtun commented 3 years ago

Hey @DanielGlickmanTAU, did you update the embeddings after loading the DPR retriever? E.g.

document_store.update_embeddings(retriever=retriever)
DanielGlickmanTAU commented 3 years ago

I did, after training. Then I saved the retriever. The embeddings do exist in Elasticsearch: for example, if I retrieve by query, the documents I get back have a populated embedding field. But I cannot search by embedding. Should I call update_embeddings again each time I load the retriever? Calling update_embeddings the first time took a very long time, so doing so again makes no sense.

Timoeller commented 3 years ago

Indexing with DPR can take quite some time depending on how many documents you want to index, because you use a BERT model to convert text into vectors. Do you use a GPU? How many documents are you indexing? See the retriever performance in our benchmarks for indexing speeds.

If you train your DPR retriever, you should, as @lewtun correctly mentioned, also update the embeddings once more. Can you give some more info on how you trained the DPR model and how you are using it afterwards to update the embeddings?

DanielGlickmanTAU commented 3 years ago

To clarify, I did update the embeddings after training.

What I am doing is:

retriever = DensePassageRetriever(...)
retriever.train(...)
retriever.save(path)
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="",
                                            index="document", return_embedding=True)
document_store.update_embeddings(retriever)  # takes a lot of time, of course

retriever = DensePassageRetriever.load(path, document_store=document_store, ...)

After that, both retriever.retrieve and document_store.query_by_embedding fail. But the embeddings seem to exist: when I fetch using document_store.query("some text"), the documents are returned with embeddings.
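
A stricter version of that check is to confirm that every returned document has an embedding, and that each embedding has the length the retriever produces. Below is a minimal sketch, assuming the documents have already been fetched as plain dicts with `id` and `embedding` keys. The helper name `find_bad_embeddings` and the default of 768 dimensions are illustrative assumptions, not part of Haystack.

```python
from typing import Dict, List, Sequence


def find_bad_embeddings(docs: Sequence[dict], expected_dim: int = 768) -> Dict[str, List[str]]:
    """Group document ids by embedding problem (hypothetical helper).

    expected_dim is an assumption; set it to the output size of your encoder.
    """
    problems: Dict[str, List[str]] = {"missing": [], "wrong_dim": []}
    for doc in docs:
        emb = doc.get("embedding")
        if emb is None or len(emb) == 0:
            # No vector stored for this document at all.
            problems["missing"].append(doc["id"])
        elif len(emb) != expected_dim:
            # A vector exists but its length differs from what is expected.
            problems["wrong_dim"].append(doc["id"])
    return problems
```

If either list comes back non-empty, that narrows down which documents to inspect. It does not by itself prove that they cause the 400 error.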

Timoeller commented 3 years ago

Hey @DanielGlickmanTAU, I believe this has nothing to do with training the DPR retriever, only with saving and loading. Unfortunately, I cannot reproduce your issue with either the FAISS or the ES document store using this code:

    # dicts: a list of document dicts prepared earlier (not shown here)
    document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
    document_store.delete_all_documents(index="document")
    document_store.write_documents(dicts[:3], index="document")

    retriever = DensePassageRetriever(document_store=document_store)
    document_store.update_embeddings(retriever, index="document")

    # Save the retriever, load it back, and query with the reloaded copy
    retriever.save("models/temp")
    retriever2 = DensePassageRetriever.load("models/temp", document_store=document_store)

    print(retriever2.retrieve("this is a father of arya"))

Are you using the latest master? Are you using the correct index when writing, updating, and querying?
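
One way to rule out an index mismatch is to define the index name once and pass it explicitly to every call. The sketch below is duck-typed and only illustrative. It assumes the store exposes `write_documents` and `update_embeddings` with an `index` keyword, as in the snippet above, and that the retriever's `retrieve` also accepts an `index` keyword, which is an assumption to verify against your installed version. The helper name `index_embed_and_query` is hypothetical.

```python
INDEX = "document"  # single source of truth for the index name


def index_embed_and_query(document_store, retriever, docs, query, index=INDEX):
    """Write, embed, and query against the same index (hypothetical helper)."""
    # Write the raw documents to the chosen index.
    document_store.write_documents(docs, index=index)
    # Compute and store embeddings for that same index.
    document_store.update_embeddings(retriever, index=index)
    # Query the same index; the `index` keyword on retrieve() is an assumption.
    return retriever.retrieve(query, index=index)
```

Passing the name explicitly everywhere avoids relying on each component's default index, so a mismatch shows up as a visible argument instead of a silent default.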

Timoeller commented 3 years ago

Hey @DanielGlickmanTAU, any updates here? Unfortunately, we could not reproduce your issue, so I am closing this for now. Feel free to reopen if the error persists. If you found the problem, please report back here. Thanks!