Closed DanielGlickmanTAU closed 3 years ago
Hey @DanielGlickmanTAU, did you update the embeddings after loading the DPR retriever? E.g.
document_store.update_embeddings(retriever=retriever)
I did, after training. Then I saved the retriever. The embeddings do exist in Elasticsearch. For example, if I retrieve by query, the documents I get back have a populated embedding field. But I cannot search by embedding. Should I call update_embeddings again each time I load the retriever? Calling update_embeddings the first time took a very long time, so doing it again on every load makes no sense.
Indexing with DPR can take quite some time depending on how many documents you want to index, because you use a BERT model to convert text into vectors. Do you use a GPU? How many documents are you indexing? See the retriever performance in our benchmarks for indexing speeds.
If you train your DPR retriever you should, as @lewtun mentioned correctly, also update the embeddings once more. Can you give some more info on how you trained the DPR model and how you are using it afterwards to update embeddings?
To clarify, I did update the embeddings after training.
What I am doing is:
retriever = DensePassageRetriever(...)
retriever.train(...)
retriever.save(path)
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="",
index="document",return_embedding=True)
document_store.update_embeddings(retriever) #takes a lot of time of course
retriever = DensePassageRetriever.load(path, document_store=document_store, ...)
After that, both retriever.retrieve and document_store.query_by_embedding fail. But the embeddings seem to exist: when I fetch using document_store.query("some text"), the documents are returned with embeddings.
Hey @DanielGlickmanTAU, I believe this has nothing to do with training the DPR retriever, only with saving and loading it. Unfortunately I cannot reproduce your issue with either the FAISS or the ES document store using this code:
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
document_store.delete_all_documents(index="document")
document_store.write_documents(dicts[:3],index="document")
retriever = DensePassageRetriever(document_store=document_store)
document_store.update_embeddings(retriever, index="document")
retriever.save("models/temp")
retriever2 = DensePassageRetriever.load("models/temp",document_store=document_store)
print(retriever2.retrieve("this is a father of arya"))
Are you using the latest master? Are you using the same index when writing, updating and querying?
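One way to narrow this down without re-running update_embeddings is to check the documents already fetched from the index. The sketch below assumes (this is not confirmed anywhere in this thread) that the failure could come from a document whose embedding is missing or has a different length than the query vector. find_bad_embeddings is a hypothetical helper, not part of Haystack; it works on plain dicts with an "id" and an "embedding" key, and 768 is only the BERT-base size, so substitute your model's dimension.

```python
from typing import Any, Dict, Iterable, List, Tuple


def find_bad_embeddings(
    docs: Iterable[Dict[str, Any]],
    expected_dim: int,
    field: str = "embedding",
) -> List[Tuple[str, str]]:
    """Return (doc_id, problem) pairs for docs whose vector is missing or mis-sized."""
    problems = []
    for doc in docs:
        doc_id = str(doc.get("id", "<no id>"))
        vector = doc.get(field)
        if vector is None:
            # Document was indexed but never received an embedding.
            problems.append((doc_id, "missing embedding"))
        elif len(vector) != expected_dim:
            # Vector was written by a model with a different output size.
            problems.append((doc_id, f"dimension {len(vector)} != {expected_dim}"))
    return problems


# Made-up sample documents, for illustration only.
sample_docs = [
    {"id": "a", "embedding": [0.1] * 768},
    {"id": "b", "embedding": None},
    {"id": "c", "embedding": [0.1] * 512},
]
report = find_bad_embeddings(sample_docs, expected_dim=768)
for doc_id, problem in report:
    print(doc_id, problem)
```

If the report is empty for every document in the index you query, the stored vectors are probably not the cause, and the index name or field name used at query time is the next thing to compare.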
Hey @DanielGlickmanTAU any updates here? Unfortunately we could not reproduce your issues, so closing now. Feel free to reopen if the error persists. If you found the problem please report back here. Thanks
After training DPR and updating the passage embeddings (I am using Elasticsearch), mostly following tutorials 6 and 9, I get the following error when trying to retrieve:
RequestError(400, 'search_phase_execution_exception', 'runtime error')
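The message above only shows the top-level reason. As far as I know, with the elasticsearch-py 7.x client the caught exception carries the full response body in its info attribute, and the more specific cause is usually nested under failed_shards. The sketch below walks that structure; explain_es_error is a hypothetical helper, and example_info is a hand-written dict shaped like such a body, with made-up values rather than output from this issue.

```python
from typing import Any, Dict, List


def explain_es_error(info: Dict[str, Any]) -> List[str]:
    """Collect 'type: reason' strings per failed shard, following nested caused_by entries."""
    messages = []
    for shard in info.get("error", {}).get("failed_shards", []):
        cause = shard.get("reason")
        while isinstance(cause, dict):
            messages.append(f"{cause.get('type')}: {cause.get('reason')}")
            cause = cause.get("caused_by")
    return messages


# Hand-written example body; every value here is illustrative, not a real log.
example_info = {
    "error": {
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "failed_shards": [
            {
                "shard": 0,
                "index": "document",
                "reason": {
                    "type": "script_exception",
                    "reason": "runtime error",
                    "caused_by": {
                        "type": "illegal_argument_exception",
                        "reason": "example nested reason",
                    },
                },
            }
        ],
    },
    "status": 400,
}
details = explain_es_error(example_info)
for line in details:
    print(line)
```

In real use, the dict would come from something like `except RequestError as e: explain_es_error(e.info)`, assuming the client version exposes info as described.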
Snippet from my code: