4ut0m8NT closed this issue 4 months ago
Hello @4ut0m8NT, we usually don't monitor closed issues.
Does this help? https://github.com/deepset-ai/haystack/issues/3961#issuecomment-1406213631
Thanks @anakin87, but this isn't a syntax issue:
document_store = FAISSDocumentStore.load(index_path="my_faiss", config_path="my_faiss.json")
It produces "ValueError: The number of documents in the SQL database (96) doesn't ..." if the DB or index already exists.
Please advise.
document_store = FAISSDocumentStore(faiss_config_path="./my_faiss.json", faiss_index_path="./my_faiss")
This also fails. Please advise.
Yes, I get this as well. If I delete the index and config files, the FAISSDocumentStore works just fine. However, the save and load process no longer works.
OK, so I think the tutorial I was following at https://haystack.deepset.ai/integrations/faiss-document-store, which uses FAISS to perform semantic search, needs to be updated, because it does not show how to save the DocumentStore. I was calling save(), but I was not calling update_embeddings(), which was the crucial part I was missing. The order matters too: call update_embeddings() first and save() second, so that the document and embedding counts match in what gets saved.
The tutorial has two parts: the indexing pipeline followed by the query pipeline. The indexing pipeline sets up the FAISSDocumentStore and indexes the documents. update_embeddings() needs to run after indexing is complete and before the query pipeline runs. I was expecting it to happen during the indexing pipeline, but it has to come after the EmbeddingRetriever is created as part of the query pipeline; that is where update_embeddings() is run and save() is performed. For normal usage you would want to save the store rather than rerun this code over and over, which is why this process should be mentioned in the tutorial.
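For anyone landing here, the ordering described above can be sketched roughly as follows. This is only a minimal sketch against farm-haystack 1.x, not the tutorial's code: the helper names persist_faiss_store and build_and_persist are made up for illustration, the count check is an extra safeguard I added, and the file names and model are the ones from this issue.

```python
def persist_faiss_store(document_store, retriever,
                        index_path="my_faiss", config_path="my_faiss.json"):
    """Hypothetical helper: embed first, then save, so the counts match on reload."""
    # 1. Compute embeddings for the documents already written to the SQL database.
    document_store.update_embeddings(retriever)

    # 2. Extra safeguard (not part of the tutorial): compare the two counts
    #    that the ValueError complains about before writing anything to disk.
    n_docs = document_store.get_document_count()
    n_embeddings = document_store.get_embedding_count()
    if n_docs != n_embeddings:
        raise RuntimeError(
            f"{n_docs} documents but {n_embeddings} embeddings; not saving."
        )

    # 3. Only now write the FAISS index and its JSON config.
    document_store.save(index_path=index_path, config_path=config_path)
    return n_docs


def build_and_persist():
    """First run: requires farm-haystack 1.x with the FAISS extra installed."""
    from haystack.document_stores import FAISSDocumentStore
    from haystack.nodes import EmbeddingRetriever

    document_store = FAISSDocumentStore(
        sql_url="sqlite:///faiss_document_store.db",
        faiss_index_factory_str="Flat",
        embedding_dim=384,
        similarity="cosine",
    )
    # ... run the indexing pipeline / write_documents(...) here ...
    retriever = EmbeddingRetriever(
        document_store=document_store,
        embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    )
    return persist_faiss_store(document_store, retriever)
```

On later runs the store would then be reloaded with FAISSDocumentStore.load(index_path="my_faiss", config_path="my_faiss.json") instead of being rebuilt, assuming the SQLite file from the first run is still in place.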
Initializing a FAISSDocumentStore can take 'faiss_index' and can also take 'index'. When I initialized with 'index', I also got the mismatched count error. I checked the code, and the index param is ignored. So there seems to be an issue with the docs, and the parameter naming is confusing.
Describe the bug Loading an existing FAISS document store with a saved index/config no longer works in 1.18.1.
It runs once, works, and performs Q/A. Reloading fails.
Error message ValueError: The number of documents in the SQL database (96) doesn't match the number of embeddings in FAISS (0). Make sure your FAISS configuration file points to the same database that you used when you saved the original index.
Expected behavior Q/A App Loads and works just like first run.
Additional context Test Doc = converted PDF.
PreProcessing: converter = PDFToTextConverter(remove_numeric_tables=True)
doc_pdf = converter.convert(file_path="data/preprocessing_tutorial/bert.pdf", meta=None)
To Reproduce Use farm-haystack 1.18.1
Run an EmbeddingRetriever with a 384-dimensional embedding model.
Attempt to reload a 2nd time.
FAQ Check
System:
OS: Ubuntu
GPU/CPU: GPU
Haystack version (commit or version number): 1.18.1
DocumentStore: FAISSDocumentStore
Reader: deepset/deberta-v3-base-injection
Retriever: EmbeddingRetriever - sentence-transformers/all-MiniLM-L6-v2 (requires 384 dim)
my_faiss.json: {"faiss_index_factory_str": "Flat", "embedding_dim": 384, "index": "documents", "similarity": "cosine", "embedding_field": "question_emb", "sql_url": "sqlite:///faiss_document_store.db"}
my_faiss (index) (binary): "IxFI�^A^@^@^@^@^@^@^@^@^@^@^@^@^P^@^@^@^@^@^@^@^P^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@"
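Since the error message asks whether the config file points to the same database, a quick way to see which SQLite file a config like the one above resolves to might look like this. It is a rough sketch using only the standard library; the function name sqlite_path_from_faiss_config is made up, and it assumes the usual SQLAlchemy convention that a relative sqlite:/// path is resolved against the current working directory.

```python
import json
from pathlib import Path


def sqlite_path_from_faiss_config(config_path):
    """Hypothetical helper: return the SQLite file that sql_url points to.

    Returns None when sql_url is missing, is not a file-based SQLite URL,
    or refers to an in-memory database.
    """
    config = json.loads(Path(config_path).read_text())
    sql_url = config.get("sql_url", "")
    prefix = "sqlite:///"
    if not sql_url.startswith(prefix):
        return None
    raw_path = sql_url[len(prefix):]
    if raw_path in ("", ":memory:"):
        return None
    # A relative path depends on the directory the script is started from.
    return Path(raw_path).resolve()


if __name__ == "__main__":
    db_path = sqlite_path_from_faiss_config("my_faiss.json")
    print("SQL database:", db_path)
    print("exists:", db_path is not None and db_path.exists())
```

If the script is started from a different directory than the first run, the relative path resolves to a different file, so it is worth checking before reloading.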
Please advise.
Also added to closed ticket #1019.