deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Apache License 2.0

Unable to load pipeline using load_from_yml containing FAISSDocumentStore node #4814

Closed: preethampaulose closed this issue 4 months ago

preethampaulose commented 1 year ago

Describe the bug: I'm trying to load a YAML file that was created with the save_to_yaml method. The pipeline was tested first, confirmed to work, and then saved. When I try to load the saved YAML file, I get an error. The pipeline consists of a FAISSDocumentStore, a Retriever, and a Generator.

Error message: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:faiss.swigfaiss.IndexFlat' in "myPipeline.haystack-pipeline.yml", line 5, column 18

Expected behavior: The pipeline loads with all of its nodes.


To Reproduce

  1. Unzip the attached file
  2. Run the "load saved faiss document.ipynb"


ZanSara commented 1 year ago

Hey @preethampaulose, you're using quite an outdated Haystack version. Could you upgrade and let us know whether the issue is still present in the latest version?

anakin87 commented 1 year ago

While investigating, I found that this issue is still present.

To reproduce

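from pathlib import Path

from haystack.pipelines import Pipeline
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever

# Load a FAISS index that was previously saved to disk
ds = FAISSDocumentStore(
    faiss_index_path="/home/anakin87/apps/experiments/doc-search/index/my_faiss_index.faiss",
    faiss_config_path="/home/anakin87/apps/experiments/doc-search/index/my_faiss_index.json",
)
retriever = EmbeddingRetriever(
    document_store=ds,
    embedding_model="sentence-transformers/msmarco-distilbert-base-tas-b",
    model_format="sentence_transformers",
)
pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
# Saving succeeds; loading the saved file fails on the embedded faiss_index object
pipe.save_to_yaml(Path("faiss_pipeline.yml"))  # hypothetical file name
Pipeline.load_from_yaml(Path("faiss_pipeline.yml"))
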

Generated YAML

components:
- name: FAISSDocumentStore
  params:
    faiss_config_path: /home/anakin87/apps/experiments/doc-search/index/my_faiss_index.json
    faiss_index: !!python/object:faiss.swigfaiss.IndexFlat
      this: !!binary |
        !!! very long binary string !!!
    faiss_index_path: /home/anakin87/apps/experiments/doc-search/index/my_faiss_index.faiss
  type: FAISSDocumentStore
- name: Retriever
  params:
    document_store: FAISSDocumentStore
    embedding_model: sentence-transformers/msmarco-distilbert-base-tas-b
    model_format: sentence_transformers
  type: EmbeddingRetriever
pipelines:
- name: query
  nodes:
  - inputs:
    - Query
    name: Retriever
version: 1.19.0rc0

The generated YAML contains a very long binary string (the serialized FAISS index). My first idea for fixing this is to skip this field in the save_to_yaml method.
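A minimal sketch of that idea, assuming the pipeline config is available as a plain dict in the Haystack 1.x layout shown in the generated YAML (`components`, each with `name`, `type`, `params`). In Haystack 1.x such a dict may come from `Pipeline.get_config()`, but treat that as an assumption to verify against your installed version. The helper `drop_faiss_index` is hypothetical, not part of Haystack: it returns a copy of the config without the in-memory `faiss_index` object, so only the on-disk paths would be serialized.

```python
from typing import Any, Dict, List


def drop_faiss_index(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a Haystack 1.x-style pipeline config in which no
    FAISSDocumentStore component carries the live `faiss_index` object.

    Hypothetical helper, not a Haystack API. Copies are shallow so the index
    object itself is never duplicated, and the input dict is not mutated.
    """
    components: List[Dict[str, Any]] = []
    for component in config.get("components", []):
        component = dict(component)  # shallow copy of this component entry
        if component.get("type") == "FAISSDocumentStore":
            # Keep every param except the in-memory index; faiss_index_path
            # and faiss_config_path are what the loader needs.
            component["params"] = {
                key: value
                for key, value in component.get("params", {}).items()
                if key != "faiss_index"
            }
        components.append(component)
    return {**config, "components": components}


# Usage with a config shaped like the generated YAML (paths shortened).
config = {
    "version": "1.19.0rc0",
    "components": [
        {
            "name": "FAISSDocumentStore",
            "type": "FAISSDocumentStore",
            "params": {
                "faiss_config_path": "index/my_faiss_index.json",
                "faiss_index": object(),  # placeholder for the swig IndexFlat
                "faiss_index_path": "index/my_faiss_index.faiss",
            },
        },
    ],
    "pipelines": [{"name": "query", "nodes": [{"name": "Retriever", "inputs": ["Query"]}]}],
}

cleaned = drop_faiss_index(config)
print(sorted(cleaned["components"][0]["params"]))
# ['faiss_config_path', 'faiss_index_path']
```

Shallow copies are a deliberate choice here: deep-copying a native FAISS index object could be expensive or unsupported, and the filter only needs to rebuild the `params` dict.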

TomAtGithub commented 9 months ago

Hey, I ran into the same problem. I worked around it by removing all params except faiss_config_path and faiss_index_path from the FAISSDocumentStore. This makes sense because, when loading an index from disk, the FAISSDocumentStore constructor does not accept any params other than these two (see reference). It seems that save_to_yaml() does not respect this rule when it serializes the FAISSDocumentStore.
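The rule described above can be illustrated with a small check. This is a hypothetical mirror of the behavior, not Haystack's own code, and it simplifies in two ways: it treats the presence of either path key as "loading from disk", and it checks which keys are present instead of comparing each value with the constructor's defaults.

```python
from typing import Any, Dict

# The only keys reported as allowed when a FAISS index is loaded from disk.
LOAD_FROM_DISK_PARAMS = {"faiss_index_path", "faiss_config_path"}


def check_faiss_load_params(params: Dict[str, Any]) -> None:
    """Hypothetical check: raise ValueError if a FAISSDocumentStore params
    dict mixes the load-from-disk paths with any other setting."""
    if not (params.keys() & LOAD_FROM_DISK_PARAMS):
        return  # not loading from disk, so no restriction applies
    extra = sorted(params.keys() - LOAD_FROM_DISK_PARAMS)
    if extra:
        raise ValueError(
            "when loading a FAISS index from disk, only faiss_index_path and "
            f"faiss_config_path may be set; unexpected params: {extra}"
        )


# Passes: only the two path params.
check_faiss_load_params(
    {"faiss_config_path": "./faiss/config.json", "faiss_index_path": "./faiss/index.faiss"}
)

# Fails: a saved YAML that also carries the serialized faiss_index.
try:
    check_faiss_load_params({"faiss_index_path": "./faiss/index.faiss", "faiss_index": object()})
except ValueError as err:
    print(err)
```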

Here is a working YAML (it assumes a FAISSDocumentStore was previously indexed and saved to the ./faiss directory):

components:
- name: FAISSDocumentStore
  params:
    faiss_config_path: ./faiss/config.json
    faiss_index_path: ./faiss/index.faiss
  type: FAISSDocumentStore
- name: Retriever
  params:
    document_store: FAISSDocumentStore
    embedding_model: sentence-transformers/all-MiniLM-L6-v2
  type: EmbeddingRetriever
- name: Reader
  params:
    model_name_or_path: sentence-transformers/all-MiniLM-L6-v2
  type: FARMReader
pipelines:
- name: query
  nodes:
  - inputs:
    - Query
    name: Retriever
  - inputs:
    - Retriever
    name: Reader
version: 1.22.1
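Because a YAML like this only points at files on disk, loading it depends on those files already existing. A small pre-flight check could list any that are missing before the pipeline is loaded. The helper `missing_faiss_files` is hypothetical, and it assumes relative paths such as `./faiss/config.json` are resolved against the current working directory (or an explicit `base_dir`).

```python
from pathlib import Path
from typing import Any, Dict, List


def missing_faiss_files(config: Dict[str, Any], base_dir: Path = Path(".")) -> List[Path]:
    """List FAISS index/config files referenced by a Haystack 1.x-style
    pipeline config that do not exist (hypothetical check, not a Haystack API)."""
    missing: List[Path] = []
    for component in config.get("components", []):
        if component.get("type") != "FAISSDocumentStore":
            continue
        params = component.get("params", {})
        for key in ("faiss_index_path", "faiss_config_path"):
            value = params.get(key)
            if value is None:
                continue
            path = Path(value)
            # Assumption: relative paths are resolved against base_dir.
            if not path.is_absolute():
                path = base_dir / path
            if not path.exists():
                missing.append(path)
    return missing


# Usage with the paths from the working YAML.
config = {
    "components": [
        {
            "name": "FAISSDocumentStore",
            "type": "FAISSDocumentStore",
            "params": {
                "faiss_config_path": "./faiss/config.json",
                "faiss_index_path": "./faiss/index.faiss",
            },
        }
    ]
}
for path in missing_faiss_files(config):
    print(f"missing: {path}")
```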

I am using version 1.22.1 rather than the latest version because of bug #5749.