deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
16.94k stars, 1.85k forks

Components.from_dict is not recursive #8199

Closed: FHardow closed this issue 1 month ago

FHardow commented 1 month ago

Describe the bug: Calling from_dict on a serialized pipeline deserializes only the top-level component, not the components nested inside it. Subcomponents are created anew instead. For example, a component that receives a document_store as a parameter does not call from_dict on the document_store but initializes a new object. Because from_dict and plain initialization can behave differently, the deserialized pipeline may not match the one that was serialized.
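A minimal sketch of this failure mode, using plain Python stand-ins rather than the real Haystack classes. ToyStore and ToyRetriever are hypothetical names, and the sketch assumes the store's from_dict does work that its constructor does not (here, unwrapping a serialized password reference):

```python
from typing import Any, Dict


class ToyStore:
    """Hypothetical stand-in for a document store."""

    def __init__(self, index: str, password: Any):
        self.index = index
        self.password = password

    def to_dict(self) -> Dict[str, Any]:
        # Serialization wraps the password in a marker dict.
        return {"index": self.index,
                "password": {"type": "env_var", "name": self.password}}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyStore":
        # Deserialization unwraps the marker; __init__ alone does not.
        return cls(index=data["index"], password=data["password"]["name"])


class ToyRetriever:
    """Hypothetical component that holds a store as a parameter."""

    def __init__(self, document_store: ToyStore):
        self.document_store = document_store

    def to_dict(self) -> Dict[str, Any]:
        return {"document_store": self.document_store.to_dict()}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyRetriever":
        # Non-recursive: a new store is built from the raw dict,
        # so ToyStore.from_dict is never called.
        return cls(document_store=ToyStore(**data["document_store"]))


original = ToyRetriever(ToyStore("docs", "OS_PASSWORD"))
restored = ToyRetriever.from_dict(original.to_dict())

# No exception is raised, but the nested store now holds the marker dict
# instead of the plain value.
print(original.document_store.password)
print(restored.document_store.password)
```

Nothing fails at load time in this sketch; the mismatch only shows up when the restored object is inspected.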

We found this problem while using the SentenceWindowRetriever with an OpenSearchDocumentStore. Here is a PR in haystack-core-integrations with a test showing that the document_store was not deserialized correctly.

Error message: None. The problem can only be found by inspecting the output of the deserialization.

Expected behavior: Subcomponents should also be deserialized with their own from_dict, not newly initialized.
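One way the expected behavior could look, again as a sketch with hypothetical stand-in classes (ToyStore, ToyRetriever, and the window_size parameter are illustrative and not taken from the Haystack codebase). The outer from_dict delegates to the nested object's from_dict, and a round-trip comparison of to_dict output serves as the check:

```python
from typing import Any, Dict


class ToyStore:
    """Hypothetical stand-in for a document store."""

    def __init__(self, index: str, password: str):
        self.index = index
        self.password = password

    def to_dict(self) -> Dict[str, Any]:
        return {"index": self.index,
                "password": {"type": "env_var", "name": self.password}}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyStore":
        return cls(index=data["index"], password=data["password"]["name"])


class ToyRetriever:
    """Hypothetical component that holds a store as a parameter."""

    def __init__(self, document_store: ToyStore, window_size: int = 3):
        self.document_store = document_store
        self.window_size = window_size

    def to_dict(self) -> Dict[str, Any]:
        return {"window_size": self.window_size,
                "document_store": self.document_store.to_dict()}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyRetriever":
        # Recursive step: the nested store is rebuilt by its own from_dict.
        store = ToyStore.from_dict(data["document_store"])
        return cls(document_store=store, window_size=data["window_size"])


original = ToyRetriever(ToyStore("docs", "OS_PASSWORD"), window_size=5)
restored = ToyRetriever.from_dict(original.to_dict())

# Round-trip check: serializing the restored object should reproduce
# the original serialized form.
round_trip_ok = restored.to_dict() == original.to_dict()
print(round_trip_ok)
```

Since the bug raises no error, a round-trip assertion of this kind is one way a test could catch a subcomponent that was re-initialized instead of deserialized.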
