deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
17.72k stars, 1.92k forks

`_split_overlap` in `meta` is incompatible with some Document Stores #8181

Closed by anakin87 2 months ago

anakin87 commented 3 months ago

Problem

The nested format we chose for splitting metadata is problematic for some DBs. Starting from 2.3.0, `DocumentSplitter` automatically populates `meta` with `_split_overlap`. `_split_overlap` is a `List[dict]`, a type that some Document Stores, for example Chroma and Pinecone, do not support as a metadata value.
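
To make the incompatibility concrete, here is a minimal sketch in plain Python (no Haystack import needed). It assumes each `_split_overlap` entry looks roughly like `{"doc_id": ..., "range": [start, end]}`; the exact keys are an assumption, as is the set of scalar types a flat-metadata store accepts. `flatten_meta` is a hypothetical helper showing one possible user-side workaround (serializing nested values to JSON strings), not necessarily the fix the maintainers chose.

```python
import json
from typing import Any, Dict

# Assumption: stores with flat metadata accept only these scalar types.
FLAT_TYPES = (str, int, float, bool)


def flatten_meta(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `meta` in which every non-scalar value is a JSON string.

    Hypothetical helper for illustration; it is not part of Haystack.
    """
    flat: Dict[str, Any] = {}
    for key, value in meta.items():
        if isinstance(value, FLAT_TYPES):
            flat[key] = value
        else:
            # Nested values such as List[dict] become a single string.
            flat[key] = json.dumps(value)
    return flat


# Assumed shape of the metadata produced when splitting with overlap.
meta = {
    "source_id": "abc123",
    "_split_overlap": [{"doc_id": "def456", "range": [0, 23]}],
}

flat_meta = flatten_meta(meta)

# The nested structure can be recovered after reading the document back.
restored = json.loads(flat_meta["_split_overlap"])
```

The trade-off of this approach is that the overlap information is no longer directly filterable in the store; it has to be decoded with `json.loads` after retrieval.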

Related issues:
https://github.com/deepset-ai/haystack-core-integrations/issues/904
https://github.com/deepset-ai/haystack-core-integrations/issues/919

Solutions