Problem
The nested format we chose for metadata about splitting is problematic for some DBs.
Starting from 2.3.0 we are automatically populating meta with _split_overlap when using the DocumentSplitter.
_split_overlap is a List[dict], unsupported by some DBs: for example, Chroma and Pinecone.
For the time being, it might be appropriate to automatically skip problematic metadata fields (and emit a warning) in Document Stores that don't support them.
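A minimal sketch of this short-term workaround, written against plain dicts rather than any real Document Store. The helper name `drop_unsupported_meta`, the `SUPPORTED_META_TYPES` tuple, and the exact shape of the `_split_overlap` entries are assumptions made for illustration; the set of types a given store accepts would need to be checked per store.

```python
import warnings
from typing import Any, Dict

# Assumption: the target store only accepts flat scalar values in metadata.
SUPPORTED_META_TYPES = (str, int, float, bool)


def drop_unsupported_meta(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `meta` without the values the store cannot persist.

    Hypothetical helper: a warning is emitted for every skipped field.
    """
    kept: Dict[str, Any] = {}
    for key, value in meta.items():
        if isinstance(value, SUPPORTED_META_TYPES):
            kept[key] = value
        else:
            warnings.warn(
                f"Skipping metadata field '{key}' of type "
                f"{type(value).__name__}: not supported by this Document Store."
            )
    return kept


# Illustrative metadata; the contents of `_split_overlap` are an assumed example
# of a List[dict] value.
meta = {
    "source_id": "abc",
    "page_number": 1,
    "_split_overlap": [{"doc_id": "def", "range": (0, 10)}],
}
clean_meta = drop_unsupported_meta(meta)
```

With this approach the document can still be written, but the overlap information is lost for that store, which is why it only fits as a stopgap.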
In the long run, we might consider changing the common format or acting locally on the Document Stores to ensure that this information is stored and used correctly.
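One possible way of "acting locally" on a Document Store, sketched under assumptions: nested values are serialized to a JSON string on write and decoded on read, so the information survives in stores that only accept flat values. The function names `encode_nested_meta` and `decode_nested_meta` are hypothetical, and this is not how any existing integration is known to behave.

```python
import json
from typing import Any, Dict, Set

# Assumption: values of these types can be stored as-is.
FLAT_TYPES = (str, int, float, bool)


def encode_nested_meta(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Serialize every non-flat value to a JSON string before writing."""
    return {
        key: value if isinstance(value, FLAT_TYPES) else json.dumps(value)
        for key, value in meta.items()
    }


def decode_nested_meta(meta: Dict[str, Any], nested_keys: Set[str]) -> Dict[str, Any]:
    """Restore the JSON-encoded fields listed in `nested_keys` after reading."""
    return {
        key: json.loads(value) if key in nested_keys else value
        for key, value in meta.items()
    }


# Illustrative round trip; the `_split_overlap` entry shape is an assumed example.
meta = {
    "page_number": 1,
    "_split_overlap": [{"doc_id": "def", "range": [0, 10]}],
}
stored = encode_nested_meta(meta)
restored = decode_nested_meta(stored, {"_split_overlap"})
```

The trade-off is that the encoded field becomes an opaque string, so it cannot be used in metadata filters, and tuples come back as lists after the JSON round trip.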
Related issues:
https://github.com/deepset-ai/haystack-core-integrations/issues/904
https://github.com/deepset-ai/haystack-core-integrations/issues/919
Solutions