deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
16.94k stars 1.85k forks

docs: updating DocumentSplitter docstring, adding supported DocumentStores #8270

Closed: davidsbatista closed this 1 month ago

davidsbatista commented 1 month ago


coveralls commented 1 month ago

Pull Request Test Coverage Report for Build 10527111686

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| components/preprocessors/document_splitter.py | 1 | 98.96% |

| Totals | |
|---|---|
| Change from base Build 10522538514 | 0.0% |
| Covered Lines | 6975 |
| Relevant Lines | 7732 |

anakin87 commented 1 month ago

Sorry, we talked about it offline with David.

DocumentSplitter is and should be compatible with each Document Store we support.

However, some Document Stores do not support certain metadata produced by this component and discard it. We should state this clearly.

davidsbatista commented 1 month ago

I've added a single disclaimer explaining that the _split_overlap metadata is lost when using Chroma, and added the same info to the docs. I would merge it now.
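
To illustrate the behaviour discussed above, here is a minimal, library-free sketch of how a store that only accepts flat primitive metadata values might end up discarding a nested field such as _split_overlap. The helper `partition_meta`, the `PRIMITIVES` rule, and every metadata key other than `_split_overlap` are assumptions made for illustration; this is not Haystack's or Chroma's actual code.

```python
from typing import Any, Dict, Tuple

# Assumption: the target store keeps only flat primitive metadata values.
PRIMITIVES = (str, int, float, bool)


def partition_meta(meta: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split metadata into (kept, dropped) under the flat-primitives assumption."""
    kept = {k: v for k, v in meta.items() if isinstance(v, PRIMITIVES)}
    dropped = {k: v for k, v in meta.items() if k not in kept}
    return kept, dropped


# Hypothetical chunk metadata; only the `_split_overlap` name comes from the thread.
chunk_meta = {
    "source_id": "doc-1",
    "split_id": 1,
    "split_idx_start": 42,
    "_split_overlap": [{"doc_id": "chunk-0", "range": (30, 42)}],
}

kept, dropped = partition_meta(chunk_meta)
print(sorted(kept))     # ['source_id', 'split_id', 'split_idx_start']
print(sorted(dropped))  # ['_split_overlap']
```

Running a check like this before writing documents makes it visible which fields would not survive a round trip through such a store, while the documents themselves are still written.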