deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
16.94k stars 1.85k forks

feat: adding a hierarchical document builder #8180

Closed davidsbatista closed 1 month ago

davidsbatista commented 1 month ago

Related Issues

Proposed Changes:

Adds a new builder.HierarchicalDocumentBuilder, which splits a Document into multiple Document objects of different block sizes. The result is a hierarchical tree structure in which each smaller block is a child of the larger block it was split from.

Some insight into what is being implemented: https://pbs.twimg.com/media/F7ONuajWMAAvuWh?format=jpg&name=4096x4096
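
To make the idea concrete, here is a minimal, self-contained sketch of hierarchical splitting. It does not use the Haystack API and is not the code from this PR: the Block class, the build_hierarchy function, and the word-based block sizes are all hypothetical names and assumptions chosen for illustration only.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """Hypothetical stand-in for a Document node in the tree."""
    id: int
    text: str
    level: int
    parent_id: Optional[int] = None
    children_ids: List[int] = field(default_factory=list)


def split_words(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_hierarchy(text: str, block_sizes: List[int]) -> List[Block]:
    """Build a tree of blocks; each level splits the blocks of the level above.

    Assumption: block sizes are measured in words and applied from
    largest to smallest, so every smaller block has exactly one parent.
    """
    sizes = sorted(set(block_sizes), reverse=True)
    counter = itertools.count()
    root = Block(id=next(counter), text=text, level=0)
    blocks = [root]
    current_level = [root]
    for level, size in enumerate(sizes, start=1):
        next_level = []
        for parent in current_level:
            for chunk in split_words(parent.text, size):
                child = Block(id=next(counter), text=chunk,
                              level=level, parent_id=parent.id)
                parent.children_ids.append(child.id)
                blocks.append(child)
                next_level.append(child)
        current_level = next_level
    return blocks


if __name__ == "__main__":
    for block in build_hierarchy("a b c d e f g h", block_sizes=[4, 2]):
        print(block.level, block.id, block.parent_id, repr(block.text))
```

Blocks without children are the leaves of the tree. Keeping a parent_id on every block is what would let a later retrieval step walk from a matched leaf back up to its larger parent block.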

How did you test it?

Notes for the reviewer

I'm pushing this into a feature branch to ease the PR review process. Afterwards, I will push the code that implements the auto-merge-retriever to the same feature branch, and then merge the feature branch into the main branch.

Checklist

anakin87 commented 1 month ago
dfokina commented 1 month ago

Hey @davidsbatista, could you tag me once the work on the component is done so I can review the docstrings? 🙏

davidsbatista commented 1 month ago

@dfokina I will close this PR soon and instead open it against the haystack-experimental package, but I can still tag you on the experimental repo.