deepset-ai / haystack-core-integrations

Additional packages (components, document stores and the like) to extend the capabilities of Haystack version 2.0 and onwards
https://haystack.deepset.ai
Apache License 2.0

feat(unstructured): adding element index as metadata when partitioning by elem #378

Closed: lambda-science closed this issue 7 months ago

lambda-science commented 7 months ago

Is your feature request related to a problem? Please describe. With Unstructured we can partition in three ways: one document per file, one document per page, or one document per element. When partitioning one document per element, there is no way to track the order of the elements coming from the same source document. This information can be useful, for example, for a ContextExpander component: after retrieving an element, you could also retrieve the previous and next elements to expand the current context.
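A rough sketch of the ContextExpander idea, assuming each element document carries its source in `meta["file_path"]` and its position in a hypothetical `meta["element_index"]` key. `Doc` is a minimal stand-in for a Haystack `Document` and `expand_context` is an illustrative helper, not an existing Haystack component:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    # Minimal stand-in for a Haystack Document: text plus a metadata dict.
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def expand_context(hit: Doc, corpus: List[Doc], window: int = 1) -> List[Doc]:
    """Return `hit` plus up to `window` neighbouring elements on each side.

    Assumes (hypothetically) that every element document has
    meta["file_path"] for its source file and meta["element_index"]
    for its position within that file.
    """
    source = hit.meta["file_path"]
    center = hit.meta["element_index"]
    neighbours = [
        d
        for d in corpus
        if d.meta.get("file_path") == source
        and "element_index" in d.meta
        and abs(d.meta["element_index"] - center) <= window
    ]
    # Restore reading order so the expanded context is contiguous text.
    return sorted(neighbours, key=lambda d: d.meta["element_index"])


# Usage: five elements from one file plus one element from another file.
corpus = [Doc(f"e{i}", {"file_path": "a.pdf", "element_index": i}) for i in range(5)]
corpus.append(Doc("other", {"file_path": "b.pdf", "element_index": 2}))
print(" ".join(d.content for d in expand_context(corpus[2], corpus)))  # e1 e2 e3
```

Without a per-element index, the filter has nothing to compare against, which is exactly the gap the issue describes.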

Describe the solution you'd like: Automatically add an element index to the metadata of each document. I will open a PR.
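One way to picture the proposal, as a sketch only: number the elements of each file with `enumerate` when building one document per element. The `attach_element_index` helper and the `element_index` key are hypothetical and not necessarily what the released integration uses; plain strings stand in for partitioned elements:

```python
from typing import Any, Dict, List


def attach_element_index(elements: List[str], file_path: str) -> List[Dict[str, Any]]:
    """Build one document dict per element, numbering elements within the file.

    `elements` stands in for the element texts of a single file;
    "element_index" is a hypothetical metadata key used for illustration.
    """
    return [
        {"content": text, "meta": {"file_path": file_path, "element_index": i}}
        for i, text in enumerate(elements)
    ]


docs = attach_element_index(["Title", "Para 1", "Para 2"], "report.pdf")
print(docs[1]["meta"])  # {'file_path': 'report.pdf', 'element_index': 1}
```

Because the counter restarts for every file, the index is only meaningful together with the source path, so both should be kept in the metadata.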

anakin87 commented 7 months ago

Done in #382. New release: https://pypi.org/project/unstructured-fileconverter-haystack/0.4.0/