Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
9.25k stars 767 forks

rfctr(part): add new decorator to replace four #3650

Closed: scanny closed this 2 months ago

scanny commented 2 months ago

Summary

In preparation for pluggable auto-partitioners, add a new metadata decorator to replace the four existing ones.

Additional Context

"Global" metadata items, those applied to every element by every partitioner, are applied using a decorator.

Currently there are four decorators where only one is needed. Consolidate them into a single metadata decorator. One or two additional behaviors of the new decorator will allow us to remove decorators from delegating partitioners, which is a prerequisite for pluggable auto-partitioners.
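To illustrate the idea of one decorator applying all "global" metadata in a single pass, here is a minimal sketch. It is not the unstructured implementation: `Element`, `ElementMetadata`, `apply_global_metadata`, and `partition_txt_like` are hypothetical stand-ins, and the choice of items (`filename`, `filetype`, `languages`) is an assumption made for the example.

```python
"""Sketch of a single decorator that stamps global metadata on every element.

All names here are hypothetical stand-ins, not the unstructured API.
"""
from __future__ import annotations

import functools
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ElementMetadata:
    filename: Optional[str] = None
    filetype: Optional[str] = None
    languages: Optional[List[str]] = None


@dataclass
class Element:
    text: str
    metadata: ElementMetadata = field(default_factory=ElementMetadata)


Partitioner = Callable[..., List[Element]]


def apply_global_metadata(file_type: str) -> Callable[[Partitioner], Partitioner]:
    """Build one decorator that applies every global metadata item at once."""

    def decorator(partition: Partitioner) -> Partitioner:
        @functools.wraps(partition)
        def wrapper(
            *args,
            filename: Optional[str] = None,
            languages: Optional[List[str]] = None,
            **kwargs,
        ) -> List[Element]:
            elements = partition(*args, filename=filename, **kwargs)
            for element in elements:
                md = element.metadata
                # Fill only items the partitioner left unset, so a partitioner
                # can still override a global value for its own elements.
                if md.filename is None:
                    md.filename = filename
                if md.filetype is None:
                    md.filetype = file_type
                if md.languages is None and languages is not None:
                    md.languages = list(languages)  # copy: no shared list
            return elements

        return wrapper

    return decorator


@apply_global_metadata(file_type="text/plain")
def partition_txt_like(text: str, filename: Optional[str] = None) -> List[Element]:
    # Toy partitioner: one element per non-blank line.
    return [Element(line) for line in text.splitlines() if line.strip()]
```

Calling `partition_txt_like("Title\n\nBody text", filename="a.txt", languages=["eng"])` returns two elements, each carrying `filename="a.txt"`, `filetype="text/plain"`, and `languages=["eng"]`. Putting every item in one wrapper means each partitioner carries a single decorator line instead of a stack of them; the fill-only-if-missing rule is one possible design choice, assumed here rather than taken from the PR.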

Coniferish commented 2 months ago

What are the four decorators this is replacing? I'm only seeing three that are used on the partitioners:

Also, I'm not seeing the chunk addressed in this decorator. Is that coming in a later PR?