Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.8k stars, 626 forks

feat(docx): add pluggable picture sub-partitioner #3081

Closed (closed by scanny 2 months ago)

scanny commented 2 months ago

Summary

Allow registration of a custom sub-partitioner that extracts images from a DOCX paragraph.
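To illustrate the registration idea, here is a minimal, self-contained sketch of a pluggable sub-partitioner registry. It does not import `unstructured` and is not the implementation in this PR: every name in it (`Paragraph`, `Image`, `PartitionerOptions`, `register_picture_partitioner`, `InlinePicturePartitioner`, `partition_paragraph`) is a hypothetical stand-in chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass
class Paragraph:
    """Hypothetical stand-in for a DOCX paragraph (not the python-docx class)."""

    text: str
    image_names: list[str]


@dataclass
class Image:
    """Hypothetical stand-in for an extracted image element."""

    name: str


class PicturePartitioner(Protocol):
    """Interface a pluggable picture sub-partitioner is assumed to satisfy."""

    @classmethod
    def iter_elements(cls, paragraph: Paragraph) -> Iterator[Image]: ...


class NullPicturePartitioner:
    """Default sub-partitioner: extracts no images."""

    @classmethod
    def iter_elements(cls, paragraph: Paragraph) -> Iterator[Image]:
        return iter(())


class PartitionerOptions:
    """Holds the currently registered picture sub-partitioner (assumed design)."""

    _picture_partitioner_cls: type[PicturePartitioner] = NullPicturePartitioner

    @classmethod
    def register_picture_partitioner(cls, partitioner_cls: type[PicturePartitioner]) -> None:
        # Replace the default with the caller-supplied sub-partitioner.
        cls._picture_partitioner_cls = partitioner_cls

    @property
    def picture_partitioner(self) -> type[PicturePartitioner]:
        return self._picture_partitioner_cls


class InlinePicturePartitioner:
    """Example custom sub-partitioner: yields one Image per image in the paragraph."""

    @classmethod
    def iter_elements(cls, paragraph: Paragraph) -> Iterator[Image]:
        for name in paragraph.image_names:
            yield Image(name=name)


def partition_paragraph(paragraph: Paragraph, opts: PartitionerOptions) -> list[Image]:
    """Delegate image extraction to whichever sub-partitioner is registered."""
    return list(opts.picture_partitioner.iter_elements(paragraph))
```

In this sketch the default sub-partitioner yields nothing, so image extraction is opt-in: a caller would invoke `PartitionerOptions.register_picture_partitioner(InlinePicturePartitioner)` once, and later calls to `partition_paragraph` would then return the images found in each paragraph.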

Additional Context