Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

Adding a progress bar when partitioning pdfs #2351

Open TheoLvs opened 9 months ago

TheoLvs commented 9 months ago

Hello,

Thanks a lot for the huge work on unstructured!

I would love to see a progress bar showing how far partition_pdf has gotten when parsing large PDFs. Is there an easy way of doing this?

justindujardin commented 4 months ago

We're using the API for parsing PDFs. It's really troubling not to have progress reported in some way for operations that can take anywhere between 10 seconds and 5 minutes. In our experience, the developer (and by extension user) experience is really janky without progress reporting of some sort.

peili commented 4 months ago

+1. It would significantly improve our ability to handle large PDFs and provide a better user experience.

PJDEVEX commented 3 months ago

Supporting the request raised by @TheoLvs! It would be really handy, especially when handling bigger files...
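
For anyone looking for a stopgap, one possible workaround (not a built-in feature of unstructured) is to split the PDF into smaller chunks yourself, for example one file per page, and report progress after each chunk is partitioned. The sketch below is a minimal illustration of that idea using only the standard library. The names `partition_with_progress`, `print_progress`, `partition_fn` and `report` are hypothetical helpers made up for this example; `partition_fn` is assumed to be a thin wrapper you write around `partition_pdf`, and the splitting step is left out.

```python
import sys
from typing import Callable, List, Sequence, TypeVar

Chunk = TypeVar("Chunk")      # e.g. a path to a single-page PDF (assumption)
Element = TypeVar("Element")  # whatever partition_fn returns per chunk


def print_progress(done: int, total: int) -> None:
    """Draw a simple text progress bar on stderr (hypothetical helper)."""
    width = 30
    filled = width * done // total
    bar = "#" * filled + "." * (width - filled)
    sys.stderr.write(f"\r[{bar}] {done}/{total}")
    if done == total:
        sys.stderr.write("\n")
    sys.stderr.flush()


def partition_with_progress(
    chunks: Sequence[Chunk],
    partition_fn: Callable[[Chunk], List[Element]],
    report: Callable[[int, int], None] = print_progress,
) -> List[Element]:
    """Partition each chunk in order and call report(done, total) after each.

    partition_fn is supplied by the caller; this function does not call
    unstructured itself.
    """
    elements: List[Element] = []
    total = len(chunks)
    for done, chunk in enumerate(chunks, start=1):
        elements.extend(partition_fn(chunk))
        report(done, total)
    return elements
```

In practice, `partition_fn` could be something like `lambda path: partition_pdf(filename=path)`, with the per-page files produced beforehand by a PDF library of your choice. Keep in mind that partitioning page by page may give different results from a single call on the whole document, for instance for content that spans pages or for page-number metadata, so treat this as a rough workaround rather than an equivalent.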