Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

feat/tqdm ingest support #3199

Closed rbiseck3 closed 3 months ago

rbiseck3 commented 3 months ago

Description

Add tqdm support to show a progress bar for the status of each job as it runs. Supported in each mode (serial, async, multiprocess). Also adds a small timing wrapper around jobs that prints how long they took in total.
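For illustration only, here is a minimal sketch of the two ideas in the description: an optional tqdm bar around a step and a timing wrapper that prints the total duration. It covers only the serial case. The names `timed`, `run_step`, and `show_progress` are hypothetical and are not taken from the PR's actual code.

```python
import functools
import time


def timed(fn):
    """Hypothetical timing wrapper: print how long the wrapped call took."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{fn.__name__} took {elapsed:.2f}s")
    return wrapper


@timed
def run_step(name, items, process, show_progress=False):
    """Run `process` over `items` serially, optionally behind a tqdm bar."""
    iterable = items
    if show_progress:
        # Imported lazily so the progress bar stays an optional feature.
        from tqdm import tqdm
        iterable = tqdm(items, desc=name, total=len(items))
    return [process(item) for item in iterable]


# Stand-in workload: upper-casing file names instead of real partitioning.
results = run_step("partition", ["a.pdf", "b.pdf"], str.upper)
```

Passing `show_progress=True` would label the bar with the step name, which is the label a log check would later look for.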

ryannikolaidis commented 3 months ago

Neat! Should we add the option to a test, both for general coverage and to guard against regressions, if this is a feature we support?

rbiseck3 commented 3 months ago

@ryannikolaidis given that this change is only visual, not sure how to add this to the ingest tests and validate the log output? But I would love to have a test for it somehow.

ryannikolaidis commented 3 months ago

@rbiseck3

I don't want to hold this up, though if you are up for validating, something like this could work: add &> app.log to the end of an ingest call.

then just validate we see the completion for steps:

partition_completion_present=$(grep -o 'partition:  *100%|' app.log)
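The same check could be sketched in Python, assuming the captured log contains tqdm's default bar prefix (the step label, a colon, padding spaces, then the percentage and a `|`). The helper name `step_completed` and the sample log text are hypothetical, made up for illustration rather than taken from a real ingest run.

```python
import re


def step_completed(log_text, step):
    """Return True if the log shows a 100% tqdm bar labelled with `step`."""
    # Mirrors the grep pattern above: label, colon, one or more spaces, "100%|".
    pattern = re.compile(rf"{re.escape(step)}: +100%\|")
    return bool(pattern.search(log_text))


# Hypothetical log content; tqdm redraws the bar using carriage returns.
sample_log = (
    "partition:   0%|          | 0/2\r"
    "partition: 100%|##########| 2/2 [00:01<00:00]\n"
)

partition_done = step_completed(sample_log, "partition")
```

In a test, the log text would come from reading `app.log` after the ingest call finishes, and the assertion would be repeated for each step label expected in the output.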