Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
9.21k stars, 764 forks

feat/option to load extraction models once instead of every time the partition_pdf function is called #3698

Open hasansalimkanmaz opened 1 month ago

hasansalimkanmaz commented 1 month ago

Is your feature request related to a problem? Please describe.
I am using the partition_pdf function. Every time I call it, it loads the relevant models into memory, which adds a significant delay to each call.

Describe the solution you'd like
I would like a parameter on the partition_pdf function that loads the models into memory once and keeps them there for the lifetime of the application, so that later calls do not have to load them again.

hasansalimkanmaz commented 1 month ago

If you can guide me on how to do it, I may allocate some time on my side to implement this feature depending on how complex it is.