Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.5k stars 585 forks

partition_pdf is loading the model at every call #3058

Closed SkanderHellal closed 1 month ago

SkanderHellal commented 1 month ago

I am using partition_pdf to extract content from a PDF file. However, the object detection and table structure recognition models are loaded on every call. In addition, we cannot change the models, for example the one used for table extraction.

MthwRobinson commented 1 month ago

Thanks for the issue, @SkanderHellal. We'll discuss and follow up.

amadeusz-ds commented 1 month ago

Hi @SkanderHellal, how did you notice that the model is loaded every time? What are the steps to reproduce?

amadeusz-ds commented 1 month ago

@SkanderHellal I was not able to reproduce this issue: both the detection and table transformer models were initialized only once. Please share the steps to reproduce your issue.