Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.03k stars, 532 forks

Lightweight installation of unstructured[pdf]? #2976

Open liturrig opened 3 weeks ago

liturrig commented 3 weeks ago

Hello, is there a way to install unstructured[pdf] in a lightweight form, just to use the "fast" strategy, without all the other dependencies? Thank you in advance for your support.

scanny commented 3 weeks ago

Hi @liturrig, unstructured does not currently have a "pdf-fast-only" install option.

Can you say a bit more about your use case and why you want something like that?

mszpulak commented 3 weeks ago

Why does it install NVIDIA libraries? When I added ["pdf"], my Docker image grew from 600MB to 6GB. That's insane.

NathanAP commented 3 weeks ago

That's probably one of the biggest reasons why they created their own API. Our project's size is really big as well.