Open davidgilbertson opened 2 months ago
@davidgilbertson Sorry you are having a tough time with Unstructured.
If you are using the Serverless API, you shouldn't need to run `pip install "unstructured-ingest[pdf]"`, since you won't actually be processing those files locally.
Please try this Python code here and point it to your API key, API URL, local documents (.pdf) folder, and output directory. (You don't necessarily have to use the environment variables; you can just fill in the values to keep it simpler.)
Feel free to tag me here if you still have an issue.
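For anyone following along, here is a minimal, standard-library-only sketch of the "environment variables or just fill in the values" idea from the reply above. It does not call Unstructured at all; it only gathers the four settings mentioned (API key, API URL, input folder, output directory). The names `ApiSettings`, `load_settings`, and `list_pdfs` are hypothetical, and the environment variable names are assumptions that may not match what the linked sample code reads.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ApiSettings:
    """The four values the reply asks for (hypothetical container)."""
    api_key: str
    api_url: str
    input_dir: Path
    output_dir: Path


def load_settings(api_key=None, api_url=None, input_dir=None, output_dir=None, env=None):
    """Use an explicit argument when given, otherwise fall back to the environment."""
    env = os.environ if env is None else env
    # Assumption: these variable names are illustrative, not confirmed by the thread.
    values = {
        "api_key": api_key or env.get("UNSTRUCTURED_API_KEY"),
        "api_url": api_url or env.get("UNSTRUCTURED_API_URL"),
        "input_dir": input_dir or env.get("LOCAL_FILE_INPUT_DIR"),
        "output_dir": output_dir or env.get("LOCAL_FILE_OUTPUT_DIR"),
    }
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return ApiSettings(
        api_key=values["api_key"],
        api_url=values["api_url"],
        input_dir=Path(values["input_dir"]),
        output_dir=Path(values["output_dir"]),
    )


def list_pdfs(input_dir):
    """Return the .pdf files directly inside input_dir, sorted by path."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `load_settings(api_key="...", api_url="...", input_dir="docs", output_dir="out")` skips the environment entirely, which is the "just fill in the values" route; failing early with a list of missing settings tends to be easier to debug than an authentication error from the API.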
I'm trying the Serverless API because I couldn't get `unstructured[pdf]` to install (package clashes caused by it installing an old version of PyTorch).
The docs say that to use the API I should use `unstructured-ingest`, and this page says that if I want to convert a PDF I should do `pip install "unstructured-ingest[pdf]"`. Half-expecting this to download the wrong PyTorch again (which takes ages, then ages more to reinstall the new one), I thought I'd check the requirements: https://github.com/Unstructured-IO/unstructured-ingest/blob/main/requirements/local_partition/pdf.in
And it looks like that's just going to install `unstructured[pdf]`, the thing I'm trying to avoid!
So my question: why does this client library that just calls APIs need to install the whole gigantic `unstructured` package?
I tried the sample code without running this install (which breaks my whole environment) and it seems to work.
Some friendly new-user feedback: this is all very difficult! I have a funny feeling that the results are going to be impressive, but my gosh the developer experience is terrible so far.
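As an aside on the "check the requirements before installing" step described above: if the base package is already installed, its declared extras can be inspected locally without installing anything further. The sketch below uses only Python's standard `importlib.metadata`; the helper names `requirements_for_extra` and `installed_extra_requirements` are hypothetical, and the simple string matching on the `extra == "..."` marker is an assumption that covers common cases rather than a full marker parser.

```python
from importlib import metadata


def requirements_for_extra(requirement_strings, extra):
    """Return the requirements that are only pulled in when `extra` is requested."""
    selected = []
    for line in requirement_strings or []:
        name, _, marker = line.partition(";")
        # Normalise quotes and spacing so extra == "pdf" and extra=='pdf' both match.
        normalized = marker.replace("'", '"').replace(" ", "")
        if f'extra=="{extra}"' in normalized:
            selected.append(name.strip())
    return selected


def installed_extra_requirements(dist_name, extra):
    """Look up an installed distribution; return None if it is not installed."""
    try:
        declared = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return requirements_for_extra(declared, extra)
```

For example, `installed_extra_requirements("unstructured-ingest", "pdf")` would list whatever that installed version declares for its `pdf` extra, which is one way to confirm a suspicion like the one above before running the install.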