Closed by davidgilbertson 1 month ago
Hi @davidgilbertson, apologies for the confusion! This project is still active. Our intent with that note was to nudge people towards the ingest tool for batch-processing use cases. I'll chat with the team about clarifying the docs. The TLDR is:
- unstructured-client (or the Python SDK) is a small HTTP client for interacting with the Unstructured API. It's an easy way to get started sending a few documents to our hosted service from your existing Python or TypeScript projects. It has a few niceties, including HTTP retries and splitting of large PDFs to reduce latency.
- Ingest: You can use Ingest to connect a source and a destination (local folder, S3 bucket, etc.) and efficiently process all the specified documents. Ingest will cache the results, and when you run it again it will process the new docs in your source. This is a more feature-rich way to interact with Unstructured.

If you have a single PDF that you want to try out, you can copy the example snippet in the readme here. You can just as easily run an ingest command to process the PDF from a source folder, and this may be ideal if you want to test out more docs in the same folder.
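To make the "cache the results, then process new docs on the next run" idea concrete, here is a minimal sketch of that pattern using only the Python standard library. It is not how Ingest is implemented and does not use any Unstructured API: the function name `process_new_docs`, the JSON cache file, and the `process` callback are all hypothetical names chosen for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def process_new_docs(
    source_dir: Path,
    cache_file: Path,
    process: Callable[[Path], None],
) -> list[str]:
    """Run `process` on files that are new or changed since the last run.

    Hypothetical helper for illustration only. Keep `cache_file` outside
    `source_dir`, otherwise the cache itself would be treated as a document.
    Returns the names of the files processed in this run.
    """
    # Load the record of earlier runs: {file name: SHA-256 of its content}.
    seen = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    processed = []
    for path in sorted(source_dir.iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if seen.get(path.name) == digest:
            continue  # unchanged since the last run, so skip it
        process(path)  # stand-in for "send this document off for partitioning"
        seen[path.name] = digest
        processed.append(path.name)
    # Persist the updated record so the next run can skip these files.
    cache_file.write_text(json.dumps(seen, indent=2))
    return processed
```

Calling this twice on the same folder processes every file the first time and nothing the second time; dropping a new PDF into the folder causes only that file to be processed on the next call.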
Let me know if this helps! We're also happy to chat through your use case on our community Slack.
Is this package deprecated?
I'm new to unstructured, and when I go to "Python SDK" in the docs (which I assume is synonymous with "Python client" - there's so many names!) it tells me I shouldn't use this package, and should use the ingest Python library instead.
So is this client package deprecated now? If so, should that be stated clearly somewhere, such as in the readme?
And if not, why are there two packages that do the same thing, with no explanation (that I've found, yet) of when I would pick one or the other?
...and then I see this video from 6 days ago using the client package that the docs say to not use.
I'm having a hard time working out what I need to actually install to just get started converting a PDF.