Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
8.91k stars, 732 forks

Minio + Unstructured + Weaviate #3726

Open naelsen opened 1 week ago

naelsen commented 1 week ago

Hey everyone,

I have two questions. First, is it possible to connect my self-hosted Unstructured API directly to S3 through a connector? The goal would be to call the API with an S3 URL instead of a local file, so that the API can read the data from my S3 provider itself.
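If the API cannot fetch from S3 on its own, one client-side workaround is to download the object from MinIO with `boto3` and forward the bytes to the API as a regular file upload. This is a minimal sketch under assumptions: the MinIO address (`localhost:9000`), the unstructured-api address and endpoint path (`localhost:8000/general/v0/general`), and the helper names `parse_s3_url` and `partition_s3_object` are hypothetical and should be adapted to your containers.

```python
"""Sketch: fetch an object from MinIO/S3 and forward it to a self-hosted unstructured-api."""
from urllib.parse import urlparse

# Assumed addresses; adjust to your docker setup (hypothetical values).
MINIO_ENDPOINT = "http://localhost:9000"
UNSTRUCTURED_URL = "http://localhost:8000/general/v0/general"


def parse_s3_url(url: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an s3://bucket/key URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def partition_s3_object(url: str, access_key: str, secret_key: str) -> list[dict]:
    """Download one object from MinIO and send it to unstructured-api for partitioning."""
    # Third-party packages are imported lazily so parse_s3_url works without them.
    import boto3
    import requests

    bucket, key = parse_s3_url(url)
    s3 = boto3.client(
        "s3",
        endpoint_url=MINIO_ENDPOINT,  # point boto3 at MinIO instead of AWS
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # The document is sent as a multipart upload in the "files" form field.
    resp = requests.post(
        UNSTRUCTURED_URL,
        files={"files": (key.rsplit("/", 1)[-1], body)},
        data={"strategy": "auto"},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()  # list of element dicts (type, text, metadata, ...)
```

Usage would look like `partition_s3_object("s3://my-bucket/reports/q1.pdf", "minio-user", "minio-password")`, with placeholder credentials. The download happens outside the API, so the API container itself never needs S3 credentials.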

Second, is it possible to set up a streaming Unstructured ETL? For example, when I add new data to S3, Unstructured would process it immediately and ingest the results into another database (Weaviate in my case).
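One way to approximate streaming is event-driven: MinIO can publish bucket notifications to a webhook target (configured on the MinIO side, e.g. with `mc event add`), and a small receiver reacts to each new object. This is a sketch under assumptions: it expects S3-style event JSON with a `Records` list, treats any `ObjectCreated` event name as a new upload, and assumes the `weaviate-client` v4 package with a local Weaviate and a hypothetical `Document` collection that has `text`, `category` and `source` properties. The `process(bucket, key)` callback is yours to supply, e.g. partition the object and then call `insert_into_weaviate`.

```python
"""Sketch: react to MinIO bucket notifications and push partitioned text to Weaviate."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote_plus


def new_objects_from_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every object-created record in an S3-style event."""
    pairs = []
    for record in event.get("Records", []):
        # MinIO uses "s3:ObjectCreated:Put"; AWS uses "ObjectCreated:Put".
        if "ObjectCreated:" not in record.get("eventName", ""):
            continue  # ignore deletions and other event types
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        pairs.append((bucket, key))
    return pairs


def to_weaviate_properties(elements: list[dict], source: str) -> list[dict]:
    """Map unstructured element dicts to flat property dicts, skipping empty text."""
    return [
        {"text": el.get("text", ""), "category": el.get("type", ""), "source": source}
        for el in elements
        if el.get("text")
    ]


def insert_into_weaviate(rows: list[dict], collection_name: str = "Document") -> None:
    """Insert rows into an existing collection (assumes weaviate-client v4, local Weaviate)."""
    import weaviate  # imported lazily so the rest of the module works without it

    client = weaviate.connect_to_local()
    try:
        collection = client.collections.get(collection_name)
        for row in rows:
            collection.data.insert(properties=row)
    finally:
        client.close()


def make_handler(process):
    """Build a request handler that calls process(bucket, key) for each new object."""

    class MinioWebhook(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            event = json.loads(self.rfile.read(length) or b"{}")
            for bucket, key in new_objects_from_event(event):
                process(bucket, key)
            self.send_response(200)
            self.end_headers()

    return MinioWebhook


def serve(process, port: int = 5000) -> None:
    """Run the webhook receiver until interrupted (port 5000 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), make_handler(process)).serve_forever()
```

Here `process` runs inside the HTTP request, which keeps the sketch short but can delay MinIO's webhook call on large files; a more robust design would enqueue `(bucket, key)` and let a separate worker do the partitioning and the Weaviate writes.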

My setup runs three containers: Weaviate, unstructured-api, and MinIO (which is S3-compatible).

Thank you for your help !