Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.37k stars 572 forks

Couchbase vector store support as destination and source connector #3292

Open lokesh-couchbase opened 3 days ago

lokesh-couchbase commented 3 days ago

Couchbase is a key-value based NoSQL database.

This PR adds support for Couchbase as a source and destination connector in unstructured.io.

potter-potter commented 3 days ago

Hey @lokesh-couchbase this looks super cool!

For the tests, could you build them to run against Docker containers (for both source and destination) instead of using the hosted service?

This will make sure the Couchbase connectors are ready to be used with both hosted and open source Couchbase.

It will also help us in that we don't have to subscribe to Couchbase just to maintain a connector (the source/destination tests run every time CI/CD runs).

You can see some good examples in the Elasticsearch, Chroma, or Kafka tests.

https://github.com/Unstructured-IO/unstructured/blob/3f581e6b7d4ce4b45e6fdb58520047441c21d96c/test_unstructured_ingest/dest/elasticsearch.sh#L36

https://github.com/Unstructured-IO/unstructured/blob/main/scripts/elasticsearch-test-helpers/destination_connector/create-elasticsearch-instance.sh
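For illustration only, here is a minimal Python sketch of the container-based setup described above. It is not taken from the linked scripts. The helper names (`docker_run_command`, `wait_for_port`), the container name, the `couchbase` image name, and the ports 8091 and 11210 are assumptions to adjust for the real tests. The sketch only builds the `docker run` command and polls a TCP port until the service accepts connections, so a test can start after the container is reachable.

```python
import socket
import time


def docker_run_command(name="couchbase-test", image="couchbase", ports=(8091, 11210)):
    """Build (but do not execute) a `docker run` command for a throwaway container.

    The image name and ports are assumptions for this sketch, not values
    taken from the PR or the linked helper scripts.
    """
    cmd = ["docker", "run", "-d", "--rm", "--name", name]
    for port in ports:
        # Publish each container port on the same host port.
        cmd += ["-p", f"{port}:{port}"]
    cmd.append(image)
    return cmd


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection succeeds, False if `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test script could pass the list from `docker_run_command()` to `subprocess.run`, then call `wait_for_port("localhost", 8091)` before running the ingest test. An open port only shows the service is listening. Cluster, bucket, and index setup would still need their own steps.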