Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.44k stars, 580 forks

FEAT: Astra DB Source Connector Support #3212

Status: Open. erichare opened this pull request 2 weeks ago.

erichare commented 2 weeks ago

This pull request adds support for Astra DB as a source connector. The motivation is that some collections in Astra DB may not be vector collections. With this connector, we can pull structured data from those collections and then use the destination connector to produce a structured table with vector support. CC @potter-potter
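The round trip described above (read plain documents from a non-vector collection, then write them back out with vectors attached) can be sketched in plain Python without any Astra DB client. Everything here is a hypothetical illustration, not the connector's API: the function names, the `"field: value | ..."` flattening format, the `content` key, and `toy_embed` (a stand-in, not a real embedding model) are all assumptions. The `$vector` key mirrors the reserved field name Astra DB's Data API uses for vectors, but this thread does not show the connector's actual output layout.

```python
# Hypothetical sketch: none of these names come from the unstructured codebase.
from typing import Any, Callable, Dict, Iterable, List

Document = Dict[str, Any]


def read_source_collection(docs: Iterable[Document]) -> List[Document]:
    """Stand-in for the source connector: copy plain (non-vector) documents."""
    return [dict(d) for d in docs]


def to_text(doc: Document, fields: List[str]) -> str:
    """Flatten the chosen structured fields into one string to embed (assumed format)."""
    return " | ".join(f"{k}: {doc[k]}" for k in fields if k in doc)


def build_vector_rows(
    docs: Iterable[Document],
    fields: List[str],
    embed: Callable[[str], List[float]],
) -> List[Document]:
    """Keep the original structured columns and attach text plus a vector."""
    rows: List[Document] = []
    for doc in docs:
        text = to_text(doc, fields)
        row = dict(doc)              # original structured columns survive
        row["content"] = text        # hypothetical text column
        row["$vector"] = embed(text) # Astra DB's reserved vector field name
        rows.append(row)
    return rows


def toy_embed(text: str) -> List[float]:
    """Toy 'embedding' (NOT a real model): character count and vowel count."""
    return [float(len(text)), float(sum(c in "aeiou" for c in text.lower()))]


# Usage: one structured document from a non-vector collection.
source = [{"_id": "1", "title": "Pump A", "status": "active"}]
rows = build_vector_rows(read_source_collection(source), ["title", "status"], toy_embed)
print(rows[0]["content"])  # title: Pump A | status: active
print(rows[0]["$vector"])  # [30.0, 9.0]
```

Keeping the original columns alongside the vector is what lets the destination side stay a structured table rather than a bag of embedded text; in a real pipeline `toy_embed` would be replaced by an actual embedding step.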

potter-potter commented 2 weeks ago

@erichare I'll check this out this week. Thanks!

erichare commented 1 week ago

Hi @potter-potter, just wondering: would it be helpful to you if I merged the changes from https://github.com/Unstructured-IO/unstructured/pull/3179 into this PR? I can do so and close the other PR if so!

potter-potter commented 1 week ago

@erichare This looks like it's pretty close to being done.

erichare commented 1 week ago

@potter-potter, I pushed some updates just now based on the feedback! Thanks so much.

EDIT: Also, I wanted to check whether you want me to merge the changes from https://github.com/Unstructured-IO/unstructured/pull/3179 into this one and close 3179. Just let me know.

potter-potter commented 1 week ago

@erichare You can merge in the changes from 3179.

erichare commented 1 week ago

Thanks @potter-potter! I've done that, updated the output paths, and did a little cleanup of the imports. Let me know if this looks good or if there's anything else to address. Thanks so much for the reviews.

potter-potter commented 1 week ago

@erichare This all looks good. I'll take over and get it into PR today or tomorrow. Nice work!

erichare commented 1 week ago

Thanks @potter-potter!