Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.44k stars 580 forks

rfctr: implement mongodb v2 destination connector #3313

Closed: vangheem closed this 23 hours ago

vangheem commented 5 days ago

This PR adds support for the V2 MongoDB destination connector.
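For readers unfamiliar with what a destination connector's upload step does, here is a minimal sketch, not the connector's actual code in this PR. It assumes the staged elements are plain dicts and writes them in batches through pymongo's `insert_many`. The names `upload_elements` and `iter_batches` and the default batch size of 100 are hypothetical, chosen for illustration only.

```python
from typing import Any, Iterable, Iterator, Protocol


class InsertManyCollection(Protocol):
    """Anything exposing a pymongo-style insert_many(documents) method."""

    def insert_many(self, documents: list[dict[str, Any]]) -> Any: ...


def iter_batches(
    items: Iterable[dict[str, Any]], size: int
) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive lists of at most `size` items."""
    batch: list[dict[str, Any]] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def upload_elements(
    collection: InsertManyCollection,
    elements: Iterable[dict[str, Any]],
    batch_size: int = 100,  # hypothetical default, not taken from the PR
) -> int:
    """Insert element dicts into `collection` batch by batch; return how many were written."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    total = 0
    for batch in iter_batches(elements, batch_size):
        collection.insert_many(batch)
        total += len(batch)
    return total
```

With a real server, the collection would come from something like `MongoClient(uri)["db_name"]["collection_name"]`. Note that pymongo's `insert_many` adds an `_id` field to each dict in place. Accepting any object with `insert_many` keeps the batching logic testable without a running MongoDB instance.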

potter-potter commented 5 days ago

For some reason, we are losing "Type" in the "upload_stage" document. It's present in the embedded doc. I believe this is important metadata, but I could be wrong.
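One way to catch a regression like the missing "Type" field is to compare each element's top-level keys before and after the upload stage. The following is a sketch under stated assumptions: both the embedded output and the upload_stage output are lists of element dicts, and elements can be matched on an `element_id` key. The helper `find_dropped_fields` is hypothetical and not part of the unstructured codebase.

```python
from typing import Any


def find_dropped_fields(
    embedded: list[dict[str, Any]],
    staged: list[dict[str, Any]],
    key: str = "element_id",  # assumed join key shared by both stages
) -> dict[str, set[str]]:
    """Map each element id to the top-level fields present after embedding but missing after staging."""
    staged_by_id = {doc[key]: doc for doc in staged}
    dropped: dict[str, set[str]] = {}
    for element in embedded:
        doc = staged_by_id.get(element[key])
        if doc is None:
            # The whole element vanished, so every one of its fields counts as dropped.
            dropped[element[key]] = set(element)
            continue
        missing = set(element) - set(doc)
        if missing:
            dropped[element[key]] = missing
    return dropped
```

An empty result means every field that survived embedding also reached the staged document. A non-empty result names exactly which fields (for example a type field) went missing and on which elements.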

Also, the second part of the ingest test runs against a vector search. I think the file it tests against is now in the workdir, so maybe try pointing it there. I don't think we write to structured-output anymore for anything that goes into a database.
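If the test should read its input from the workdir instead of structured-output, a small loader like the one below could replace the hard-coded path. This is a sketch only. The layout `<work_dir>/upload_stage/*.json` is a hypothetical assumption, since the thread does not say where the file lands, and `load_stage_output` is not an existing unstructured function.

```python
import json
from pathlib import Path
from typing import Any


def load_stage_output(work_dir: Path, stage: str = "upload_stage") -> list[dict[str, Any]]:
    """Load every JSON file under work_dir/<stage> (hypothetical layout) as element dicts."""
    stage_dir = work_dir / stage
    if not stage_dir.is_dir():
        raise FileNotFoundError(f"no {stage!r} directory under {work_dir}")
    elements: list[dict[str, Any]] = []
    # Sort for a deterministic order across runs and platforms.
    for path in sorted(stage_dir.rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # A stage file may hold either a list of elements or a single document.
        if isinstance(data, list):
            elements.extend(data)
        else:
            elements.append(data)
    return elements
```

The vector search check could then take its query embedding from one of the returned elements rather than from a structured-output file. Failing loudly when the directory is missing makes a wrong path obvious in CI instead of silently testing nothing.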

rbiseck3 commented 5 days ago

Once you get the CI tests passing, this migration looks pretty sound.