AI-Northstar-Tech / vector-io

Comprehensive Vector Data Tooling. The universal interface for all vector databases, datasets, and RAG platforms. Easily export, import, back up, re-embed (using any model), or access your vector data from any vector database or repository.
https://vector-io.com
Apache License 2.0
222 stars 27 forks

add : mongodb integration #110

Open vipul-maheshwari opened 1 month ago

vipul-maheshwari commented 1 month ago

MongoDB Integration for VDF Import/Export

✨ Generated with love by Kaizen ❤️

Original Description

# Add MongoDB Export Support

- **Purpose:** Add support for exporting data from MongoDB to the VDF format.
- **Key Changes:**
  - Implemented the `ExportMongoDB` class that inherits from the `ExportVDB` base class.
  - Added functionality to connect to a MongoDB Atlas database, select a collection, and export the data to Parquet files.
  - Implemented a method to flatten nested MongoDB documents and handle various BSON data types.
  - Added support for detecting the vector dimension automatically if not provided.
  - Integrated the exported data into the VDF metadata.
- **Impact:** This change allows users to export data from MongoDB databases to the VDF format, enabling further processing and analysis of the data.

> ✨ Generated with love by [Kaizen](https://cloudcode.ai) ❤️

Original Description

# Add MongoDB Export Functionality

- **Purpose:** Introduce a new feature to export data from MongoDB to a specified format.
- **Key Changes:**
  - Added `.gitignore` entries for environment and testing files.
  - Updated `requirements.txt` to include `pymongo`.
  - Created `mongodb_export.py` for handling MongoDB data export.
  - Implemented argument parsing for MongoDB connection and export parameters.
  - Enhanced error handling for MongoDB connection and collection selection.
- **Impact:** This addition allows users to seamlessly export data from MongoDB, improving data integration capabilities.

Original Description

# Add MongoDB Export Support

- **Purpose:** Add support for exporting data from MongoDB databases to the VDF format.
- **Key Changes:**
  - Implemented the `ExportMongoDB` class that inherits from the `ExportVDB` base class.
  - Added functionality to connect to a MongoDB Atlas instance, retrieve data from a specified collection, and export it to Parquet files.
  - Implemented logic to handle various BSON data types and flatten nested documents.
  - Added support for detecting the vector dimension automatically if not provided.
  - Integrated the new MongoDB export functionality into the command-line interface.
- **Impact:** This change will allow users to export data from MongoDB databases to the VDF format, enabling them to leverage the VDF ecosystem for vector search, embeddings, and other machine learning tasks.

Original Description

# Add MongoDB Export Functionality

- **Purpose:** Add support for exporting data from MongoDB databases to the VDF format.
- **Key Changes:**
  - Introduced a new `ExportMongoDB` class that inherits from the base `ExportVDB` class.
  - Implemented methods to connect to a MongoDB database, fetch data from a specified collection, and export the data to Parquet files.
  - Added support for handling various BSON data types (ObjectId, Binary, Regex, Timestamp, Decimal128, Code) during the flattening process.
  - Integrated the new MongoDB export functionality into the command-line interface.
- **Impact:** Users can now export data from MongoDB databases to the VDF format, enabling seamless integration with the VDF ecosystem and downstream applications.

Original Description

# Add MongoDB Export Functionality

- **Purpose:** Introduces a new feature to export data from MongoDB into a specified format.
- **Key Changes:**
  - Added `.cfg` and environment-related entries to `.gitignore`.
  - Updated `requirements.txt` to include `pymongo`.
  - Created `mongodb_export.py` for handling MongoDB data exports.
  - Implemented argument parsing for MongoDB connection and export parameters.
  - Enhanced utility functions to support MongoDB-specific data handling.
- **Impact:** This addition allows users to seamlessly export data from MongoDB, enhancing the tool's versatility.

Original Description

# Add MongoDB Export Functionality

- **Purpose:** Introduce functionality to export data from MongoDB to a specified format.
- **Key Changes:**
  - Added `.cfg` and environment-related entries to `.gitignore`.
  - Updated `requirements.txt` to include `pymongo` for MongoDB support.
  - Implemented `ExportMongoDB` class for handling MongoDB data exports.
  - Added command-line argument parsing for MongoDB connection and export parameters.
  - Integrated data flattening and exporting to Parquet format.
- **Impact:** This enhancement allows users to seamlessly export data from MongoDB, improving data integration capabilities.

Original Description

# Add MongoDB Export Functionality

- **Purpose:** Adds the ability to export data from a MongoDB database to the VDF format.
- **Key Changes:**
  - Added a new `ExportMongoDB` class that inherits from the `ExportVDB` base class.
  - Implemented methods to connect to a MongoDB database, fetch data from a specified collection, and export the data to Parquet files.
  - Included support for handling various BSON data types (ObjectId, Binary, Regex, Timestamp, Decimal128, Code) during the flattening process.
  - Added a new `mongodb` subparser to the command-line interface to allow users to specify MongoDB connection details and export options.
- **Impact:** This change will enable users to export data from MongoDB databases to the VDF format, allowing for easier integration with the VDF ecosystem and downstream applications.

Original Description

- [ ] export script
- [ ] import script
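The flattening step mentioned in the description revisions above can be illustrated with a minimal sketch. This is not the PR's actual code: `flatten_document` is a hypothetical name, the dotted-key convention is an assumption, and the sketch uses only the standard library so it runs without `pymongo`. BSON-specific values such as `ObjectId` or `Decimal128` would fall through to the final `str()` branch here; `Decimal` stands in for such a type in the sample.

```python
from datetime import datetime
from decimal import Decimal


def flatten_document(doc, parent_key="", sep="."):
    """Flatten a nested dict into one level, joining keys with `sep`.

    Hypothetical helper for illustration; the PR's own flattening
    logic may differ in naming and in how it converts each type.
    """
    flat = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-documents (an empty sub-document adds no keys).
            flat.update(flatten_document(value, full_key, sep))
        elif value is None or isinstance(value, (str, int, float, bool, list)):
            # Plain values and lists (e.g. embedding vectors) are kept as-is.
            flat[full_key] = value
        elif isinstance(value, datetime):
            flat[full_key] = value.isoformat()
        elif isinstance(value, (bytes, bytearray)):
            # Binary payloads become hex strings so they fit a text column.
            flat[full_key] = bytes(value).hex()
        else:
            # Fallback: stringify anything else (assumed acceptable here).
            flat[full_key] = str(value)
    return flat


sample = {
    "_id": "doc-1",
    "meta": {"title": "a", "created": datetime(2024, 1, 2, 3, 4, 5)},
    "price": Decimal("1.5"),
    "blob": b"\x00\xff",
    "embedding": [0.1, 0.2],
}
flat = flatten_document(sample)
```

Keeping lists untouched is a deliberate choice in this sketch, so that a vector field survives flattening as a single column; lists of sub-documents are not expanded.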
----

> [!IMPORTANT]
> Adds MongoDB export functionality with BSON handling and Parquet export in `mongodb_export.py`.
>
> - **MongoDB Export Integration**:
>   - Adds `ExportMongoDB` class in `mongodb_export.py` for exporting data from MongoDB.
>   - Implements `make_parser()` and `export_vdb()` methods for argument parsing and export logic.
>   - Handles BSON type conversions and data flattening in `flatten_dict()`.
>   - Exports data to Parquet format with vector dimension detection in `get_data()`.
> - **Configuration**:
>   - Adds `MONGODB` to `DBNames` in `names.py`.
>   - Updates `db_metric_to_standard_metric` in `util.py` to include MongoDB distance metrics.
> - **Dependencies**:
>   - Adds `pymongo` to `requirements.txt`.
>
> This description was created by [Ellipsis](https://www.ellipsis.dev?ref=AI-Northstar-Tech%2Fvector-io&utm_source=github&utm_medium=referral) for 6788f900fc2e64c21ba17d05d2844fab454aa712. It will automatically update as commits are pushed.

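One way automatic vector dimension detection could work is sketched below. This is an assumption about the approach, not the logic in the PR's `get_data()`: `detect_vector_field` and `_is_vector` are hypothetical names, and the heuristic simply takes the first field whose value is a non-empty list of numbers.

```python
from numbers import Real


def _is_vector(value):
    """True for a non-empty list made up only of real numbers (bools excluded)."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(x, Real) and not isinstance(x, bool) for x in value)
    )


def detect_vector_field(documents, candidate_field=None):
    """Return (field_name, dimension) for the first vector-like field found.

    If `candidate_field` is given, only that field is inspected.
    Returns (None, None) when no document contains a vector.
    """
    for doc in documents:
        names = [candidate_field] if candidate_field else list(doc.keys())
        for name in names:
            value = doc.get(name)
            if _is_vector(value):
                return name, len(value)
    return None, None


docs = [
    {"_id": "a", "tags": ["x", "y"]},                 # no vector in this document
    {"_id": "b", "tags": [], "embedding": [0.1, 0.2, 0.3]},
]
field, dimension = detect_vector_field(docs)
```

A heuristic like this can misfire on any numeric list that is not an embedding (for example a list of ratings), so passing the field name explicitly is the safer path when it is known.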
dhruv-anand-aintech commented 1 month ago

Thanks for contributing to Vector-io!

Please also add a short README or how-to for exporting data from MongoDB, as it is a bit harder than a normal vector DB (connection string vs. looking up fields like the admin password from the portal). Thanks.
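As a starting point for such a how-to, here is a minimal sketch of assembling a MongoDB connection string from its parts. The function name `build_mongodb_uri`, the host `cluster0.example.mongodb.net`, and the credentials are all placeholders, not values from this PR. The one firm requirement shown is that the username and password are percent-encoded before being placed in the URI.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(username, password, host, database=""):
    """Build a `mongodb+srv://` URI with percent-encoded credentials.

    Hypothetical helper; characters such as '@', ':' and '/' in the
    password would otherwise break parsing of the URI.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/{database}"


# Placeholder values for illustration only.
uri = build_mongodb_uri(
    "app_user", "p@ss:w/rd", "cluster0.example.mongodb.net", "mydb"
)
```

The resulting string can then be handed to `pymongo.MongoClient(uri)`; how the export command itself accepts the connection string is defined by the PR's argument parser and is not shown here.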

vipul-maheshwari commented 1 month ago

Got your comments! Will do the needful!

vipul-maheshwari commented 1 month ago

@dhruv-anand-aintech

kaizen-bot[bot] commented 1 month ago

🔍 Code Review Summary

All Clear: This commit looks good! 👍

Overview

Useful Commands

- **Feedback:** Share feedback on Kaizen's performance with `!feedback [your message]`
- **Ask PR:** Reply with `!ask-pr [your question]`
- **Review:** Reply with `!review`
- **Update Tests:** Reply with `!unittest` to create a PR with test changes