Comprehensive Vector Data Tooling. The universal interface for all vector databases, datasets, and RAG platforms. Easily export, import, back up, re-embed (using any model), or access your vector data from any vector database or repository.
Original Description
# Add MongoDB Export Functionality
- **Purpose:**
  Introduce functionality to export data from MongoDB to a specified format.
- **Key Changes:**
- Added `ExportMongoDB` class for exporting data from MongoDB collections.
- Implemented command-line argument parsing for MongoDB connection details.
- Included methods for data flattening and exporting to Parquet format (a rough sketch of this flow follows the description below).
- Updated `.gitignore` to exclude testing and environment files.
- Added MongoDB entry to `DBNames` for consistency in naming.
- **Impact:**
  This enhancement allows users to seamlessly export data from MongoDB, improving data integration capabilities.
> ✨ Generated with love by [Kaizen](https://cloudcode.ai) ❤️
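As a rough illustration of the flow described above, the sketch below wires command-line argument parsing to a Parquet export using pymongo and pandas. The flag names, helper names, and the single-pass `find()` read are assumptions for illustration, not the actual code in `mongodb_export.py`; batching and streaming are discussed in the review feedback further down.

```python
import argparse

import pandas as pd
from pymongo import MongoClient


def parse_args():
    # Hypothetical CLI flags; the real parser may use different names.
    parser = argparse.ArgumentParser(description="Export a MongoDB collection to Parquet")
    parser.add_argument("--connection-string", required=True, help="MongoDB URI")
    parser.add_argument("--database", required=True)
    parser.add_argument("--collection", required=True)
    parser.add_argument("--output", default="export.parquet")
    return parser.parse_args()


def export_collection(args):
    client = MongoClient(args.connection_string)
    collection = client[args.database][args.collection]
    # Read the documents and write them to Parquet in one pass. BSON-type
    # conversion (flatten_dict) and batching are omitted here for brevity;
    # both are covered further down.
    rows = [dict(doc) for doc in collection.find()]
    pd.DataFrame(rows).to_parquet(args.output)


if __name__ == "__main__":
    export_collection(parse_args())
```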
Original Description
- [ ] Export script
- [ ] Import script
----
> [!IMPORTANT]
> Adds MongoDB export functionality with BSON handling and vector dimension detection, and updates configuration for MongoDB support.
>
> - **Export Functionality**:
> - Adds `ExportMongoDB` class in `mongodb_export.py` for exporting data from MongoDB.
> - Handles BSON types like `ObjectId`, `Binary`, `Regex`, `Timestamp`, `Decimal128`, and `Code` in `flatten_dict()`.
> - Detects vector dimensions if not provided, and exports data in batches to Parquet files.
> - **Configuration**:
> - Adds `MONGODB` to `DBNames` in `names.py`.
> - Updates `db_metric_to_standard_metric` in `util.py` to include MongoDB with `cosine` and `euclidean` distances.
> - **Import Functionality**:
> - Placeholder for MongoDB import in `mongodb_import.py`.
>
> This description was created by [Ellipsis](https://www.ellipsis.dev?ref=AI-Northstar-Tech%2Fvector-io&utm_source=github&utm_medium=referral) for f343642aad87faf412befd105451df8ad90dc997. It will automatically update as commits are pushed.
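A minimal sketch of the BSON handling and vector-dimension detection summarized above, assuming helper names like `flatten_dict` and `detect_vector_dim`; the names, signatures, and exact conversions are illustrative, and the repository's implementation may differ.

```python
from bson import Binary, Code, Decimal128, ObjectId, Regex, Timestamp


def flatten_dict(doc, parent_key="", sep="."):
    """Flatten nested documents and convert BSON types to plain Python values."""
    items = {}
    for key, value in doc.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten_dict(value, new_key, sep))  # recurse into sub-documents
        elif isinstance(value, ObjectId):
            items[new_key] = str(value)
        elif isinstance(value, Binary):
            items[new_key] = bytes(value)
        elif isinstance(value, Regex):
            items[new_key] = value.pattern
        elif isinstance(value, Timestamp):
            items[new_key] = value.as_datetime()
        elif isinstance(value, Decimal128):
            items[new_key] = value.to_decimal()
        elif isinstance(value, Code):
            items[new_key] = str(value)
        else:
            items[new_key] = value  # plain scalars and lists pass through
    return items


def detect_vector_dim(collection, vector_field):
    """Infer the embedding dimension from one document when it is not supplied."""
    sample = collection.find_one({vector_field: {"$exists": True}})
    return len(sample[vector_field]) if sample else None
```

Nested documents become dot-separated keys, while list-valued fields (the embedding vectors) pass through unchanged so they can be written as array columns in Parquet.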
❗ Attention Required: This push has potential issues. 🚨
Overview
Total Feedbacks: 1 (Critical: 1, Refinements: 0)
Files Affected: 1
Code Quality: [█████████████████░░░] 85% (Good)
🚨 Critical Issues
Performance (1 issue)
_1. Inefficient handling of large datasets in the `get_data` method._
------
📁 **File:** [src/vdf_io/export_vdf/mongodb_export.py](src/vdf_io/export_vdf/mongodb_export.py#L218)
🔍 **Reasoning:**
The current implementation loads all documents into memory at once using `list(cursor)`, which can lead to high memory usage for large collections.
💡 **Solution:**
Process documents in a streaming manner to reduce memory footprint.
**Current Code:**
```python
batch_data = list(cursor)
```
**Suggested Code:**
```python
for document in cursor:
    flat_doc = self.flatten_dict(document)
    flattened_data.append(flat_doc)
```
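Building on that suggestion, one way to combine lazy cursor iteration with the batched Parquet writes mentioned in the summary is sketched below; `export_in_batches`, `flatten`, and `batch_size` are illustrative names, not the repository's actual API.

```python
import pyarrow as pa
import pyarrow.parquet as pq


def export_in_batches(cursor, flatten, output_path, batch_size=10_000):
    """Stream documents from a pymongo cursor and write them to Parquet in batches."""
    writer = None
    batch = []

    def flush(rows):
        nonlocal writer
        table = pa.Table.from_pylist(rows)
        if writer is None:
            # Infer the schema from the first batch of flattened documents.
            writer = pq.ParquetWriter(output_path, table.schema)
        writer.write_table(table)

    for document in cursor:  # iterate lazily instead of list(cursor)
        batch.append(flatten(document))
        if len(batch) >= batch_size:
            flush(batch)
            batch = []
    if batch:
        flush(batch)  # write the final partial batch
    if writer is not None:
        writer.close()
```

It would be invoked with something like `export_in_batches(collection.find(), self.flatten_dict, "out.parquet")`, so at most one batch of flattened documents is held in memory at a time.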
Useful Commands
- **Feedback:** Share feedback on Kaizen's performance with `!feedback [your message]`
- **Ask PR:** Reply with `!ask-pr [your question]`
- **Review:** Reply with `!review`
- **Update Tests:** Reply with `!unittest` to create a PR with test changes