AI-Northstar-Tech / vector-io

Comprehensive Vector Data Tooling. The universal interface for all vector databases, datasets and RAG platforms. Easily export, import, back up, re-embed (using any model) or access your vector data from any vector database or repository.
https://vector-io.com
Apache License 2.0
217 stars, 27 forks

Sweep: Add Support for Turbopuffer #81

Closed: dhruv-anand-aintech closed this issue 6 months ago

dhruv-anand-aintech commented 6 months ago

Documentation for the Turbopuffer SDK: https://turbopuffer.com/docs/

1. Add `turbopuffer[fast]` to `requirements.txt`.

2. Upsert code:

```python
import turbopuffer as tpuf

ns = tpuf.Namespace('namespace-name')

# If an error occurs, this call raises a tpuf.APIError if a retry was not successful.
ns.upsert(
    ids=[1, 2, 3, 4],
    vectors=[[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]],
    attributes={
        'my-string': ['one', None, 'three', 'four'],
        'my-uint': [12, None, 84, 39],
        'my-string-array': [['a', 'b'], ['b', 'd'], [], ['c']],
    },
    distance_metric='cosine_distance',
)
```
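Turbopuffer's `upsert` takes column-oriented arguments: one list of ids, one list of vectors, and one list per attribute. Exported vector data is usually row-oriented, so an importer needs a pivot step. The following is a minimal sketch, assuming each record is a dict with `id`, `vector`, and arbitrary attribute keys; `rows_to_upsert_kwargs` is a hypothetical helper, not part of vector-io or the turbopuffer client. Missing attributes become `None`, matching the `None` entries in the upsert example above.

```python
from typing import Any, Dict, List


def rows_to_upsert_kwargs(
    rows: List[Dict[str, Any]],
    id_key: str = "id",
    vector_key: str = "vector",
) -> Dict[str, Any]:
    """Pivot row-oriented records into column-oriented upsert arguments.

    Hypothetical helper: the output mirrors the ids/vectors/attributes
    keyword arguments shown in the turbopuffer upsert example.
    """
    ids = [row[id_key] for row in rows]
    vectors = [row[vector_key] for row in rows]

    # Collect every attribute name seen in any row, preserving first-seen order.
    attr_names: List[str] = []
    for row in rows:
        for key in row:
            if key not in (id_key, vector_key) and key not in attr_names:
                attr_names.append(key)

    # One list per attribute; rows lacking that attribute contribute None.
    attributes = {name: [row.get(name) for row in rows] for name in attr_names}
    return {"ids": ids, "vectors": vectors, "attributes": attributes}


rows = [
    {"id": 1, "vector": [0.1, 0.1], "my-string": "one", "my-uint": 12},
    {"id": 2, "vector": [0.2, 0.2], "my-uint": None},
]
kwargs = rows_to_upsert_kwargs(rows)
print(kwargs["attributes"])
# {'my-string': ['one', None], 'my-uint': [12, None]}
```

With the client installed, the result would then be passed as `ns.upsert(**kwargs, distance_metric='cosine_distance')`.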


3. Export code:

```python
import turbopuffer as tpuf

ns = tpuf.Namespace('namespace-name')

# Cursor paging is handled automatically by the Python client.
# If an error occurs, this call raises a tpuf.APIError if a retry was not successful.
for row in ns.vectors():
    print(row)

# Example output:
# VectorRow(id=1, vector=[0.1, 0.1], attributes={'key1': 'one', 'key2': 'a'})
# VectorRow(id=2, vector=[0.2, 0.2], attributes={'key1': 'two', 'key2': 'b'})
# VectorRow(id=3, vector=[0.3, 0.3], attributes={'key1': 'three', 'key2': 'c'})
# VectorRow(id=4, vector=[0.4, 0.4], attributes={'key1': 'four', 'key2': 'd'})
```
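Since `ns.vectors()` pages through the whole namespace, an exporter would likely flatten rows and write them out in fixed-size batches rather than holding everything in memory. The following is a minimal sketch; `VectorRow` below is a local stand-in built from the fields in the printed output (not the client's own class), and `iter_record_batches` is a hypothetical helper. Attribute keys named `id` or `vector` would overwrite the core fields here, so a real exporter would need to guard against that.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


@dataclass
class VectorRow:
    """Local stand-in for the client's VectorRow (fields taken from the printed output)."""
    id: Any
    vector: List[float]
    attributes: Dict[str, Any] = field(default_factory=dict)


def iter_record_batches(
    rows: Iterable[Any], batch_size: int = 1000
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of flat dicts, at most batch_size rows per list.

    Accepts any iterable of objects with id/vector/attributes fields,
    such as the stand-in above or, presumably, ns.vectors().
    """
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # Spread attributes next to id and vector; tolerate missing attributes.
        yield [{"id": r.id, "vector": r.vector, **(r.attributes or {})} for r in chunk]


rows = [
    VectorRow(1, [0.1, 0.1], {"key1": "one", "key2": "a"}),
    VectorRow(2, [0.2, 0.2], {"key1": "two", "key2": "b"}),
    VectorRow(3, [0.3, 0.3], {"key1": "three", "key2": "c"}),
]
batches = list(iter_record_batches(rows, batch_size=2))
print(len(batches), batches[1])
# 2 [{'id': 3, 'vector': [0.3, 0.3], 'key1': 'three', 'key2': 'c'}]
```

Each batch is a plain list of dicts, which could then be handed to whatever writer the exporter uses (for example a DataFrame-to-Parquet step).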



Follow the guidelines at [AI-Northstar-Tech/vector-io#adding-a-new-vector-database](https://github.com/AI-Northstar-Tech/vector-io#adding-a-new-vector-database) to implement support for Turbopuffer in Vector-io.

Use the changes in PR #77 as a reference: https://github.com/AI-Northstar-Tech/vector-io/pull/77

## Checklist of features for completion

- [ ] Add mapping of distance metric names
- [ ] Support local and cloud instances
- [ ] Automatically create Python classes for the index being exported
- [ ] Export
    - [ ] Export all indexes by default
    - [ ] Option to specify index names to export
    - [ ] DB-specific command-line options (`make_parser`)
    - [ ] Allow terminal input for each option above (via `input()` in Python) in `export_vdb`
    - [ ] Handle multiple vectors per row
- [ ] Import
    - [ ] DB-specific command-line options (`make_parser`)
    - [ ] Handle multiple vectors per row
    - [ ] Allow terminal input for each option above (via `input()` in Python) in `import_vdb`
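For the "all indexes by default / specify index names" items, the closest Turbopuffer equivalent of an index appears to be a namespace (as in `tpuf.Namespace` in the examples above). The following is a minimal sketch of the selection logic, assuming the exporter already has a list of available namespace names; how that list is fetched from Turbopuffer is left open, and `select_namespaces` is a hypothetical helper.

```python
from typing import Iterable, List, Optional


def select_namespaces(
    available: Iterable[str], requested: Optional[str] = None
) -> List[str]:
    """Return the namespaces to export.

    requested is a comma-separated string such as a CLI flag value; when it
    is None or contains no names, every available namespace is returned.
    Unknown names raise ValueError instead of being silently skipped.
    """
    available_list = list(available)
    wanted = [n.strip() for n in (requested or "").split(",") if n.strip()]
    if not wanted:
        return available_list
    missing = [n for n in wanted if n not in available_list]
    if missing:
        raise ValueError(f"Unknown namespace(s): {', '.join(missing)}")
    return wanted


print(select_namespaces(["docs", "images"]))            # ['docs', 'images']
print(select_namespaces(["docs", "images"], "images"))  # ['images']
```

Failing loudly on unknown names avoids an export that silently skips a mistyped namespace.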

<details open>
<summary>Checklist</summary>

- [X] Modify `requirements.txt` ✓ https://github.com/AI-Northstar-Tech/vector-io/commit/27aed811cc83300be65b567a92323f86bdd37523
- [X] Modify `src/vdf_io/names.py` ✓ https://github.com/AI-Northstar-Tech/vector-io/commit/27aed811cc83300be65b567a92323f86bdd37523
- [X] Modify `src/vdf_io/util.py` ✓ https://github.com/AI-Northstar-Tech/vector-io/commit/27aed811cc83300be65b567a92323f86bdd37523
- [X] Create `src/vdf_io/turbopuffer.py` ✓ https://github.com/AI-Northstar-Tech/vector-io/commit/27aed811cc83300be65b567a92323f86bdd37523
</details>
sweep-ai[bot] commented 6 months ago

🚀 Here's the PR! #85

💎 Sweep Pro: I'm using GPT-4. You have unlimited GPT-4 tickets. (tracking ID: 37ba588dd3)

Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant, in decreasing order of relevance. If some file is missing from here, you can mention the path in the ticket description.

- https://github.com/AI-Northstar-Tech/vector-io/blob/e5d56ac0eade27e6de4f636b56604ce34e183c47/requirements.txt#L1-L33
- https://github.com/AI-Northstar-Tech/vector-io/blob/e5d56ac0eade27e6de4f636b56604ce34e183c47/src/vdf_io/names.py#L1-L12
- https://github.com/AI-Northstar-Tech/vector-io/blob/e5d56ac0eade27e6de4f636b56604ce34e183c47/src/vdf_io/util.py#L1-L502

I also found that you mentioned the following Pull Requests that may be helpful:
The following PRs were mentioned in the issue:

Pull Request #77, "lancedb support". Files changed:

- README.md
- src/vdf_io/export_vdf/lancedb_export.py
- src/vdf_io/import_vdf/lancedb_import.py
- src/vdf_io/notebooks/lance-qs.ipynb

Be sure to follow the PRs as a reference when making code changes. If the user instructs you to follow the referenced PR, limit the scope of your changes to the referenced PR.

Step 2: ⌨️ Coding

Add the Turbopuffer entry to the distance-metric mapping:

```python
DBNames.TURBOPUFFER: {
    "cosine_distance": Distance.COSINE,
    "euclidean_distance": Distance.EUCLID,
    "dot_product": Distance.DOT,
},
```
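The mapping above covers the export direction (Turbopuffer metric name to standard metric); importing into Turbopuffer needs the reverse. The following is a minimal sketch under stated assumptions: `Distance` is assumed to be the standard metric enum used in the mapping (its member names match `qdrant_client`'s `Distance`), so a local stand-in is defined to keep the example self-contained, and `to_standard` / `from_standard` are hypothetical names. The Turbopuffer metric strings are copied from the mapping as given and should be checked against the Turbopuffer documentation.

```python
from enum import Enum
from typing import Dict


class Distance(str, Enum):
    """Local stand-in for the standard metric enum used in the mapping."""
    COSINE = "Cosine"
    EUCLID = "Euclid"
    DOT = "Dot"


# Turbopuffer metric name -> standard metric (same entries as the mapping).
TPUF_TO_STANDARD: Dict[str, Distance] = {
    "cosine_distance": Distance.COSINE,
    "euclidean_distance": Distance.EUCLID,
    "dot_product": Distance.DOT,
}
# Reverse direction, used when importing into Turbopuffer.
STANDARD_TO_TPUF: Dict[Distance, str] = {v: k for k, v in TPUF_TO_STANDARD.items()}


def to_standard(tpuf_metric: str) -> Distance:
    """Map a Turbopuffer metric name to the standard metric."""
    try:
        return TPUF_TO_STANDARD[tpuf_metric]
    except KeyError:
        raise ValueError(f"Unsupported Turbopuffer metric: {tpuf_metric!r}") from None


def from_standard(metric: Distance) -> str:
    """Map a standard metric back to a Turbopuffer metric name."""
    try:
        return STANDARD_TO_TPUF[metric]
    except KeyError:
        raise ValueError(f"No Turbopuffer equivalent for {metric!r}") from None
```

Building the reverse dict from the forward one keeps both directions in sync; it is only safe because each standard metric appears once in the mapping.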

Import the necessary modules:

```python
import turbopuffer as tpuf
from vdf_io.names import DBNames
from vdf_io.util import standardize_metric, clean_documents
```

Implement the `make_parser` function to add Turbopuffer-specific command-line options for export and import.
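A minimal `argparse` sketch of what `make_parser` could register. It is not vector-io's actual signature: the sub-command name and the `--namespaces` / `--api_key` flags are hypothetical, chosen only to show a DB-specific sub-parser whose options default to `None` so that missing values can be prompted for later.

```python
import argparse


def make_parser(subparsers):
    """Register a 'turbopuffer' sub-command with its DB-specific options.

    Hypothetical option names; defaults are None so that unset options
    can be detected and filled interactively.
    """
    parser = subparsers.add_parser("turbopuffer", help="Turbopuffer options")
    parser.add_argument(
        "--namespaces",
        type=str,
        default=None,
        help="Comma-separated namespaces to export (default: all)",
    )
    parser.add_argument(
        "--api_key",
        type=str,
        default=None,
        help="Turbopuffer API key (prompted for if omitted)",
    )
    return parser


root = argparse.ArgumentParser()
subparsers = root.add_subparsers(dest="vector_database")
make_parser(subparsers)

args = root.parse_args(["turbopuffer", "--namespaces", "docs,images"])
print(args.vector_database, args.namespaces, args.api_key)
# turbopuffer docs,images None
```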

Implement the `export_vdb` function.

Implement the `import_vdb` function.

In the `export_vdb` and `import_vdb` functions, use `input()` to prompt the user interactively for any Turbopuffer-specific options that were not provided as command-line arguments.
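The prompt-if-missing step can be sketched as a small helper that walks a table of option names and prompt texts. This is a minimal sketch, not vector-io's code: `fill_missing_options` and the option names are hypothetical, and the prompt function is a parameter (defaulting to the built-in `input`) so the logic can be exercised without a terminal.

```python
import argparse
from typing import Callable, Dict


def fill_missing_options(
    args: argparse.Namespace,
    prompts: Dict[str, str],
    ask: Callable[[str], str] = input,
) -> argparse.Namespace:
    """Prompt for each option in prompts whose value on args is None or empty.

    prompts maps attribute name -> prompt text. A blank answer leaves the
    option as None, so downstream defaults (e.g. "all namespaces") still apply.
    """
    for name, prompt in prompts.items():
        if not getattr(args, name, None):
            setattr(args, name, ask(prompt).strip() or None)
    return args


args = argparse.Namespace(namespaces=None, api_key="tpuf_xxx")
answers = iter(["docs"])
fill_missing_options(
    args,
    {"namespaces": "Namespaces to export (blank for all): ", "api_key": "API key: "},
    ask=lambda _prompt: next(answers),
)
print(args.namespaces, args.api_key)
# docs tpuf_xxx
```

Only `namespaces` is prompted for here, because `api_key` was already set. For a real API key, `getpass.getpass` would avoid echoing the secret to the terminal.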


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors in `sweep/add_support_for_turbopuffer`.



This is an automated message generated by Sweep AI.