AI-Northstar-Tech / vector-io

Use the universal VDF format for vector datasets to easily export and import data from all vector databases
Apache License 2.0
158 stars, 22 forks

Create LanceDB index after table is created in import #80

Status: Open. Opened by dhruv-anand-aintech 2 months ago

dhruv-anand-aintech commented 2 months ago
Checklist
- [X] Modify `src/vdf_io/import_vdf/lancedb_import.py` ✓ https://github.com/AI-Northstar-Tech/vector-io/commit/f168003cd3994a1082afd1126b665682b0d852f8
sweep-ai[bot] commented 2 months ago

🚀 Here's the PR! #87

💎 Sweep Pro: I'm using GPT-4. You have unlimited GPT-4 tickets. (tracking ID: a4abad1443)





Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Code snippets I think are relevant, in decreasing order of relevance. If a file is missing from this list, you can mention its path in the ticket description.
- https://github.com/AI-Northstar-Tech/vector-io/blob/9cec7fece241357cabdb153511b13c9c9236fb0a/src/vdf_io/import_vdf/lancedb_import.py#L1-L163
- https://github.com/AI-Northstar-Tech/vector-io/blob/9cec7fece241357cabdb153511b13c9c9236fb0a/src/vdf_io/util.py#L1-L503

Step 2: ⌨️ Coding

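# Assumes lancedb_import.py already has `import pyarrow.parquet as pq` and
# `from tqdm import tqdm`, and that ID_COLUMN, parquet_files, table, and
# new_index_name are defined earlier in the import routine.

# Get the ID column from the parquet file schema
parquet_schema = pq.read_schema(parquet_files[0])
id_column = "id"  # Default when ID_COLUMN is not in the schema
if ID_COLUMN in parquet_schema.names:
    id_column = ID_COLUMN

# Create a scalar index on the ID column. The lancedb package has no top-level
# create_index function; indexes are created through methods on the table.
table.create_scalar_index(id_column)
tqdm.write(f"Created index on {id_column} for table {new_index_name}")

This code reads the schema of the first parquet file to determine the name of the ID column (defaulting to "id" if ID_COLUMN is not found). It then calls the table's create_scalar_index method with the ID column name to create an index on that column.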
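
The column lookup described above can be separated from the indexing call so that it is easy to check on its own. The following is a minimal sketch, not the code from the PR: the helper names `resolve_id_column` and `index_id_column` are hypothetical, and it assumes that a scalar index on the ID column (via LanceDB's `Table.create_scalar_index`) is what the issue asks for, rather than a vector index. Unlike the snippet above, it returns `None` and skips indexing when neither the preferred column nor the `"id"` fallback exists in the schema.

```python
from typing import Iterable, Optional


def resolve_id_column(
    field_names: Iterable[str], preferred: str, default: str = "id"
) -> Optional[str]:
    """Pick the ID column: `preferred` if present, else `default`, else None."""
    names = set(field_names)
    if preferred in names:
        return preferred
    if default in names:
        return default
    return None


def index_id_column(table, parquet_path: str, preferred: str) -> Optional[str]:
    """Create a scalar index on the ID column of an already-created table.

    `table` is assumed to be a LanceDB table object; `parquet_path` is one of
    the parquet files the table was imported from.
    """
    # Imported lazily so resolve_id_column works without pyarrow installed.
    import pyarrow.parquet as pq

    schema = pq.read_schema(parquet_path)
    id_column = resolve_id_column(schema.names, preferred)
    if id_column is None:
        # Nothing sensible to index; leave the table as it is.
        return None
    table.create_scalar_index(id_column)
    return id_column
```

In the import routine, `index_id_column(table, parquet_files[0], ID_COLUMN)` would be called once, after the table has been created and populated.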


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/create_lancedb_index_after_table_is_crea.



This is an automated message generated by Sweep AI.