asg017 / sqlite-vec

A vector search SQLite extension that runs anywhere!
Apache License 2.0

Enable CI for actions/setup-python@v5 #57

Open franciscojavierarceo opened 3 months ago

franciscojavierarceo commented 3 months ago

I launched support for SQLite Vec in a recent version of Feast but, due to some CI issues, only released it to a subset of Python versions.

I was considering contributing to this project to add a CI to verify the Python package behavior. It would also help the Feast support.

The solution would add a GitHub workflow (under .github/workflows) with something like this:

name: unit-tests of Ubuntu and Mac

on: []
jobs:
  unit-test-python:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        python-version: [ "3.9", "3.10", "3.11"]
        os: [ ubuntu-latest, macos-13 ]
        exclude:
          - os: macos-13
            python-version: "3.9"
    env:
      OS: ${{ matrix.os }}
      PYTHON: ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        id: setup-python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
      - name: Get uv cache dir
        id: uv-cache
        run: |
          echo "dir=$(uv cache dir)" >> "$GITHUB_OUTPUT"
      - name: Install dependencies
        run: pip install sqlite_vec
      - name: run script
        run: python unit_tests.py
franciscojavierarceo commented 3 months ago

@asg017 happy to take this on if you're good with it

asg017 commented 3 months ago

It should now work on all Python versions if you update your KNN SQL queries to look like this:

select rowid, distance 
from vec_items 
where embedding match ? 
  and k = 10

Instead of:

select rowid, distance 
from vec_items 
where embedding match ? 
limit 10

The limit 10 syntax only works in SQLite versions 3.41+, which older Python versions typically don't bundle. But the k = 10 syntax should work on all versions of SQLite.
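
For illustration, here is a minimal sketch of choosing between the two forms at runtime, assuming the 3.41 cutoff mentioned above. The helper name `knn_query` and its parameters are hypothetical and not part of sqlite-vec; the table and column names come from the queries in this thread.

```python
import sqlite3

# First SQLite version where `limit N` works for KNN queries,
# taken from the comment above (an assumption, not verified here).
LIMIT_MIN_VERSION = (3, 41, 0)


def knn_query(table: str, column: str, k: int,
              version: tuple = sqlite3.sqlite_version_info) -> str:
    """Build a KNN query string (hypothetical helper, not a sqlite-vec API).

    Uses `limit N` only when the given SQLite version is new enough,
    and falls back to the `k = N` constraint otherwise.
    """
    base = f"select rowid, distance from {table} where {column} match ?"
    if tuple(version) >= LIMIT_MIN_VERSION:
        return f"{base} limit {int(k)}"
    return f"{base} and k = {int(k)}"


if __name__ == "__main__":
    # Shows which SQLite version this Python build bundles.
    print("bundled SQLite:", sqlite3.sqlite_version)
    print(knn_query("vec_items", "embedding", 10))
```

Since the k = 10 form is expected to work everywhere, always emitting it is the simpler choice; the version check is mostly useful for finding out which SQLite a given Python build ships with.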

I'm a bit hesitant to add a CI rule to test across multiple Python versions, as that can slow down the CI quite a lot. But if the k = 10 syntax doesn't work for you, I'm happy to dig into it further!

franciscojavierarceo commented 3 months ago

Yeah I swapped that syntax but still encountered issues.

Here's the PR I have https://github.com/feast-dev/feast/pull/4333

I tried some changes, and now it fails on Python 3.10 on Mac instead of 3.11, as well as on Ubuntu.

It feels a bit like whack-a-mole, which is why I thought adding the CI would help me. I suppose I can just make a fork and have the CI in mine.

franciscojavierarceo commented 3 months ago

The error that's being raised is:

E       sqlite3.OperationalError: no such module: vec0
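
This error means the vec0 module is not registered on the connection that ran the query, for example because the extension was never loaded on it. A small probe can confirm that before running real queries. This is only a sketch: `has_vec0` and the probe table name are hypothetical, and it assumes it is acceptable to briefly create and drop a table on the connection.

```python
import sqlite3


def has_vec0(db: sqlite3.Connection) -> bool:
    """Return True if the vec0 module is usable on this connection.

    Hypothetical helper: it creates and then drops a tiny probe table.
    """
    try:
        # Without the extension loaded, this raises
        # sqlite3.OperationalError: no such module: vec0
        db.execute("create virtual table vec0_probe using vec0(x float[1])")
    except sqlite3.OperationalError as exc:
        if "no such module" in str(exc):
            return False
        raise  # some other problem; do not hide it
    db.execute("drop table vec0_probe")
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print("vec0 available:", has_vec0(conn))
```

Extensions are loaded per connection, so the check has to run on the same connection that later executes the KNN query.
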
franciscojavierarceo commented 3 months ago

So I tried the latest version (v0.1.0) and created a simple example with the workflow below, and there are some interesting issues. It looks like these issues are now build-time errors rather than the code failures I reported in the first run.

For what it's worth, here's the current list of Python version and OS combinations where things are failing (mostly during installation):

  1. Failed
     a. 3.9 mac-latest
     b. 3.10 mac-latest
     c. 3.11.0-rc.2* mac-13
     d. 3.11-rc.2 mac-latest
     e. 3.12 mac-13
     f. 3.12 mac-latest
  2. Passed
     a. 3.9 ubuntu-latest
     b. 3.10 ubuntu-latest
     c. 3.10 mac-latest
     d. 3.10 mac-13
     e. 3.11.0-rc.2 ubuntu-latest
     f. 3.12 ubuntu-latest

I briefly looked at your test.yaml workflow and noticed you're building Python a bit differently, so I'll try to see if it has something to do with using actions/setup-python.

name: unit-tests

on:
  pull_request:
  push:
    branches:
      - main
jobs:
  unit-test-python:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        python-version: [ "3.9", "3.10", "3.11", "3.12"]
        os: [ ubuntu-latest, macos-13, macos-latest ]
        exclude:
          - os: macos-13
            python-version: "3.9"
    env:
      OS: ${{ matrix.os }}
      PYTHON: ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        id: setup-python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
      - name: Get uv cache dir
        id: uv-cache
        run: |
          echo "dir=$(uv cache dir)" >> "$GITHUB_OUTPUT"
      - name: Install dependencies
        run: pip install sqlite_vec==v0.1.0
      - name: run script
        run: python sqlite_vec_demo.py

And the sqlite_vec_demo.py file is just:

import sqlite3
import sqlite_vec

from typing import List
import struct

def serialize_f32(vector: List[float]) -> bytes:
    """serializes a list of floats into a compact "raw bytes" format"""
    return struct.pack("%sf" % len(vector), *vector)

def main() -> None:
    db = sqlite3.connect(":memory:")
    db.enable_load_extension(True)
    sqlite_vec.load(db)
    db.enable_load_extension(False)

    sqlite_version, vec_version = db.execute(
        "select sqlite_version(), vec_version()"
    ).fetchone()

    print(f"sqlite_version={sqlite_version}, vec_version={vec_version}")

    items = [
        (1, [0.1, 0.1, 0.1, 0.1]),
        (2, [0.2, 0.2, 0.2, 0.2]),
        (3, [0.3, 0.3, 0.3, 0.3]),
        (4, [0.4, 0.4, 0.4, 0.4]),
        (5, [0.5, 0.5, 0.5, 0.5]),
    ]
    query = [0.3, 0.3, 0.3, 0.3]

    db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")

    with db:
        for item in items:
            db.execute(
                "INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)",
                [item[0], serialize_f32(item[1])],
            )

    rows = db.execute(
        """
          SELECT
            rowid,
            distance
          FROM vec_items
          WHERE embedding MATCH ?
          and k = 3
        """,
        [serialize_f32(query)],
    ).fetchall()

    print(rows)

if __name__ == "__main__":
    main()

*Note: I had to use 3.11.0-rc.2 because of this thread.

asg017 commented 3 months ago

Looking at these logs: https://github.com/franciscojavierarceo/Python/actions/runs/10241600790/job/28330119760

Nearly all of the failures have to do with installing Python on GitHub Actions runners, and not with sqlite-vec.

But these fail with AttributeError: 'sqlite3.Connection' object has no attribute 'enable_load_extension' https://github.com/franciscojavierarceo/Python/actions/runs/10241600790/job/28330119760

For that: this is a macOS thing, where recent macOS versions block loading SQLite extensions on default Python builds. You'll need to use Homebrew to install a new Python version that bundles its own SQLite build that allows extension loading (or some other Python installer; actions/setup-python won't do this for you).
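
A quick way to tell whether a given Python build is affected is to check for the method before calling it. This is a minimal sketch using only the standard library; the function name `can_load_extensions` is made up for illustration.

```python
import sqlite3


def can_load_extensions() -> bool:
    """True if this Python's sqlite3 module exposes extension loading.

    Builds compiled without loadable-extension support do not define
    Connection.enable_load_extension at all, which is what produces
    the AttributeError quoted above.
    """
    return hasattr(sqlite3.Connection, "enable_load_extension")


if __name__ == "__main__":
    if can_load_extensions():
        print("extension loading available; SQLite", sqlite3.sqlite_version)
    else:
        print("this Python build cannot load SQLite extensions")
```

Running this as an early CI step would make an affected runner fail with a clear message instead of an AttributeError inside the demo script.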

franciscojavierarceo commented 3 months ago

Yeah, that's what I was suggesting, too. Thanks for digging in as well.

I'll raise an issue with SQLite and tag it here. I've renamed this issue so it's more explicit in case someone else tries to do something similar.