asg017 / sqlite-vec

A vector search SQLite extension that runs anywhere!
Apache License 2.0
4.26k stars 135 forks

Support for Pyodide #135

Open alonsosilvaallende opened 2 weeks ago

alonsosilvaallende commented 2 weeks ago

I'm trying to run sqlite-vec with Pyodide (a Python distribution for the browser and Node.js, based on WebAssembly). According to the Pyodide documentation, it seems possible to build wheels out of tree: https://pyodide.org/en/stable/development/new-packages.html#building-python-wheels-out-of-tree Is there any python-wasm expert who could give a hand with this issue?

maartenbreddels commented 2 weeks ago

I got this working at:

https://py.cafe/maartenbreddels/sqlite-vec-demo

I tried to keep a bit of a log, so this can easily be reproduced.

One ingredient is this, sqlite with extensions enabled (I already put that wheel in the above PyCafe project):

I manually patched the loadable_path to work around an emscripten issue:

from os import path
import sys

def loadable_path():
  """ Returns the full path to the sqlite-vec loadable SQLite extension bundled with this package """

  loadable_path = path.join(path.dirname(__file__), "vec0")
  if sys.platform == "emscripten":
    # On emscripten, without this, it will try to load /some/path/vec0 (which will fail)
    # and then /some/path/vec0.so, which will succeed.
    # However, if executed again, /some/path/vec0 will seem to load (an internal data structure
    # with {exports: 'loading'} can be seen in the debugger), which will cause sqlite3 to think
    # that it can actually load /some/path/vec0. It will then try to get the symbol
    # 'sqlite3_vec_init', which will fail.
    loadable_path += ".so"
  return path.normpath(loadable_path)
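
As a way to check the suffix logic outside of Pyodide, here is a minimal sketch of the same idea. The helper name `extension_path` and its two parameters are hypothetical (they are not part of sqlite-vec); they only make the package directory and platform explicit so the behaviour can be tried on any machine:

```python
import posixpath


def extension_path(package_dir: str, platform: str) -> str:
    """Return the path that would be handed to sqlite3 for the vec0 extension.

    Hypothetical helper: mirrors the patched loadable_path(), but takes the
    package directory and platform as arguments instead of reading
    __file__ and sys.platform.
    """
    candidate = posixpath.join(package_dir, "vec0")
    if platform == "emscripten":
        # Spell out the suffix so the bare ".../vec0" path is never probed.
        candidate += ".so"
    return posixpath.normpath(candidate)


print(extension_path("/lib/python3.12/site-packages/sqlite_vec", "emscripten"))
print(extension_path("/lib/python3.12/site-packages/sqlite_vec", "linux"))
```

On other platforms the suffix is left off, which matches the unpatched function: there, SQLite tries the platform's shared-library suffix itself when the bare path does not load.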

Create a fake _sqlite3.so, so that when loading vec0.so, it will first load _sqlite3.so and see its symbols:

touch dummy.c
emcc -s SIDE_MODULE=1 -o _sqlite3.so dummy.c

Compile sqlite-vec.c with the same flags as in https://github.com/pyodide/pyodide/pull/5173

emcc -c ./sqlite-vec.c -o ./sqlite-vec.o -fPIC -DSQLITE_CORE  -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_FDATASYNC=1 -DHAVE_USLEEP=1 -DHAVE_LOCALTIME_R=1 -DHAVE_GMTIME_R=1 -DHAVE_DECL_STRERROR_R=1 -DHAVE_STRERROR_R=1 -DHAVE_POSIX_FALLOCATE=1 -DSQLITE_ENABLE_MATH_FUNCTIONS=1 -DSQLITE_ENABLE_FTS4=1 -DSQLITE_ENABLE_FTS5=1 -DSQLITE_ENABLE_RTREE=1 -DSQLITE_ENABLE_GEOPOLY=1 -DSQLITE_OMIT_POPEN=1 -DSQLITE_THREADSAFE=0 -g3

(Not sure the -g3 was needed)

Create a side module with WASM_BIGINT; otherwise we get issues with resolving sqlite3_realloc64 (the error message was something like "imported function does not match the expected type").

emcc ./sqlite-vec.o -o package/sqlite_vec/vec0.so  -s SIDE_MODULE=1 -g3 -s WASM_BIGINT=1 ./_sqlite3.so
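
To make the three emcc steps easier to repeat, they could be collected in a small script. This is only a sketch under assumptions: the function name `build_commands` is made up, the dummy module is assumed to be written to `./_sqlite3.so` (the name the link step expects), the `-D` list is deliberately shortened (the full list is in the compile command above), and `-g3` is left out since it may not be needed. It only assembles and prints the commands; nothing is executed.

```python
import shlex

# Abbreviated on purpose: the complete -D list is in the compile command above.
DEFINES = ["-DSQLITE_CORE", "-DSQLITE_THREADSAFE=0"]


def build_commands(source="./sqlite-vec.c",
                   out="package/sqlite_vec/vec0.so",
                   stub="./_sqlite3.so"):
    """Return the emcc invocations as argument lists, in the order to run them."""
    obj = source.rsplit(".", 1)[0] + ".o"
    return [
        # 1. Empty side module standing in for Pyodide's _sqlite3.so.
        ["emcc", "-s", "SIDE_MODULE=1", "-o", stub, "dummy.c"],
        # 2. Compile sqlite-vec to a position-independent object file.
        ["emcc", "-c", source, "-o", obj, "-fPIC", *DEFINES],
        # 3. Link the side module against the stub, with WASM_BIGINT enabled.
        ["emcc", obj, "-o", out, "-s", "SIDE_MODULE=1", "-s", "WASM_BIGINT=1", stub],
    ]


for cmd in build_commands():
    print(shlex.join(cmd))
```

With emcc on the PATH and an empty dummy.c in place, each list could be passed to `subprocess.run(cmd, check=True)` instead of being printed.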

Manually zip the package:

zip ../../sqlite_vec-0.1.3-cp312-cp312-pyodide_2024_0_wasm32.whl -r *
  adding: sqlite_vec/ (stored 0%)
  adding: sqlite_vec/__init__.py (deflated 59%)
  adding: sqlite_vec/vec0.dylib (deflated 67%)
  adding: sqlite_vec/vec0.so (deflated 55%)
  adding: sqlite_vec-0.1.3.dist-info/ (stored 0%)
  adding: sqlite_vec-0.1.3.dist-info/RECORD (deflated 35%)
  adding: sqlite_vec-0.1.3.dist-info/WHEEL (deflated 5%)
  adding: sqlite_vec-0.1.3.dist-info/top_level.txt (stored 0%)
  adding: sqlite_vec-0.1.3.dist-info/METADATA (deflated 24%)
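
Since a wheel is a zip archive, the same packaging step could be done from Python with the standard `zipfile` module. A minimal sketch, assuming the directory layout shown in the listing above; the function name `zip_wheel` is hypothetical, and the existing dist-info files (including RECORD) are copied as they are rather than regenerated:

```python
import zipfile
from pathlib import Path


def zip_wheel(package_dir, wheel_path):
    """Zip every file under package_dir into wheel_path; return the archive names."""
    root = Path(package_dir)
    with zipfile.ZipFile(wheel_path, "w", compression=zipfile.ZIP_DEFLATED) as whl:
        for file in sorted(root.rglob("*")):
            if file.is_file():
                # Store paths relative to the package root, with forward slashes.
                whl.write(file, file.relative_to(root).as_posix())
        return whl.namelist()
```

For example, `zip_wheel("package", "sqlite_vec-0.1.3-cp312-cp312-pyodide_2024_0_wasm32.whl")` would write the wheel next to the package directory. Unlike `zip -r`, this sketch does not add separate directory entries, and the wheel should be written outside `package_dir` so that it does not try to include itself.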

hoodmane commented 2 weeks ago

> I manually patched the loadable_path to work around an emscripten issue:

Would be interested in a reproduction for this.

maartenbreddels commented 2 weeks ago

> > I manually patched the loadable_path to work around an emscripten issue:
>
> Would be interested in a reproduction for this.

https://github.com/pyodide/pyodide/issues/5175 :)