electric-sql / pglite

Lightweight Postgres packaged as WASM into a TypeScript library for the browser, Node.js, Bun and Deno
https://electric-sql.com
Apache License 2.0

Add PGVector Support #95

Open e253 opened 1 month ago

e253 commented 1 month ago

Hello PGlite maintainers,

I have pgvector linked statically, the same way as plpgsql. Despite that, I'm still convinced that dlopen cannot work in WASM. I put considerable effort into the runtime-linking approach before capitulating and going back to trash-compacting all the objects together in one step.

Some things you'll want to look at:

  1. I had to change PG_VERSION_NUM in pg_config.h to 140000 for the pgvector build to finish without problems. This change is reverted once the build:ext step finishes.
  2. I haven't added tests for core pgvector functionality. So far I have only tested CREATE EXTENSION vector in a Bun REPL.
  3. I added more patches to the postgres.js file to fix problems with the Postgres build on Ubuntu 22.04 (WSL) and debian:bookworm. I'm surprised this wasn't a problem before.
  4. I added the ability to set the exact node binary in the Makefile under packages/pglite because I had some permissions problems with what I think was the node binary under emsdk. Maybe that was misdiagnosed and the change can be reverted.

(Hopes to close #18 and help #19)

Corresponding PR in the PG submodule: https://github.com/electric-sql/postgres-wasm/pull/10.

pmp-p commented 3 weeks ago

dlopen works fine in Emscripten, just not in WASI (but WASI is not currently a target). The key to dynamically linking PG extensions is to build them with -sSIDE_MODULE=1 (using the Postgres linker script) while PG core is linked with -sMAIN_MODULE=1, and to precompile the wasm asynchronously if the extension (.so) is larger than 8 MiB, or 32 KiB for browsers based on an older V8 engine.
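For illustration, the "precompile asynchronously above a size limit" part could look roughly like the sketch below. It only uses the standard WebAssembly JS API; the helper names and the way the limit is passed in are assumptions made for this example, not code from PGlite or Emscripten, and the thresholds are simply the ones quoted in the comment above.

```typescript
// Threshold quoted in the comment above (8 MiB). The constant name is illustrative.
const SYNC_COMPILE_LIMIT_BYTES = 8 * 1024 * 1024;

/** True when a module of this size is over the limit and should be compiled asynchronously. */
function needsAsyncCompile(
  byteLength: number,
  limit: number = SYNC_COMPILE_LIMIT_BYTES,
): boolean {
  return byteLength > limit;
}

/**
 * Hypothetical helper: turn the bytes of an extension side module into a
 * WebAssembly.Module before it is handed to the dynamic loader.
 * Modules over the limit go through the asynchronous WebAssembly.compile();
 * smaller ones use the synchronous WebAssembly.Module constructor.
 */
async function precompileExtension(
  bytes: ArrayBuffer,
  limit: number = SYNC_COMPILE_LIMIT_BYTES,
): Promise<WebAssembly.Module> {
  if (needsAsyncCompile(bytes.byteLength, limit)) {
    return WebAssembly.compile(bytes);
  }
  return new WebAssembly.Module(bytes);
}
```

For the older-engine case mentioned above, a caller would pass 32 * 1024 as the limit instead of relying on the default.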

e253 commented 3 weeks ago

I couldn't get Postgres to build with -sMAIN_MODULE=1. It complained about missing _malloc, which is odd given that MAIN_MODULE=1 exports all symbols. Maybe you have some idea of why that problem arose?

pmp-p commented 3 weeks ago

There are ways to avoid using malloc on the JavaScript side; calling malloc for each query/result may lead to fragmentation. To export libc's malloc, it would have to be listed in -sEXPORTED_FUNCTIONS, while emsdk facilities like stringToNewUTF8 and stringToUTF8OnStack would have to be in the -sEXPORTED_RUNTIME_METHODS array.
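As a rough sketch of what that means for the link step, the two settings could be assembled like this in a build script. The helper and its config shape are hypothetical; only the two -s settings and the stringToNewUTF8 / stringToUTF8OnStack names come from the comment above. Exporting free alongside malloc is an addition made for this example.

```typescript
// Hypothetical config shape for a build script; not part of PGlite or emsdk.
interface ExportConfig {
  functions: string[];      // C symbols, given with or without the leading underscore
  runtimeMethods: string[]; // Emscripten JS runtime helpers
}

/** Build the emcc linker flags that export the given C functions and runtime helpers. */
function emccExportFlags(cfg: ExportConfig): string[] {
  // EXPORTED_FUNCTIONS expects C symbols with a leading underscore.
  const fns = cfg.functions.map((name) =>
    name.startsWith("_") ? name : `_${name}`,
  );
  return [
    `-sEXPORTED_FUNCTIONS=${fns.join(",")}`,
    `-sEXPORTED_RUNTIME_METHODS=${cfg.runtimeMethods.join(",")}`,
  ];
}

const flags = emccExportFlags({
  functions: ["malloc", "free"], // "free" is an assumption added for this example
  runtimeMethods: ["stringToNewUTF8", "stringToUTF8OnStack"],
});
console.log(flags.join(" "));
```

The resulting strings would be appended to the existing emcc link command for the main module.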