asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License
1.59k stars 59 forks source link

Bindings for Elixir #21

Open asg017 opened 1 year ago

asg017 commented 1 year ago

I'd like to bundle sqlite-vss as an Elixir package and distribute it on hex. Ideally Elixir developers should be able to run:

mix deps.get sqlite-vss

Existing Language Binding notes

The Python/Node.js bindings (and the future Ruby binding I'm working on) all do similar things:

  1. Bundle pre-compiled loadable extension files (.dylib/.so/.dll) inside the language package (the .wheel file for python, the bundled .tar.gz package for Node.js, and the .gem file for Ruby
  2. Distribute different version of the package for different platforms. For Python this means different wheels per platform, in Node different optionalDependencies, and in Ruby it's building different .gem files with gem bundle --platform x86_64-darwin etc
  3. The language code itself is typically just a single loadable_path function that returns the path to the compiled extension. Also maybe a load() function that wraps a sqlite3 db connection that just calls db.load_extension(loadable_path()) like the python api

Open questions

  1. How do we publish different platforms for the same package? Can we publish a single sqlite-vss package that has different "wheels" for all our platforms, or do we need to publish separate sqlite-vss-macos-x86_64/sqlite-vss-linux-x86_64 packages for every platform?
  2. Can we just stuff the loadable extensions in def project? This seems to work for me:
  def project do
    [
      app: :hello_world,
      version: "0.1.0",
      elixir: "~> 1.12",
      start_permanent: Mix.env() == :prod,
      deps: deps(),
      files: ["priv/*.dylib"]
    ]
  end

Specifically the files: ["priv/*.dylib"] part. How can we make it work for multiple platforms?

asg017 commented 1 year ago

cc @trodrigu

trodrigu commented 1 year ago

For more context: The exqlite lib similar to the python lib can load extensions like

alias Exqlite.Basic
{:ok, conn} = Basic.open("db.sqlite3")
:ok = Basic.enable_load_extension(conn)

# load the regexp extension - https://github.com/nalgeon/sqlean/blob/main/docs/re.md
Basic.load_extension(conn, ExSqlean.path_for("re"))

To make this work with ecto we have to do something like


config :sqlite_init, SqliteInit.Repo,
  database: Path.expand("../sqlite_init_dev.db", Path.dirname(__ENV__.file)),
  pool_size: 5,
  after_connect: fn _conn ->
    [db_conn] = Process.get(:"$callers")
    db_connection_state = :sys.get_state(db_conn)
    conn = db_connection_state.mod_state.state
    IO.inspect(conn, label: "conn")
    :ok = Exqlite.Basic.enable_load_extension(conn)
    Exqlite.Basic.load_extension(conn, ExLitedb.path_for("re"))
  end,
  stacktrace: true,
  show_sensitive_data_on_connection_error: true

@asg017 Are we feeling that loading the extension into ecto (the defacto Elixir orm) is outside the scope of what the lib should do?

trodrigu commented 1 year ago

priv is great for storing the compiled files. Exsqlean takes this approach.

trodrigu commented 1 year ago

Instead of loading it in the project it uses the helper :code.priv

trodrigu commented 1 year ago

The thing we can update from that library though is changing the "GET" request it does for the compiled files and bundle them in the package.

asg017 commented 1 year ago

Thanks for the notes @trodrigu ! Some thoughts:

As far as next steps: If you put up a PR that just gets the skeleton of what the mix.exsand the library Elixir code looks like, I can pick that up and figure out all the priv/ bundling. It'll be done in Github Actions (so nothing like the Makefile in ex_sqlean), so you can just assume that the platform-specific loadable extensions will be bundled in on publish.

For practicality, I may copy the Elixir code and test it with sqlite-hello at first: Each release of sqlite-vss takes like 20 minutes to build and release, but sqlite-hello is much more slimed down and takes ~15 seconds. And every typo/fix needs to be it's own separate release, so it takes a few tries to get right.

Let me know what you think!

trodrigu commented 1 year ago

I love this plan! ❤️ I'm posting a couple of places to see if this is possible.

trodrigu commented 1 year ago

Hey @asg017 !

I wanted to send you a reference to the Elixir Slack link where I’m asking about packages and architectures. Feel free to join at elixir-lang.slack.com. The gist of it is that the recommended pattern is to go with downloadable binaries and use certfile validation and checksum validation. Elixir even has a special library called Rustler Precompiled which makes this process simpler for Rust crates. An example of the checksum file (which is present in the package) is at https://preview.hex.pm/preview/explorer/0.5.7/show/checksum-Elixir.Explorer.PolarsBackend.Native.exs. The bundled package route would require users to add some code to the deps() function in the mix.exs file which could be a little unwieldy. I wanted to get your opinion on this as we may be all good to go with downloadable binaries.

asg017 commented 1 year ago

Thanks for the notes @trodrigu! The download at install-time + cerfile verification approach sounds good to me.

So I imagine we'll have some checksum.exs file that looks like this:

%{
  "sqlite-vss-elixir-macos-x86_64.tar.gz" => "sha256:d36805150e...2c3d25c7e",
  "sqlite-vss-elixir-macos-aarch.tar.gz" => "sha256:50383f244c...50383f244c",
  "sqlite-vss-elixir-linux-x86_64.tar.gz" => "sha256:3e61118a...8443b34d"
}

Some questions:

If you're able to get a skeleton elixir package PR up, don't worry about the checksums.exs file, I think we can generate that at release-time in a github actions runner. But I'm not sure what the mix.exs or the lib/sqlite_vss.ex files should look like

trodrigu commented 1 year ago

When exactly does the "GET" request to download these packages happen? Is it when a user adds sqlite_vss to their deps() and runs mix hex.install, or at runtime when sqlite_vss is ran for the first time?

The former where a user will run a mix sqlite_vss.install command.

Likewise, does the checksum check also happen at install-time?

Exactly! The approach would emulate the Rustler precompiled project.

For the actual GET request, does that require any additional dependencies, or is there one built into elixir we can use? And is certfile a builtin package?

The approach will use :httpc which is built in to Elixir. For the cacertfile we can use the :castore library which makes keeping your castore up to date easier.

trodrigu commented 1 year ago

For reference https://github.com/asg017/sqlite-vss/pull/33

bkono commented 1 year ago

For more context: The exqlite lib similar to the python lib can load extensions like

alias Exqlite.Basic
{:ok, conn} = Basic.open("db.sqlite3")
:ok = Basic.enable_load_extension(conn)

# load the regexp extension - https://github.com/nalgeon/sqlean/blob/main/docs/re.md
Basic.load_extension(conn, ExSqlean.path_for("re"))

To make this work with ecto we have to do something like


config :sqlite_init, SqliteInit.Repo,
  database: Path.expand("../sqlite_init_dev.db", Path.dirname(__ENV__.file)),
  pool_size: 5,
  after_connect: fn _conn ->
    [db_conn] = Process.get(:"$callers")
    db_connection_state = :sys.get_state(db_conn)
    conn = db_connection_state.mod_state.state
    IO.inspect(conn, label: "conn")
    :ok = Exqlite.Basic.enable_load_extension(conn)
    Exqlite.Basic.load_extension(conn, ExLitedb.path_for("re"))
  end,
  stacktrace: true,
  show_sensitive_data_on_connection_error: true

@asg017 Are we feeling that loading the extension into ecto (the defacto Elixir orm) is outside the scope of what the lib should do?

Just wanted to point this PR out ... I actually committed this back to exqlite specifically to enable usage of this lib in a more convenient way. The after_connect stuff is not needed, just a load_extensions key at the repo or global config. If the package directly exposes a path to the embedded libs, this becomes as simple as adding the hex package, and adding the load_extensions block with the helper as the path. See this comment for an example. I was originally thinking about shipping a hex package for exactly the same purpose.

Haven't been through the full comment thread here yet, apologies if I missed something.

trodrigu commented 1 year ago

@bkono Thanks for pointing this out! load_extensions is the way to go and I'll be refactoring my production app to use that API. @asg017 I wasn't sure if we wanted to keep this as bare bones as possible and not include a reference to Ecto in the README?

asg017 commented 1 year ago

@trodrigu we can include an Ecto reference/code snippet in the README!

asg017 commented 1 year ago

kk @trodrigu #33 is merged and sqlite_vss is now published, can you let me know if it works as expected?

A few more lingering questions I have:

  1. Does it make sense to move def install() from lib/sqlite_vss.ex to mix/tasks/sqlite_vss.install.ex? I'd ideally like to keep sqlite_vss.ex as small as possible and only contain def loadable_path_vector0() and def loadable_path_vss0(), but I'm not sure if the rest has to be in sqlite_vss.ex
  2. Do you think it's worth doing the sqlite-vss-checksum.exs checksum checks at install-time?
trodrigu commented 1 year ago

Does it make sense to move def install() from lib/sqlite_vss.ex to mix/tasks/sqlite_vss.install.ex? I'd ideally like to keep sqlite_vss.ex as small as possible and only contain def loadable_path_vector0() and def loadable_path_vss0(), but I'm not sure if the rest has to be in sqlite_vss.ex

Yes, this would be a worthy refactor. It makes sense to move the install related functions into the mix task file.

trodrigu commented 1 year ago

Do you think it's worth doing the sqlite-vss-checksum.exs checksum checks at install-time?

After reading through elixir_make I think it should be easy to do this actually. Another worthy refactor.

trodrigu commented 1 year ago

The published sqlite_vss package gets installed successfully in my production app on fly.io. Great work!