asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License

Tracking: Go Bindings #49

Open asg017 opened 1 year ago

asg017 commented 1 year ago

Do you have any issues using the new Go bindings for sqlite-vss? Comment on this issue with any bugs or crashes you come across, or with suggestions on how to make it better.

TODOs for the Go bindings:

bkono commented 1 year ago

Can you provide some details on what you want to see with the cybertron example? I have multiple Go apps using the extension, with cybertron. I can likely do a direct extract from one of them and convert to an example.

So far, I've been embedding the extensions based on build flags and copying them out to os-specific app config directories to reference, but happy to try out replacing all of that with the new package.

asg017 commented 1 year ago

@bkono certainly, happy to hear about your projects!

The "cybertron example" was referencing examples/go-cybertron/demo-cybertron.go, which is pretty bare-bones right now.

The new "official" bindings are defined in bindings/go, docs are here. Try it out with:

go get -u github.com/asg017/sqlite-vss/bindings/go

These bindings statically link the sqlite-vss .a files into the Go binary, so you don't have to deal with separate loadable extension files. The downside is that you have to provide your own .a files and point a -L flag to a directory containing them. The recent releases have pre-compiled -static- assets for different platforms, but the compiler flags can get tricky. The examples/go example seems to work on the machines I've tried, but any help trying these bindings + examples would be great!

As for the cybertron example, I'd love to flesh it out more. It only embeds specific words, so it's not that fun. I think it would be worth updating it so that it embeds headlines/descriptions from the News Category Dataset, stores them in a vss0 table, and queries those embeddings. All in Go, using the new bindings, probably with some REPL that takes a user's query, embeds it, and finds the 10 nearest news headlines.
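To make the "find the 10 nearest headlines" step concrete, here is a minimal stdlib-only sketch of the lookup the REPL would perform. It is not how sqlite-vss does it: a brute-force scan stands in for the vss0 table and Faiss index, squared Euclidean distance is assumed, and the `item` type, the `nearest` function, and the toy 2-D vectors are all hypothetical (real embeddings would come from cybertron).

```go
package main

import (
	"fmt"
	"sort"
)

// item pairs a headline with its embedding (hypothetical type for this sketch).
type item struct {
	Headline  string
	Embedding []float32
}

// l2sq returns the squared Euclidean distance between a and b.
// Assumption: both vectors have the same length.
func l2sq(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// nearest returns up to k headlines ordered from closest to farthest.
// This is a brute-force stand-in for a vss0 query, not the extension's API.
func nearest(items []item, query []float32, k int) []string {
	idx := make([]int, len(items))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return l2sq(items[idx[a]].Embedding, query) < l2sq(items[idx[b]].Embedding, query)
	})
	if k < 0 {
		k = 0
	}
	if k > len(idx) {
		k = len(idx)
	}
	out := make([]string, 0, k)
	for _, i := range idx[:k] {
		out = append(out, items[i].Headline)
	}
	return out
}

func main() {
	// Toy data; the headlines and vectors are made up for illustration.
	items := []item{
		{"headline A", []float32{0, 0}},
		{"headline B", []float32{1, 0}},
		{"headline C", []float32{5, 5}},
	}
	fmt.Println(nearest(items, []float32{0.9, 0.1}, 2))
}
```

In the revamped example the scan would be replaced by a query against the vss0 table, with k set to 10.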

Let me know what you think! The definite top priority rn is to get people to test out the Go bindings and file bugs, then probably a cybertron revamp after.

bkono commented 1 year ago

Realized I needed a place to experiment with the new bindings before being able to cleanly bring them into my existing projects (cross-compilation questions, some unwinding to do, etc). I ended up making this: bkono/vss-example. I think it’s similar to what you described.

After playing with the CGO flags to get arm64 (M1) working, I decided the CGO build comment was less convenient than pushing it to a Makefile where I could reference $HOMEBREW_PREFIX. Downloading the static files was a little obnoxious because your current releases aren’t using standard uname-based OS/ARCH combinations. You can see some of the tap-dancing I did to handle that here. Based on that annoyance, one request: can you use the standard naming conventions that align with uname -s and uname -m for the release assets? Ideally, I could just toss a pre-step in my Makefile that grabs the GitHub release and snags the -static-(uname -s)-(uname -m).tar.gz to unpack into a local directory prior to building. This would let me find -static-darwin-arm64.tar.gz and -static-darwin-x86_64.tar.gz.
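For illustration, here is a rough Go sketch of how a build helper could derive that asset name under the requested convention. The function names and the prefix are hypothetical (the part of the file name before -static- is not spelled out above), and the GOARCH-to-uname mapping only covers the two architectures mentioned here.

```go
package main

import (
	"fmt"
	"runtime"
)

// unameArch maps a Go GOARCH value to what `uname -m` typically reports.
// Only amd64 and arm64 are handled; anything else is passed through as-is.
func unameArch(goos, goarch string) string {
	switch goarch {
	case "amd64":
		return "x86_64"
	case "arm64":
		if goos == "linux" {
			return "aarch64" // Linux reports aarch64, macOS reports arm64
		}
		return "arm64"
	}
	return goarch
}

// staticAssetName builds "<prefix>-static-<os>-<arch>.tar.gz".
// GOOS is already lowercase ("darwin", "linux"), matching the names requested
// above; note that `uname -s` itself prints "Darwin" and "Linux".
func staticAssetName(prefix, goos, goarch string) string {
	return fmt.Sprintf("%s-static-%s-%s.tar.gz", prefix, goos, unameArch(goos, goarch))
}

func main() {
	// "example-prefix" is a placeholder, not the real release prefix.
	fmt.Println(staticAssetName("example-prefix", runtime.GOOS, runtime.GOARCH))
}
```

A Makefile pre-step could do the same thing in shell with uname directly; the Go version is only meant to show the mapping.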

Other than that release naming piece, it was pretty smooth. I decided to use uptrace/bun (a SQL-first Golang ORM) rather than straight database/sql, since I tend to like it for more complex queries in my projects, and there is already a stdlib example anyways.

Let me know if you have any suggestions you’d like to see added to the example.