asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License

load fails on ubuntu #9

Open baikal opened 1 year ago

baikal commented 1 year ago

Hi

When trying to load vss0.so, sqlite3 core-dumps (vector0 succeeds):

SQLite version 3.42.0 2023-02-23 14:43:15
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load ./vector0
sqlite> .load ./vss0
Illegal instruction (core dumped)

Ubuntu 20.04.5 LTS, freshly compiled sqlite3 from trunk.

I tried to build from source and have succeeded for vector0 so far (after getting a recent cmake ...). With vss0, I am stuck here (faiss was apparently built in vendor/ during an earlier attempt):

~/MAKE/sqlite-vss$ make loadable
cmake -B build; make -C build
-- Could NOT find MKL (missing: MKL_LIBRARIES) 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/myself/MAKE/sqlite-vss/build
make[1]: Entering directory '/home/myself/MAKE/sqlite-vss/build'
make[2]: Entering directory '/home/myself/MAKE/sqlite-vss/build'
make[3]: Entering directory '/home/myself/MAKE/sqlite-vss/build'
make[3]: Leaving directory '/home/myself/MAKE/sqlite-vss/build'
[ 48%] Built target faiss_avx2
make[3]: Entering directory '/home/myself/MAKE/sqlite-vss/build'
make[3]: Leaving directory '/home/myself/MAKE/sqlite-vss/build'
make[3]: Entering directory '/home/myself/MAKE/sqlite-vss/build'
[ 49%] Building CXX object CMakeFiles/sqlite-vss.dir/src/extension.cpp.o
/home/myself/MAKE/sqlite-vss/src/extension.cpp:496:34: error: ‘idx_t’ is not a member of ‘faiss’
  496 |   std::vector<std::vector<faiss::idx_t>*> * insert_to_add_ids;
      |                                  ^~~~~

baikal commented 1 year ago

What is the proper way to build from source? I copied the source of faiss-1.7.3 into vendor/faiss, and did the same for vendor/sqlite-vector.

asg017 commented 1 year ago

Hey @baikal, sorry that you had issues here. I've made a ton of build improvements in v0.0.2 (2023-04-10); would you mind giving it another shot? More specifically, the extra sqlite-vector extension was removed, so compiling it yourself should be much easier.

In addition, the pre-compiled extensions were fixed, so they should work as-is without the Illegal instruction (core dumped) error.

baikal commented 1 year ago

Hi @asg017, I tried the precompiled v0.0.2 as well as v0.0.3:

SQLite version 3.42.0 2023-04-07 11:18:08
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load ./vector0.so
sqlite> .load ./vss0.so
Illegal instruction (core dumped)

Some time after my first post (Feb 2x), I managed to compile everything. But the self-compiled build core-dumped as well.

I will now try a fresh build following the instructions.

baikal commented 1 year ago

Building according to docs.md worked like a charm. Thank you.

But I still always get Illegal instruction (core dumped) at the .load ./vss0 step, in any combination of extensions:

with sqlite3 versions:

asg017 commented 1 year ago

Thanks for the detailed report! I'm not 100% sure what's happening here, but here are some notes:

On a Docker ubuntu 20.04 image, I'm able to run the pre-compiled binaries fine:

FROM ubuntu:20.04

RUN apt-get update && apt-get install -y wget sqlite3

ADD https://github.com/asg017/sqlite-vss/releases/download/v0.0.3/sqlite-vss-v0.0.3-vector0-linux-x86_64.tar.gz .
ADD https://github.com/asg017/sqlite-vss/releases/download/v0.0.3/sqlite-vss-v0.0.3-vss0-linux-x86_64.tar.gz .

RUN tar -xvzf sqlite-vss-v0.0.3-vector0-linux-x86_64.tar.gz 
RUN tar -xvzf sqlite-vss-v0.0.3-vss0-linux-x86_64.tar.gz 

RUN apt-get install -y libgomp1 libatlas-base-dev liblapack-dev

And build + run with this:

$ docker build -t vss-test .
$ docker run --rm -it vss-test sqlite3 :memory: '.load /vector0' '.load /vss0' 'select vss_version()'
v0.0.3

Some ideas to try:

$ sha256sum vector0.so 
4e632fdc9c8cf6b576896b462aed445a622e64a62e6c9caca898b5858419ff5b  vector0.so
$ sha256sum vss0.so 
bdfde741e4f0fb2766a1f1af0a77a9fb3b9a5bea82bf04c34f504f9d8f5fa91b  vss0.so

baikal commented 1 year ago

I tried with your Dockerfile (using podman instead of Docker). Everything is fine up to loading vector0, but there is a sudden exit after loading vss0 (no core dump message, but no vss_version output either). Hm, that hints at the avx2 issue, right? And yes, the libraries are installed, and I always tried the vector0.so / vss0.so combination from the very same version/build.

I changed faiss_avx2 to faiss in CMakeLists.txt, and had to run rm -rf build and rm dist/*/* before make loadable would rebuild anything at all.

But that fixed it:

SQLite version 3.42.0 2023-04-07 11:18:08
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load ./vector0
sqlite> .load ./vss0
sqlite> select vss_version();
v0.0.3

Interesting that the build succeeded previously with faiss_avx2 ...

And sure enough, no avx2 in /proc/cpuinfo here.
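For reference, the same check can be done programmatically. The sketch below is only an illustration: it parses text in the /proc/cpuinfo format and reports whether a given flag appears on a `flags` line. The helper names `cpuinfo_has_flag` and `host_has_flag` are hypothetical and are not part of sqlite-vss or Faiss.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `flag` appears as a whole token on a
// "flags" line of text formatted like Linux's /proc/cpuinfo.
bool cpuinfo_has_flag(const std::string& cpuinfo_text, const std::string& flag) {
    std::istringstream lines(cpuinfo_text);
    std::string line;
    while (std::getline(lines, line)) {
        // x86 kernels list CPU features on lines that start with "flags".
        if (line.compare(0, 5, "flags") != 0) continue;
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream tokens(line.substr(colon + 1));
        std::string token;
        while (tokens >> token) {
            if (token == flag) return true;  // exact match, so "avx" does not match "avx2"
        }
    }
    return false;
}

// Hypothetical convenience wrapper: reads /proc/cpuinfo on the current host.
// Returns false if the file cannot be opened (for example on non-Linux systems).
bool host_has_flag(const std::string& flag) {
    std::ifstream in("/proc/cpuinfo");
    if (!in) return false;
    std::stringstream buffer;
    buffer << in.rdbuf();
    return cpuinfo_has_flag(buffer.str(), flag);
}
```

Calling `host_has_flag("avx2")` on a machine like the one described here would be expected to return false, since the flag is absent from /proc/cpuinfo.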

asg017 commented 1 year ago

Thanks again for your detailed report! I'm going to see how hard it'll be to distribute two versions of vss0: one with avx2 support, and one without.

kroggen commented 1 year ago

Maybe instead of having 2 versions, the code can check whether the processor supports the AVX2 instruction set and then call the appropriate functions. Some crypto libraries do that: they detect whether the processor has sha256 instructions and use the hardware-accelerated implementation, otherwise they fall back to the plain software one.

How to detect if the CPU and the OS support the instruction set:

https://stackoverflow.com/questions/6121792/how-to-check-if-a-cpu-supports-the-sse3-instruction-set
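A minimal sketch of that kind of runtime dispatch, assuming GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available. The function names (`cpu_has_avx2`, `select_sum`, `sum_generic`, `sum_avx2_placeholder`) are hypothetical and do not come from sqlite-vss or Faiss; the "AVX2" kernel is a plain C++ placeholder so the sketch runs on any machine.

```cpp
#include <cstddef>

// Returns true if the running CPU reports AVX2. Assumes GCC or Clang on x86;
// on other compilers or architectures this sketch conservatively returns false.
bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // only required before static constructors have run; harmless otherwise
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

// Portable fallback kernel.
float sum_generic(const float* v, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += v[i];
    return total;
}

// Placeholder for a vectorized kernel. A real build would put this in a
// separate translation unit compiled with AVX2 enabled; here it simply
// delegates to the portable version.
float sum_avx2_placeholder(const float* v, std::size_t n) {
    return sum_generic(v, n);
}

using SumFn = float (*)(const float*, std::size_t);

// Picks the implementation once, based on the detected CPU capability.
SumFn select_sum(bool has_avx2) {
    return has_avx2 ? &sum_avx2_placeholder : &sum_generic;
}
```

A caller would do something like `SumFn sum = select_sum(cpu_has_avx2());` once at startup. Dispatch like this only helps when AVX2 code is confined to the selected functions; if an entire library is compiled with AVX2 enabled, the compiler is free to emit those instructions anywhere.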

asg017 commented 1 year ago

Thanks for sharing, I didn't know you could do that! Unfortunately, I'm not sure it will work here: I think the avx2 instructions are run when SQLite dlopen's the vss0 extension, before any sqlite-vss code is even run. I think the only way to avoid avx2 is to compile Faiss with a different option, which we can't do at runtime.
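One general mechanism by which code can run at load time is C++ static initialization: constructors of namespace-scope objects execute when a binary is loaded, before any function in it is called explicitly. The sketch below illustrates only that mechanism; whether it is what actually triggers the crash here is not established in this thread. All names (`event_log`, `LoadTimeWork`, `extension_entry_point`) are hypothetical.

```cpp
#include <string>
#include <vector>

// Records the order in which code runs.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;  // function-local static avoids init-order problems
    return log;
}

// A type whose constructor has an observable side effect.
struct LoadTimeWork {
    LoadTimeWork() { event_log().push_back("static initializer"); }
};

// Namespace-scope object: its constructor runs when the program or shared
// library is loaded, before main() starts (or before dlopen() returns).
static LoadTimeWork load_time_work;

// Hypothetical stand-in for an extension's entry point, which a host
// application would call only after loading has finished.
void extension_entry_point() {
    event_log().push_back("entry point");
}
```

By the time `extension_entry_point()` can be called, the log already contains "static initializer". Any runtime CPU check placed inside the entry point would therefore come too late to guard code that runs during loading.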

kroggen commented 1 year ago

Yes, I looked for AVX2 instructions in sqlite-vss and did not find any. I only found the link to faiss_avx2 in the makefile.

In this case, the check for the presence of AVX2 instructions should be added in Faiss.

I saw this discussion, and it appears that the Faiss binary has support for AVX2 and falls back to a slower implementation if it is not present, which is exactly what is desired. But I am not sure that is the case, and I have no time to dive deeper.