DeployQL / LintDB

Vector Database with support for late interaction and token level embeddings.
https://www.lintdb.com/
Apache License 2.0

Replace OpenMP and MKL/BLAS #34

Open mtbarta opened 2 weeks ago

mtbarta commented 2 weeks ago

Hiding parallelism behind OpenMP is causing a lot of performance problems. The OpenMP runtime we link against depends on the compiler, and that choice interacts with MKL, which can itself be threaded with OpenMP.

OpenMP is controlled by a lot of environment variables that need to be tuned, and the defaults don't work well. At least on my server, it defaults to using every hardware thread, but it only performs well if we limit it to the number of cores.

Secondly, both faiss and LintDB use OpenMP, which means we can easily end up nesting parallel calls.

It would make sense for LintDB to own a thread pool. This would make it more explicit how work is performed and easier to control. We can evaluate open source thread pools to make this simpler.
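
To make the idea concrete, here is a rough sketch of what an owned pool could look like, using only the C++ standard library. `ThreadPool`, `submit`, and `parallel_sum` are hypothetical names for illustration; nothing like this exists in LintDB today, and an existing open source pool may well be the better choice.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: the caller picks the thread count explicitly
// instead of relying on environment variables.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t num_threads) {
        if (num_threads == 0) num_threads = 1;
        workers_.reserve(num_threads);
        for (std::size_t i = 0; i < num_threads; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Drains any queued tasks, then joins all workers.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    std::size_t size() const { return workers_.size(); }

    // Queue a callable taking no arguments; the future carries its result.
    template <typename F>
    auto submit(F&& f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reached when stopping
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};

// Example use: sum 0..n-1 by splitting the range into roughly `chunks` tasks.
long long parallel_sum(ThreadPool& pool, long long n, long long chunks) {
    if (chunks < 1) chunks = 1;
    const long long step = (n + chunks - 1) / chunks;
    std::vector<std::future<long long>> parts;
    for (long long begin = 0; begin < n; begin += step) {
        const long long end = std::min(begin + step, n);
        parts.push_back(pool.submit([begin, end] {
            long long s = 0;
            for (long long i = begin; i < end; ++i) s += i;
            return s;
        }));
    }
    long long total = 0;
    for (auto& p : parts) total += p.get();
    return total;
}
```

With something like this, the thread count is an explicit constructor argument, so we could size the pool to the number of cores rather than whatever the runtime defaults to. Note that `std::thread::hardware_concurrency()` is only a hint and typically reports logical threads, so it would likely not be the right default on its own.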

Replacing MKL

This would simplify the build process across architectures. Onnxruntime provides optimized gemm functions, which is what we need. Since we already depend on Onnxruntime, we can reuse that work.
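
One way to keep the swap contained would be to route every matrix multiply through a single wrapper, so the backend behind it can change without touching callers. The sketch below assumes row-major float matrices; `gemm_rowmajor` is a hypothetical name, and the body is a naive reference loop, not Onnxruntime or MKL code.

```cpp
#include <cstddef>

// Hypothetical seam: C (m x n) = A (m x k) * B (k x n), all row-major float.
// Naive reference implementation; an optimized backend would replace the body.
void gemm_rowmajor(std::size_t m, std::size_t n, std::size_t k,
                   const float* a, const float* b, float* c) {
    for (std::size_t i = 0; i < m; ++i) {
        // Clear the output row before accumulating into it.
        for (std::size_t j = 0; j < n; ++j) c[i * n + j] = 0.0f;
        for (std::size_t p = 0; p < k; ++p) {
            const float a_ip = a[i * k + p];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
}
```

A reference version like this could also serve as a correctness check for whichever optimized gemm we end up calling.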