0hq / tinyvector

A tiny nearest-neighbor embedding database built with SQLite and PyTorch. (In development!)
MIT License

Add GPU acceleration via PyTorch #8

Open 0hq opened 1 year ago

0hq commented 1 year ago

Let's start GPU-accelerating with a PyTorch index. Scoring a query against every stored vector by dot product is a single matrix-vector multiplication, and cosine similarity is the same operation once the vectors are L2-normalized, so hardware accelerators should help here. With 32 GB of VRAM, a single GPU could hold roughly 22 million MiniLM embeddings (384 dimensions at f32 precision, 1,536 bytes each), before accounting for any other memory use.
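
To make the idea above concrete, here is a minimal sketch (not code from this repository) of cosine-similarity search as one matrix-vector product followed by `torch.topk`. The function names `max_vectors` and `cosine_top_k` are hypothetical, the index is random data, and the capacity figure assumes 32 GB is read as 32 GiB and that nothing else occupies memory.

```python
import torch
import torch.nn.functional as F

BYTES_PER_F32 = 4


def max_vectors(vram_bytes: int, dim: int = 384) -> int:
    """Upper bound on f32 vectors that fit, ignoring all other memory use."""
    return vram_bytes // (dim * BYTES_PER_F32)


def cosine_top_k(index: torch.Tensor, query: torch.Tensor, k: int):
    """Return (scores, ids) of the k rows of `index` most similar to `query`."""
    # After L2 normalization, cosine similarity is a plain dot product.
    index_n = F.normalize(index, dim=1)
    query_n = F.normalize(query, dim=0)
    scores = index_n @ query_n  # (n, d) @ (d,) -> (n,)
    return torch.topk(scores, k)


# Fall back to the CPU when no GPU is present, so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
index = torch.randn(1000, 384, device=device)  # stand-in for real embeddings
query = index[42] * 3.0  # a rescaled copy of row 42; cosine ignores the scale
scores, ids = cosine_top_k(index, query, k=5)
```

In a real index the stored vectors would be normalized once at insert time rather than on every query; the sketch normalizes inside the function only to stay self-contained.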

go-noah commented 1 year ago

I've been implementing and using pretty much the same ideas you have in mind, in TensorFlow and the Java family of languages.

I've also done the same thing with PyTorch, and I considered the problem of finding the top k, as well as batch processing, dynamic batch processing, and so on.
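
As an illustration of the top-k and batching concerns mentioned above, here is a rough PyTorch sketch. It is not taken from the linked repository: the function name `batched_top_k` and the `chunk_size` parameter are hypothetical. It scores a batch of queries against the index one chunk at a time, so the score matrix never exceeds `batch x chunk_size`, and keeps a running top k. Dynamic batching (grouping incoming requests at serving time) is not shown.

```python
import torch


def batched_top_k(index, queries, k, chunk_size=4096):
    """Top-k dot-product search for a batch of queries, scanning `index` in chunks.

    index:   (n, d) tensor; rows assumed already L2-normalized for cosine scores
    queries: (b, d) tensor
    Returns (scores, ids), each of shape (b, k), sorted by descending score.
    """
    best_scores = None
    best_ids = None
    for start in range(0, index.shape[0], chunk_size):
        chunk = index[start:start + chunk_size]
        scores = queries @ chunk.T  # (b, rows in this chunk)
        s, i = torch.topk(scores, min(k, chunk.shape[0]), dim=1)
        i = i + start  # convert chunk-local positions to global row ids
        if best_scores is not None:
            # Merge this chunk's candidates with the running best.
            s = torch.cat([best_scores, s], dim=1)
            i = torch.cat([best_ids, i], dim=1)
        best_scores, pos = torch.topk(s, min(k, s.shape[1]), dim=1)
        best_ids = torch.gather(i, 1, pos)
    return best_scores, best_ids


torch.manual_seed(0)
index = torch.nn.functional.normalize(torch.randn(10_000, 64), dim=1)
queries = index[[3, 7000]]  # two queries that are exact copies of stored rows
scores, ids = batched_top_k(index, queries, k=5, chunk_size=1024)
```

Each query's best match here should be its own row (3 and 7000). The chunk size is the knob that trades peak memory against the number of kernel launches.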

If you take a look at my code and agree with the direction I think the implementation should take, I'd be glad to contribute to this repository.

https://github.com/go-noah/akka-dynamic-batch-serving/blob/main/tensorflow-gpu-cosine-similarity/src/main/scala/serving/model/CosineSimilarity.scala

https://github.com/go-noah/akka-dynamic-batch-serving/blob/main/tensorflow-gpu-cosine-similarity/README.md