asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License

Very slow for moderate number of embeddings #106

Open Nintorac opened 8 months ago

Nintorac commented 8 months ago

Here is a visual of how ingest time scales with the number of embeddings. If I take the log of both axes, it looks approximately linear.

I also noticed that there only seems to be a single thread running for the entire duration of the ingest.

I am using embeddings with dimension 2560.

[image: plot of ingest time versus number of embeddings]

I am using Python and installed sqlite-vss via pip, if that makes a difference.
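For anyone trying to quantify the scaling described above: a straight line on a log-log plot suggests a power law, t ≈ c · n^k, where the slope k is the exponent (k ≈ 1 would be linear growth, k ≈ 2 quadratic). Below is a minimal sketch of estimating that slope with a least-squares fit. The function name `loglog_slope` is hypothetical, and the timings are synthetic values generated from a known quadratic, not measurements from this issue.

```python
import math


def loglog_slope(ns, ts):
    """Least-squares slope of log(t) against log(n).

    For timings that follow t = c * n**k, the returned slope is k.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in ts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic timings generated from t = 1e-6 * n**2.
# These are NOT measurements from this issue; substitute real ones.
counts = [1_000, 2_000, 4_000, 8_000, 16_000]
synthetic_seconds = [1e-6 * n ** 2 for n in counts]

slope = loglog_slope(counts, synthetic_seconds)
print(f"estimated exponent: {slope:.2f}")
```

Replacing `counts` and `synthetic_seconds` with measured values would show whether ingest time grows roughly linearly or faster than linearly with the number of embeddings.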

asg017 commented 6 months ago

Do you happen to have the code you used to ingest embeddings into sqlite-vss? It shouldn't take 30 mins to insert 30k vectors. I suspect there are a number of fixes that could be made to make it much faster, including:

It also depends on whether you're using a custom factory or not, so any example code would be great!

Nintorac commented 6 months ago

I have lost the code, sorry. If I remember right, it created the index after all the data had been inserted.
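Since the original code is lost, here is a rough sketch of what a timing harness for the "insert all the data first" stage could look like. It is not the code from this issue. It uses only Python's standard `sqlite3` module and a plain table, so the sqlite-vss index-building step is deliberately not shown. The table name `items`, the helper names, and the row count of 200 are all hypothetical; only the dimension of 2560 comes from the report above.

```python
import array
import random
import sqlite3
import time

DIM = 2560  # embedding dimension reported in this issue


def random_vector(dim):
    """Return a random float32 vector (stand-in for a real embedding)."""
    return array.array("f", (random.random() for _ in range(dim)))


def ingest(db, vectors):
    """Insert all vectors in a single transaction; return elapsed seconds."""
    start = time.perf_counter()
    with db:  # the connection context manager commits once on exit
        db.executemany(
            "INSERT INTO items(embedding) VALUES (?)",
            ((v.tobytes(),) for v in vectors),
        )
    return time.perf_counter() - start


db = sqlite3.connect(":memory:")
# Plain table holding raw float32 bytes; hypothetical schema.
db.execute("CREATE TABLE items(id INTEGER PRIMARY KEY, embedding BLOB)")

vectors = [random_vector(DIM) for _ in range(200)]
elapsed = ingest(db, vectors)

count = db.execute("SELECT count(*) FROM items").fetchone()[0]
print(f"inserted {count} rows in {elapsed:.3f}s")
```

Timing this stage separately from the index-building stage would help narrow down where the 30 minutes is actually being spent.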