asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License

Is it possible to use IndexFlatIP? #74

Closed: dleviminzi closed this issue 12 months ago

dleviminzi commented 12 months ago

Apologies if this is an obvious question; I'm new to this project, and to FAISS, as of today.

I'm trying to index by cosine similarity. I think that can be achieved by using L2norm as a preprocessing step and then IndexFlatIP. The issue is that there doesn't seem to be a way to select IndexFlatIP in the factory string: the only option is Flat, which seems to default to IndexFlatL2. Is there a way around this that I'm missing? If not, with some light guidance, I'd be happy to work on it.
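The equivalence this question relies on can be checked in a few lines. The sketch below is plain Python, not Faiss or sqlite-vss code, and the helper names (`l2_normalize`, `inner_product`, `cosine_similarity`) are illustrative only. It shows that scaling both vectors to unit length and then taking their inner product yields the cosine similarity of the original vectors.

```python
import math

def l2_normalize(v):
    """Scale v to unit L2 length (the effect of an L2norm preprocessing step)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def inner_product(a, b):
    """Plain dot product: the score an inner-product (IP) flat index ranks by."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Reference definition: dot product divided by the product of the norms."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

a = [1.0, 2.0, 2.0]
b = [2.0, 0.0, 1.0]

# Inner product after normalization equals cosine similarity before it.
ip_after_norm = inner_product(l2_normalize(a), l2_normalize(b))
print(ip_after_norm, cosine_similarity(a, b))
```

Because the normalization is applied to both stored vectors and queries, an inner-product index over normalized vectors ranks results by cosine similarity.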

asg017 commented 12 months ago

I think #6 will unblock you here. It doesn't look like you can pass IndexFlatIP as a factory string, but with #6 you'll be able to specify metric_type=INNER_PRODUCT (currently it defaults to L2 distance).

I'll be releasing v0.1.1 this week with unrelated changes, but I'll see if I can get #6 in next week. Ideally, you'll be able to do:

create virtual table vss_cosine_similarity using vss0(
  your_embeddings(100) factory="L2norm,IndexFlat,IDMap2" metric_type=INNER_PRODUCT
);
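To make concrete what switching the metric changes, here is a minimal brute-force "flat index" in plain Python. It is a hypothetical illustration, not how sqlite-vss or Faiss implement search: `flat_search` and its `metric` argument are invented names that loosely mirror the proposed metric_type option, and vectors are normalized on insert and at query time to mimic an L2norm preprocessing step.

```python
import math

def l2_normalize(v):
    """Scale v to unit L2 length (mimics an L2norm preprocessing step)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def flat_search(vectors, query, k, metric="L2"):
    """Exhaustive search over (rowid, vector) pairs; returns the top k (rowid, score).

    metric="L2": score is squared L2 distance, smaller is better.
    metric="INNER_PRODUCT": score is the dot product, larger is better.
    """
    if metric == "L2":
        scored = [(rowid, sum((x - y) ** 2 for x, y in zip(v, query))) for rowid, v in vectors]
        scored.sort(key=lambda pair: pair[1])
    elif metric == "INNER_PRODUCT":
        scored = [(rowid, sum(x * y for x, y in zip(v, query))) for rowid, v in vectors]
        scored.sort(key=lambda pair: pair[1], reverse=True)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    return scored[:k]

# Hypothetical rows (rowid, embedding), normalized before being "indexed".
rows = [(1, [1.0, 0.0]), (2, [3.0, 3.0]), (3, [0.0, 5.0])]
index = [(rowid, l2_normalize(v)) for rowid, v in rows]
query = l2_normalize([2.0, 1.0])

# With INNER_PRODUCT, each score is the cosine similarity to the query.
print(flat_search(index, query, k=2, metric="INNER_PRODUCT"))
```

For unit vectors, the squared L2 distance equals 2 minus twice the inner product, so both metrics return rows in the same order; the metric determines which score is reported, a distance or a cosine similarity.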

dleviminzi commented 12 months ago

Sorry, I missed that issue! Will close this one as that will do the trick.

dleviminzi commented 11 months ago

I decided to try implementing it and have submitted a PR: #75.