benwtrent closed this 2 months ago.
Actually, this might be a bug. Looking at the code, I am not sure we normalize the vectors when building the quantizer with cosine similarity.
Yep, not normalizing during quantile merging is the actual bug. I have a local patch that fixes this. Need to write up some tests & will push up a change soon.
Good find @naveentatikonda!
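The fix described above, normalizing the vectors before recalculating quantiles on merge, can be illustrated with a minimal sketch. It assumes the quantiles are taken over all component values by trimming (1 - confidence interval) / 2 from each tail. `quantiles` and `merge_quantiles` are hypothetical helpers, not Lucene's `ScalarQuantizer` API, and the data is random rather than glove-200.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit L2 length, as cosine similarity assumes."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def quantiles(vectors, confidence_interval):
    """Hypothetical helper: trim (1 - ci) / 2 of all component values from each tail."""
    values = sorted(x for v in vectors for x in v)
    cut = int(len(values) * (1.0 - confidence_interval) / 2.0)
    return values[cut], values[len(values) - 1 - cut]


def merge_quantiles(segments, confidence_interval, normalize_for_cosine):
    """Hypothetical merge step: recompute quantiles over every vector in every segment.

    normalize_for_cosine=False mimics the bug; True mimics the fix.
    """
    merged = [v for segment in segments for v in segment]
    if normalize_for_cosine:
        merged = [normalize(v) for v in merged]
    return quantiles(merged, confidence_interval)


# Two fake segments, each holding 50 raw (unnormalized) 8-dim vectors.
rng = random.Random(42)
segments = [[[rng.gauss(0.0, 3.0) for _ in range(8)] for _ in range(50)] for _ in range(2)]

buggy_min, buggy_max = merge_quantiles(segments, 0.9, normalize_for_cosine=False)
fixed_min, fixed_max = merge_quantiles(segments, 0.9, normalize_for_cosine=True)
print("without normalization:", buggy_min, buggy_max)  # spreads well past [-1, 1]
print("with normalization:   ", fixed_min, fixed_max)  # stays within [-1, 1]
```

Because every component of a unit vector lies in [-1, 1], any quantile outside that range is a sign the merge path skipped normalization.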
Description
Just with some default settings, glove-200 does poorly with int4 HNSW when using cosine. The bug occurs on merge: when recalculating the quantiles, the vectors aren't normalized like they should be, so the quantiles get all out of whack. We can actually see this in some of the experiments below. Every component of a normalized vector lies in [-1, 1], so all of these quantiles should fall within that range; however, some of the results are > 1 or < -1.

Some experiments I have done:
Dynamic confidence interval:
- 0.693 (I regrettably didn't get the quantiles here...)
- int4 vectors are used: 0.189 (minQuantile=-1.2686034, maxQuantile=1.3250866)

0.5 confidence interval (locally patched to allow it):
- int4 vectors building the graph: 0.649 (minQuantile=-0.277436, maxQuantile=0.29298353)

0.75 confidence interval (locally patched to allow it):
- int4 vectors building the graph: 0.535 (minQuantile=-0.48500386, maxQuantile=0.5028781)

0.9 confidence interval:
- int4 vectors building the graph: 0.407 (minQuantile=-0.71112806, maxQuantile=0.73441404)
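One way to see why out-of-range quantiles hurt int4 results is to look at how the quantile pair sets bucket width. The sketch below assumes a simplified linear quantizer (clamp to [min, max], then map onto the 2^4 - 1 = 15 steps between codes 0 and 15). It is not Lucene's `ScalarQuantizer` code and omits any score-correction terms the real implementation may apply. `quantize_int4`, `bucket_width`, and `wasted_fraction` are hypothetical names.

```python
def quantize_int4(vector, min_quantile, max_quantile):
    """Simplified linear int4 quantization: clamp, then map onto codes 0..15."""
    levels = 15  # 4 bits -> 16 codes, 0 through 15
    scale = levels / (max_quantile - min_quantile)
    codes = []
    for x in vector:
        clamped = min(max(x, min_quantile), max_quantile)
        codes.append(round((clamped - min_quantile) * scale))
    return codes


def bucket_width(min_quantile, max_quantile):
    """Width of the float range that collapses into a single int4 code."""
    return (max_quantile - min_quantile) / 15


def wasted_fraction(min_quantile, max_quantile):
    """Share of [min, max] outside [-1, 1], which no unit-vector component can reach."""
    outside = max(0.0, -1.0 - min_quantile) + max(0.0, max_quantile - 1.0)
    return outside / (max_quantile - min_quantile)


# Quantile pairs copied from the experiments listed above.
reported = {
    "dynamic": (-1.2686034, 1.3250866),
    "0.5": (-0.277436, 0.29298353),
    "0.75": (-0.48500386, 0.5028781),
    "0.9": (-0.71112806, 0.73441404),
}
for name, (lo, hi) in reported.items():
    print(f"{name}: bucket width {bucket_width(lo, hi):.4f}, "
          f"wasted range {wasted_fraction(lo, hi):.1%}")
```

Under these assumptions, the dynamic pair spends roughly 23% of its range on values a normalized vector can never produce, and its buckets are the widest of the four. Narrower intervals give finer buckets but clamp more components, which is the trade-off the confidence interval controls.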