baidu / puck

Puck is a high-performance ANN search engine
Apache License 2.0

Why normalize the feature vector before searching in tinker? #29

Open · xiaozxiong opened this issue 2 months ago

xiaozxiong commented 2 months ago

I found that there is a normalization step before searching in tinker:

const float* feature = normalization(context.get(), request->feature);

What is the purpose of this operation? Also, with the default parameter whether_norm=true, I got a recall@100 of almost zero. After I changed it to whether_norm=false, recall@100 was as expected. Could you suggest some possible explanations?

Thank you!
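For readers unfamiliar with the step: a normalization call like the one quoted above most likely rescales the vector to unit L2 norm, which is the usual preprocessing for cosine similarity. The sketch below is a minimal illustration under that assumption; `l2_normalize` is a hypothetical helper, not Puck's actual `normalization` function, whose exact behavior is not shown in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not Puck's `normalization`): returns a copy of `v`
// scaled to unit L2 norm. A zero vector is returned unchanged because it
// has no direction to preserve.
std::vector<float> l2_normalize(const std::vector<float>& v) {
    double sum_sq = 0.0;
    for (float x : v) {
        sum_sq += static_cast<double>(x) * x;  // accumulate in double for accuracy
    }
    std::vector<float> out(v);
    if (sum_sq == 0.0) {
        return out;
    }
    const double inv_norm = 1.0 / std::sqrt(sum_sq);
    for (float& x : out) {
        x = static_cast<float>(x * inv_norm);
    }
    return out;
}
```

For example, `l2_normalize({3.0f, 4.0f})` yields approximately `{0.6f, 0.8f}`: the direction is kept and the length becomes 1.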

nk2014yj commented 2 months ago

The default distance metric is cosine similarity. The returned distance value is 2 - 2 * cos(x, y), where cos(x, y) is the cosine similarity between the two vectors. For unit-length vectors, this value equals the squared Euclidean distance ||x - y||^2.
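The relation between the returned distance and cosine similarity can be checked numerically. For unit-length vectors x and y, ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y = 2 - 2 cos(x, y). The C++ sketch below computes both quantities; the function names are hypothetical and are not part of Puck's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two equal-length, non-zero vectors.
double cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        na += static_cast<double>(a[i]) * a[i];
        nb += static_cast<double>(b[i]) * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

// The distance described in the reply: 2 - 2 * cosine similarity.
double cosine_distance(const std::vector<float>& a, const std::vector<float>& b) {
    return 2.0 - 2.0 * cosine_similarity(a, b);
}

// Squared Euclidean distance, for comparison.
double squared_l2(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - b[i];
        sum += d * d;
    }
    return sum;
}
```

With the unit vectors {0.6, 0.8} and {1, 0}, both functions give about 0.8. With the non-unit vector {3, 4} in place of {0.6, 0.8}, the cosine distance is still 0.8 but the squared L2 distance is 20, so the two only coincide after normalization.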

For any other distance metric, whether_norm must be set to false.
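One way to see why normalization must be turned off for other metrics: under Euclidean distance, rescaling only the query can change which base vector is nearest, which would hurt recall. The sketch below assumes a plain brute-force squared-L2 search, not Puck's actual index, and `nearest` and `l2_normalize` are hypothetical helpers. It illustrates one possible cause of the low recall reported above, not a confirmed diagnosis.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;

// Squared Euclidean distance between two equal-length vectors.
double squared_l2(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - b[i];
        sum += d * d;
    }
    return sum;
}

// Hypothetical helper: scale `v` to unit L2 norm (zero vector unchanged).
Vec l2_normalize(const Vec& v) {
    double sum_sq = 0.0;
    for (float x : v) sum_sq += static_cast<double>(x) * x;
    Vec out(v);
    if (sum_sq == 0.0) return out;
    const double inv_norm = 1.0 / std::sqrt(sum_sq);
    for (float& x : out) x = static_cast<float>(x * inv_norm);
    return out;
}

// Brute-force search: index of the base vector closest to `query` under squared L2.
std::size_t nearest(const std::vector<Vec>& base, const Vec& query) {
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < base.size(); ++i) {
        const double d = squared_l2(base[i], query);
        if (d < best_dist) {
            best_dist = d;
            best = i;
        }
    }
    return best;
}
```

With base vectors {1, 0} and {10, 0} and query {8, 0}, the true L2 nearest neighbor is index 1, but after normalizing the query to {1, 0} the search returns index 0.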

xiaozxiong commented 2 months ago

Thank you for your reply.