facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License
29.54k stars 3.49k forks

jaccard metric prob #3521

Open handsomeZhuang opened 1 month ago

handsomeZhuang commented 1 month ago

Hi, dear development team. We recently used faiss.index_factory(dim, 'Flat', faiss.METRIC_Jaccard) and index.search() to build an index and query it, and found that the results are not accurate. We also found that the Faiss implementation differs from the one in SciPy, whereas the SciPy implementation matches the original Jaccard definition. We look forward to your reply. Best wishes~

mdouze commented 1 month ago

Thanks for the report. Could you give a reproduction example?

handsomeZhuang commented 4 weeks ago

Yes, for example:

    import faiss
    import numpy as np

    # two database vectors
    a = np.array([
        [70883900, 42568368, 16938844, 55760336, 21177010, 83098300, 46080616, 13810740, 63454444, 20485222],
        [20347602, 27256056, 23762382, 61982300, 37474148, 5487983, 7732985, 15258728, 68216584, 16599308],
    ]).astype(np.float32)
    # one query vector
    b = np.array([[20635302, 42568368, 16938844, 55760336, 65016728, 830983, 46080616, 13810740, 63454444, 2048522.]]).astype(np.float32)

    index = faiss.index_factory(a.shape[1], 'Flat', faiss.METRIC_Jaccard)
    index.train(a)
    index.add(a)
    dist, id = index.search(b, 1)

The expected answer is a[0,:] rather than a[1,:], but the search returns a[1,:]. We look forward to your reply~

handsomeZhuang commented 4 weeks ago

Hi, has this issue been confirmed?

handsomeZhuang commented 4 weeks ago

Hi, I have posted an example in this GitHub issue. Please check it~

handsomeZhuang commented 3 weeks ago

Hi, has this problem been confirmed? We look forward to your reply and are waiting on this library to continue our work. We would appreciate it if you could confirm the problem. Thank you~

mdouze commented 3 weeks ago

Do you have a reference implementation of the Jaccard metric to compare with?

handsomeZhuang commented 3 weeks ago

Yes, we use the scipy.spatial.distance library to test it:

    max_id = -1
    max_score = -1
    for i in range(a.shape[0]):
        # positions where the values differ and at least one of them is nonzero
        diff = np.bitwise_and((a[i,:] != b), np.bitwise_or(a[i,:] != 0, b != 0)).sum()
        temp = b.shape[1] - diff
        # positions where at least one of the two values is nonzero
        union = np.double(np.bitwise_or(a[i,:] != 0, b != 0).sum())
        score = float(temp / union)
        if max_score < score:
            max_score = score
            max_id = i
    print(max_id, max_score)
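For anyone trying to narrow this down, here is a minimal NumPy-only sketch that scores the two database vectors against the query under two different Jaccard definitions: the set-style one used in the reference code above (share of nonzero positions with equal values) and the weighted form, sum(min(u, v)) / sum(max(u, v)). That METRIC_Jaccard follows the weighted form is only an assumption here and should be checked against the Faiss source or documentation; the function names are made up for illustration.

```python
import numpy as np

# Vectors copied from the reproduction example in this thread.
a = np.array([
    [70883900, 42568368, 16938844, 55760336, 21177010,
     83098300, 46080616, 13810740, 63454444, 20485222],
    [20347602, 27256056, 23762382, 61982300, 37474148,
     5487983, 7732985, 15258728, 68216584, 16599308],
], dtype=np.float32)
b = np.array([[20635302, 42568368, 16938844, 55760336, 65016728,
               830983, 46080616, 13810740, 63454444, 2048522]],
             dtype=np.float32)


def set_jaccard_similarity(u, v):
    """Set-style Jaccard: fraction of positions (nonzero in u or v)
    where the two values are exactly equal. Hypothetical helper."""
    active = (u != 0) | (v != 0)
    equal = (u == v) & active
    return float(equal.sum() / active.sum())


def weighted_jaccard_similarity(u, v):
    """Weighted Jaccard for non-negative vectors:
    sum(min(u, v)) / sum(max(u, v)). Hypothetical helper; whether Faiss
    computes exactly this is an assumption, not something confirmed here."""
    u64 = u.astype(np.float64)
    v64 = v.astype(np.float64)
    return float(np.minimum(u64, v64).sum() / np.maximum(u64, v64).sum())


query = b[0]
set_scores = [set_jaccard_similarity(row, query) for row in a]
weighted_scores = [weighted_jaccard_similarity(row, query) for row in a]

print("set-style scores:", set_scores, "best row:", int(np.argmax(set_scores)))
print("weighted scores: ", weighted_scores, "best row:", int(np.argmax(weighted_scores)))
```

On these vectors the two definitions disagree about the best match: the set-style score prefers row 0 (six of ten values are identical to the query), while the weighted score is higher for row 1 (roughly 0.67 versus 0.59). If the weighted form is what the index computes, that would account for the a[1,:] result without either implementation being numerically wrong.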

handsomeZhuang commented 1 week ago

Hi, could you please let us know whether this has been debugged?