The similarity computation code in struct space is inconsistent with the paper

Y1YU commented 2 years ago

In the paper, the similarity is computed as a weighted sum of the Jaccard distance and the cosine similarity.

But the code seems to do something different. If I'm not mistaken, this part is implemented in tool/struct_space.py, as follows:

sim = 1.0 * len(query_Rstarset & doc_Rstarset) / len(query_Rstarset | doc_Rstarset)
jd = 1 - sim
cd = D[query_nodeid, idx]
nd = (1-lamb) * jd + lamb * cd
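A self-contained sketch of the same combination, for reference. It assumes `cos_dist` is already a cosine distance (1 - cosine similarity) and that `lamb` lies in [0, 1]; the function name `struct_distance` is hypothetical and not taken from the repo. The result is a distance, so smaller means closer.

```python
def struct_distance(query_set, doc_set, cos_dist, lamb):
    """Hypothetical helper: weighted mix of Jaccard distance and cosine distance."""
    union = query_set | doc_set
    # Jaccard similarity; treat two empty sets as identical to avoid dividing by zero
    sim = len(query_set & doc_set) / len(union) if union else 1.0
    jd = 1.0 - sim  # Jaccard distance
    return (1.0 - lamb) * jd + lamb * cos_dist


# The sets share 2 of 4 elements, so the Jaccard distance is 0.5
# and the result is 0.5 * 0.5 + 0.5 * 0.2, i.e. about 0.35.
d = struct_distance({1, 2, 3}, {2, 3, 4}, cos_dist=0.2, lamb=0.5)
print(d)
```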

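Here, D is computed with faiss in tool/faiss_search.py, but doesn't D store Euclidean distances? The code is below; the index search uses Euclidean distance. Also, when I print D, its values are in ascending order.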

quantizer = faiss.IndexFlatL2(dim)      # define the quantizer/index with L2 (Euclidean) distance; smaller is better
cpu_index = faiss.IndexIVFFlat(quantizer, dim, nlist)
cpu_index.nprobe = nprobe

Is that right, or am I missing something?

Thomas-wyh commented 2 years ago

Thank you for your attention to Ada-NETS.

We are using the cosine distance. Although faiss returns a (squared) Euclidean distance, we convert it to a cosine distance here.

You can derive that, when the feature vectors are L2-normalized, the squared Euclidean distance between two features (which is what faiss's L2 indexes return) is twice the corresponding cosine distance.
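A quick numerical check of that identity. For unit vectors a and b, ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 cos(a, b) = 2 (1 - cos(a, b)), so it is the squared distance that equals twice the cosine distance, and halving faiss's L2 output gives the cosine distance. The sketch below uses random NumPy vectors in place of the repo's features and does not call faiss itself.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 16)).astype(np.float32)
# L2-normalize each feature vector (row)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

a, b = feats[0], feats[1]
sq_l2 = float(np.sum((a - b) ** 2))   # squared Euclidean distance, as IndexFlatL2 reports
cos_dist = 1.0 - float(np.dot(a, b))  # cosine distance = 1 - cosine similarity

print(sq_l2, 2 * cos_dist)  # equal up to float32 rounding
print(sq_l2 / 2)            # converting the L2 result back to a cosine distance
```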

In Eq. 2 we describe the method in terms of similarity for convenience, but it is essentially equivalent to the distance form (d = 1 - s).
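Expanding the weighted sum shows why the two forms agree. Writing each distance as 1 minus its similarity, with Jaccard similarity J and cosine similarity c, nd = (1 - lamb) * (1 - J) + lamb * (1 - c) = 1 - [(1 - lamb) * J + lamb * c]. So sorting candidates by ascending distance gives the same order as sorting by descending combined similarity. The candidate names and scores below are made up for illustration.

```python
def combined_similarity(jac_sim, cos_sim, lamb):
    return (1 - lamb) * jac_sim + lamb * cos_sim


def combined_distance(jac_sim, cos_sim, lamb):
    jd = 1 - jac_sim  # Jaccard distance
    cd = 1 - cos_sim  # cosine distance
    return (1 - lamb) * jd + lamb * cd


lamb = 0.3
# Made-up (Jaccard similarity, cosine similarity) pairs for three candidates
candidates = {"x": (0.8, 0.9), "y": (0.2, 0.4), "z": (0.5, 0.7)}

# The distance is exactly 1 minus the similarity for every candidate
for j, c in candidates.values():
    assert abs(combined_distance(j, c, lamb) - (1 - combined_similarity(j, c, lamb))) < 1e-12

by_distance = sorted(candidates, key=lambda k: combined_distance(*candidates[k], lamb))
by_similarity = sorted(candidates, key=lambda k: combined_similarity(*candidates[k], lamb), reverse=True)
print(by_distance, by_similarity)  # both orderings: ['x', 'z', 'y']
```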

Let me know if there are any further questions.

Y1YU commented 2 years ago

OK, thanks for your answer.

Zhubisong commented 1 year ago

Why jd = 1 - sim? Shouldn't a larger 'sim' mean the two are more similar?

damo-cv / Ada-NETS

The similarity computation code in struct space is inconsistent with the paper #11