DeMoriarty / fast_pytorch_kmeans

This is a pytorch implementation of k-means clustering algorithm
MIT License
284 stars 38 forks source link

Calculating euc_sim() #18

Closed juskuz closed 1 year ago

juskuz commented 1 year ago

I tested the method euc_sim() and it seems that values are not squared. Example:

a = torch.tensor([[2, 3], [3, 5], [5, 8]], dtype=torch.float32)
b = torch.tensor([[1, 0], [2, 1]], dtype=torch.float32)
>>> euc_sim(a,b)
tensor([[-10.,  -4.],
        [-29., -17.],
        [-80., -58.]])

Could anyone explain why? I suppose that sqrt doesn't change the order when looking for max value (max similarity). I compared it with torch.cdist() and there are squered values. Example (same variables a and b):

>>> torch.cdist(a,b)
tensor([[3.1623, 2.0000],
        [5.3852, 4.1231],
        [8.9443, 7.6158]])
DeMoriarty commented 1 year ago

euc_sim() computes negative squared euclidean distance. it's negative because we need the largest value to indicate most similar (closest) pair of vectors, and we're not taking sqrt of it because what matters is the order of the values, not the values themselves, so we might as well save some computation by not doing sqrt

juskuz commented 1 year ago

Thanks for clarification. I wanted to use the euc_sim formula for another computations (or write unit tests for myself) and the value of distance was a bit unclear.