Analyses so far have used Euclidean distance, which has worked well enough for initial eyeballing. However, it barely distinguishes between a small value and zero, a distinction that matters given the PUF's sparsity. One proposed rule of thumb is that Euclidean distance isn't useful when fewer than 3/4 of attributes are non-zero, which is certainly the case in the PUF.
That same thread suggested that cosine similarity can work better in these cases, though a comment here suggests it's best suited to categorical data. Cosine similarity should be computed on normalized attributes. Other options, such as Gower and Mahalanobis distances, can also be investigated here.
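To make the comparison concrete, here is a minimal sketch on toy data. It is not the PUF and not the code used in the analyses so far. The matrix `X`, the helper names, and the choice of max-abs scaling as the normalization are all assumptions made for illustration. It checks the non-zero share against the 3/4 rule of thumb, then compares Euclidean and cosine distance before and after scaling.

```python
import numpy as np

# Toy records (rows) and attributes (columns); values are made up for illustration.
X = np.array([
    [50000.0, 0.0,    0.0, 0.0],
    [50000.0, 5.0,    0.0, 0.0],
    [    0.0, 0.0, 2000.0, 0.0],
])


def nonzero_share(data):
    """Fraction of cells that are non-zero, for the 3/4 rule of thumb."""
    return np.count_nonzero(data) / data.size


def euclidean_distance(a, b):
    """Straight-line distance between two records."""
    return float(np.linalg.norm(a - b))


def cosine_distance(a, b):
    """1 minus cosine similarity; NaN if either record is all zeros."""
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return float("nan")
    return float(1.0 - np.dot(a, b) / (norm_a * norm_b))


def max_abs_scale(data):
    """Divide each column by its largest absolute value.

    Assumed normalization: zeros stay zero, so sparsity is preserved.
    All-zero columns are left unchanged.
    """
    col_max = np.abs(data).max(axis=0)
    col_max[col_max == 0] = 1.0
    return data / col_max


X_scaled = max_abs_scale(X)

print("non-zero share:", nonzero_share(X))
print("Euclidean, raw:", euclidean_distance(X[0], X[1]))
print("cosine, raw:", cosine_distance(X[0], X[1]))
print("cosine, scaled:", cosine_distance(X_scaled[0], X_scaled[1]))
```

On this toy data, the first two records differ only in one small value. Their raw cosine distance is close to zero because the large attribute dominates, while after scaling it rises to roughly 0.29. That is one reading of why normalization matters here; whether max-abs scaling is the right choice for the PUF is an open question.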