donboyd5 / synpuf

Synthetic PUF

Review suitable distance metrics #32

Open MaxGhenis opened 5 years ago

MaxGhenis commented 5 years ago

Analyses thus far have used Euclidean distance, which has worked well enough for initial eyeballing. However, it barely distinguishes a small value from zero, and that distinction matters given the PUF's sparsity. One proposed rule of thumb is that Euclidean distance isn't useful when fewer than 3/4 of attributes are non-zero, which is certainly the case in the PUF.
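To make the concern concrete, here is a minimal sketch in plain Python. The records and their three dollar-valued attributes are made up for illustration and are not PUF data; `euclidean` and `nonzero_share` are hypothetical helper names. It shows that Euclidean distance scores "zero vs. small" exactly like "large vs. slightly larger", and how one might check the 3/4 rule of thumb.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nonzero_share(records):
    """Share of all attribute values across the records that are non-zero."""
    total = sum(len(r) for r in records)
    nonzero = sum(1 for r in records for v in r if v != 0)
    return nonzero / total

# Hypothetical records (illustrative values only, not PUF data).
zero_rec = [50_000.0, 0.0, 0.0]
small_rec = [50_000.0, 10.0, 0.0]
large_rec = [50_000.0, 5_000.0, 0.0]
larger_rec = [50_000.0, 5_010.0, 0.0]

# Going from 0 to 10 changes whether the attribute is present at all,
# while 5,000 to 5,010 is a minor change, yet both pairs score the same.
d_zero_small = euclidean(zero_rec, small_rec)
d_large_larger = euclidean(large_rec, larger_rec)
print(d_zero_small, d_large_larger)

# Share of non-zero values, to compare against the 3/4 rule of thumb.
share = nonzero_share([zero_rec, small_rec, large_rec, larger_rec])
print(share)
```

Both distances come out identical because only the absolute gap of 10 enters the calculation, so the presence or absence of the attribute carries no extra weight.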

That same thread suggested that cosine similarity can work better in these cases, though a comment here suggests it's best suited to categorical data. Cosine similarity should be computed on normalized features. Other metrics, such as Gower and Mahalanobis distance, can also be investigated here.
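As a rough sketch of two of these alternatives, the plain-Python example below compares cosine similarity on raw versus column-scaled values, and computes a numeric-only Gower distance. Everything here is an assumption for illustration: the records are made up, the helper names are hypothetical, max-absolute-value scaling is just one possible normalization, and returning 0.0 for an all-zero record is an arbitrary convention. Mahalanobis distance is left out because it needs an inverse covariance matrix.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two records."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for all-zero records
    return dot / (norm_a * norm_b)

def scale_columns(records):
    """Divide each column by its max absolute value (one possible normalization)."""
    n_cols = len(records[0])
    maxes = [max(abs(r[j]) for r in records) or 1.0 for j in range(n_cols)]
    return [[r[j] / maxes[j] for j in range(n_cols)] for r in records]

def gower_numeric(a, b, ranges):
    """Gower distance for numeric attributes: mean of range-scaled absolute gaps."""
    parts = [abs(x - y) / rg if rg > 0 else 0.0 for x, y, rg in zip(a, b, ranges)]
    return sum(parts) / len(parts)

# Hypothetical records (illustrative values only, not PUF data).
records = [
    [50_000.0, 0.0, 0.0],
    [50_000.0, 10.0, 0.0],
    [50_000.0, 0.0, 200.0],
]

# Raw values: the large first column dominates, so the records look identical.
raw_cos = cosine_similarity(records[0], records[1])

# After column scaling, the zero vs. non-zero difference becomes visible.
scaled = scale_columns(records)
scaled_cos = cosine_similarity(scaled[0], scaled[1])

# Gower distance using each column's observed range (max minus min).
ranges = [max(col) - min(col) for col in zip(*records)]
gower = gower_numeric(records[0], records[1], ranges)

print(raw_cos, scaled_cos, gower)
```

On these toy records the raw cosine similarity is essentially 1, while the scaled version drops to about 0.71, which is why per-feature normalization matters when attribute magnitudes differ widely.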