broadinstitute / DeepProfilerExperiments


Similarity / distance metrics for representations #6

Open shntnu opened 3 years ago

shntnu commented 3 years ago

@jccaicedo Can you clarify what the preferred similarity / distance metric for profiles is (in this experiment)?

jccaicedo commented 3 years ago

We use Pearson correlation.

shntnu commented 3 years ago

> We use Pearson correlation.

Got it. At long last, I've become more fond (again) of cosine similarity, which is closely related to Pearson [1], because it is on a better footing in ML research, is more intuitive [2], and will generally lead to very similar results given the way we normalize data.

I haven't tested it out thoroughly, so I don't recommend switching over yet, but please keep me posted in case you guys do test it out.


  1. Pearson(x, y) is the same as Cosine(x_c, y_c), where x_c = x - x_mean and y_c = y - y_mean
  2. Panels 1 and 3 below have the same Pearson, but different Cosine. I think Cosine is "right" here :)

[Image: panels from https://rpubs.com/shantanu/pearson_cosine]
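
The relationship in footnote 1 can be checked numerically. The following is a minimal sketch in plain Python, not code from this repository: the helper names `cosine`, `center`, and `pearson` and the toy vectors are made up for illustration. It assumes Pearson is computed by centering each vector on its own mean, and it shows that a constant offset added to one vector leaves Pearson unchanged while it changes cosine.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length sequences of numbers."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def center(x):
    """Subtract the mean of x from each of its entries."""
    mean = sum(x) / len(x)
    return [a - mean for a in x]


def pearson(x, y):
    """Pearson correlation, written as the cosine of the mean-centered vectors."""
    return cosine(center(x), center(y))


# Toy vectors (hypothetical); both already have mean zero.
x = [-1.5, -0.5, 0.5, 1.5]
y = [-0.5, -1.5, 1.5, 0.5]

# Same vector as y with a constant offset added to every entry.
y_shifted = [b + 10.0 for b in y]

# With mean-zero inputs, cosine and Pearson coincide.
assert math.isclose(cosine(x, y), pearson(x, y))

# Pearson ignores the offset; cosine does not.
assert math.isclose(pearson(x, y_shifted), pearson(x, y))
assert cosine(x, y_shifted) < cosine(x, y)
```

When each vector is already close to mean zero, the two metrics should give nearly the same value, and they diverge as the offset grows.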


Relevant paper: https://journals.sagepub.com/doi/10.1177/1087057113501390

jccaicedo commented 3 years ago

Very interesting! Will give cosine similarity a try!