Similarity / distance metrics for representations

shntnu commented 3 years ago

@jccaicedo Can you clarify what the preferred similarity / distance metric for profiles is (in this experiment)?

jccaicedo commented 3 years ago

We use Pearson correlation.

shntnu commented 3 years ago

> We use Pearson correlation.

Got it. At long last, I've become fond (again) of cosine similarity, which is closely related to Pearson [1]: it is on a better footing in ML research, it is more intuitive [2], and it will generally lead to very similar results given the way we normalize data.

I haven't tested it out thoroughly, so I don't recommend switching over yet, but please keep me posted in case you guys do test it out.

[1] Pearson(x, y) is the same as Cosine(x_c, y_c), where x_c = x - mean(x) and y_c = y - mean(y).
[2] Panels 1 and 3 of the figure at https://rpubs.com/shantanu/pearson_cosine have the same Pearson, but different Cosine. I think Cosine is "right" here :)

Relevant paper: https://journals.sagepub.com/doi/10.1177/1087057113501390
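
The relation between Pearson and cosine described above can be checked numerically. Here is a minimal sketch, assuming NumPy and synthetic vectors; the helper names `cosine` and `pearson`, the seed, and the offset value are made up for illustration and are not taken from DeepProfiler or the rpubs notebook. It also shows one simple way two pairs of vectors can share the same Pearson but differ in cosine (a constant offset), which is only an illustration and not a reproduction of the panels in the linked figure.

```python
import numpy as np


def cosine(x, y):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    return cosine(x - x.mean(), y - y.mean())


# Synthetic data (hypothetical; the seed and sizes are arbitrary).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

# The centered-cosine definition agrees with NumPy's Pearson correlation.
assert np.isclose(pearson(x, y), np.corrcoef(x, y)[0, 1])

# Adding the same constant to every entry leaves Pearson unchanged,
# because centering removes the offset, but it changes cosine.
offset = 5.0
x_shifted = x + offset
y_shifted = y + offset

print("Pearson:", pearson(x, y), pearson(x_shifted, y_shifted))
print("Cosine: ", cosine(x, y), cosine(x_shifted, y_shifted))
```

If profiles are already centered feature-wise so that each profile vector has a mean close to zero, the two metrics will be close, which is consistent with the expectation of "very similar results" above; whether that holds for a given normalization is something to verify on real data.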

jccaicedo commented 3 years ago

Very interesting! Will give cosine similarity a try!

broadinstitute / DeepProfilerExperiments

Similarity / distance metrics for representations #6