ArnaoutLab / diversity

Partitioned frequency- and similarity-sensitive diversity in Python
MIT License

Feature request: Similarity sensitive KL divergence #92

Open IosiaLectus opened 1 month ago

IosiaLectus commented 1 month ago

I think it could be useful to have a notion of KL divergence between two metacommunities. Presumably, this could come in alpha, gamma, or beta flavors. The KL divergence $KL(p||q)$ is defined by

$KL(p||q) = \displaystyle\sum_{x} p(x) \log \left( \frac{p(x)}{q(x)} \right) = \mathbb{E}_{p} \left[ \log \frac{p(x)}{q(x)} \right]$

To make it similarity-aware, I would replace this with

$KL(p||q)^{Z} = \displaystyle\sum_{x} p(x) \log \left( \frac{\sum_{x'} Z(x,x')\, p(x')}{\sum_{x'} Z(x,x')\, q(x')} \right)$

There are similar formulas for Rényi divergences at other viewpoint parameters that can likewise be generalized.
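For reference, the classical Rényi divergence at viewpoint parameter $\alpha$ can be written as

$D_{\alpha}(p||q) = \displaystyle\frac{1}{\alpha - 1} \log \sum_{x} p(x) \left( \frac{p(x)}{q(x)} \right)^{\alpha - 1}$

so one natural similarity-sensitive analogue (stated here as an assumption rather than an established definition, chosen so that it reduces to the classical form when $Z$ is the identity and to $KL(p||q)^{Z}$ above in the limit $\alpha \to 1$) would be

$D_{\alpha}^{Z}(p||q) = \displaystyle\frac{1}{\alpha - 1} \log \sum_{x} p(x) \left( \frac{\sum_{x'} Z(x,x')\, p(x')}{\sum_{x'} Z(x,x')\, q(x')} \right)^{\alpha - 1}$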

chhotii-alex commented 1 month ago

Note that log(a/0) is undefined. So if something is not in probability distribution Q, then... what? Technically, KL divergence is only defined when the support of P is a subset of the support of Q. log(0/a) is also undefined, but by convention 0 * log(0/a) is taken to be zero (the element is not represented in P at all, so that seems fair).

Options:
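Whichever option is chosen, the usual zero conventions for the plain (similarity-free) KL divergence can be spelled out in code. A minimal sketch, assuming p and q are aligned one-dimensional probability vectors (the function name is just for illustration):

```python
import numpy as np

def kl(p, q):
    # Plain KL divergence with the usual zero conventions (illustrative sketch).
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0                     # terms with p(x) == 0 contribute nothing
    if np.any(q[support] == 0):         # support(P) is not contained in support(Q)
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / q[support])))
```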

chhotii-alex commented 1 month ago

What's proposed is basically:

```python
import numpy as np

def sim_kl(ab, sim):
    # Normalize each subcommunity (column) to a probability distribution.
    norm_ab = ab / np.sum(ab, axis=0)
    # Similarity-weighted abundances Z @ p, one column per subcommunity.
    Zp = sim @ norm_ab
    ratios = Zp[:, 0] / Zp[:, 1]  # first subcommunity vs. second only
    return np.sum(norm_ab[:, 0] * np.log(ratios))
```

(except, all subcommunities vs. all subcommunities... not just 1st vs. 2nd)
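A straightforward all-pairs generalization might look like the following sketch (the function name and loop structure are illustrative, not the library's API; the zero-handling question from the earlier comment is ignored here):

```python
import numpy as np

def sim_kl_all_pairs(ab, sim):
    # Entry [i, j] is the similarity-sensitive KL divergence of
    # subcommunity i (playing the role of P) from subcommunity j (Q).
    norm_ab = ab / np.sum(ab, axis=0)
    Z = sim @ norm_ab
    n = norm_ab.shape[1]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sum(norm_ab[:, i] * np.log(Z[:, i] / Z[:, j]))
    return out
```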

However, this can give negative results, whose interpretation is challenging. Try this:

```python
import numpy as np

ab = np.array([[1, 1], [0, 1], [1, 1]])
sim = np.array([
    [1, 0.70710678, 0],
    [0.70710678, 1, 0.70710678],
    [0, 0.70710678, 1]])
```

This is not a pathological similarity matrix. Imagine that a, b, and c are unit-length positive vectors: a = [1, 0], b = [0.70710678, 0.70710678], and c = [0, 1]. The similarity is the dot product (cosine similarity). a and c are orthogonal and thus utterly different. The resulting similarity-sensitive KL divergence is significantly negative (about -0.13).
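Running the sim_kl sketch above on these arrays reproduces that value:

```python
print(sim_kl(ab, sim))  # about -0.13: negative, even though both columns are valid distributions
```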

chhotii-alex commented 1 month ago

There's a relationship between cross-entropy and divergence. Reeve's paper, section S1.2.1, defines a similarity-sensitive cross-entropy. Can you use that to define a similarity-sensitive divergence?
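For the ordinary quantities, the identity is $KL(p||q) = H(p, q) - H(p)$, with cross-entropy $H(p, q) = -\sum_{x} p(x) \log q(x)$ and entropy $H(p) = -\sum_{x} p(x) \log p(x)$. If the similarity-sensitive analogues are taken to be $H^{Z}(p, q) = -\sum_{x} p(x) \log \sum_{x'} Z(x,x')\, q(x')$ and $H^{Z}(p) = -\sum_{x} p(x) \log \sum_{x'} Z(x,x')\, p(x')$ (an assumption here; these may or may not match Reeve's exact definitions), then the same identity

$H^{Z}(p, q) - H^{Z}(p) = \displaystyle\sum_{x} p(x) \log \left( \frac{\sum_{x'} Z(x,x')\, p(x')}{\sum_{x'} Z(x,x')\, q(x')} \right)$

recovers exactly the $KL(p||q)^{Z}$ proposed above.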

IosiaLectus commented 1 month ago

Reeve's equation 12 is what I would want here (up to taking the limit at viewpoint parameter q = 1).