Fixing MI estimator with KNN

brainets / hoi

Higher-Order Interactions

https://brainets.github.io/hoi/

BSD 3-Clause "New" or "Revised" License

18 stars 7 forks source link

Fixing MI estimator with KNN #38

Closed chrisferreyra13 closed 3 months ago

chrisferreyra13 commented 3 months ago

As I discussed with @EtienneCmb , I found an issue when using KNN estimator to compute MI.

The problem arises from computing MI using I(X,Y)=H(X)+H(Y)-H(X,Y) when using KNN estimator. The KNN entropy estimator is ok, but as the authors said, computing the distances in the joint and marginal spaces is not the same. The biases coming from nonuniformity of the density in each entropy are not the same, so they will not cancel out. See discussion in pag. 4, section 2.B, in the original paper Kraskov et al., 2004.

I propose to add a compute_mi_knn function for this special case. Therefore, there will be a generic MI function (the current one) and one for KNN estimator (KSG). Based on other implementations I saw, I add mine with JAX that probably could be improved.

I attach some simulations with distributions from https://github.com/cbg-ethz/bmi/tree/main. BTW, we should check those distributions to create some nice tests for HOI.

Please let me know if I add/modify something before merging. 😄 mi_1v1-additive-0 75 mi_1v1-normal-0 75

EtienneCmb commented 3 months ago

Excellent, thank you !

brovelli commented 3 months ago

Wow, well spotted Christian! Great job

EtienneCmb commented 3 months ago

I attach some simulations with distributions from https://github.com/cbg-ethz/bmi/tree/main. BTW, we should check those distributions to create some nice tests for HOI.

Good idea @chrisferreyra13 , you can propose unit tests to benchmark our estimators. Also, since they implemented estimators with Jax, we could allow HOI objects to accepts external estimators. Something like in the get_entropy function :

if method == "gcmi":
    return partial(entropy_gcmi, **kwargs)
elif method == "binning":
    return partial(entropy_bin, **kwargs)
elif method == "knn":
    return partial(entropy_knn, **kwargs)
elif method == "kernel":
    return partial(entropy_kernel, **kwargs)
elif callable(method):
    # small test that the function returns a single entropy value
    assert method(np.random.rand(...))
    return partial(method)
else:
    raise ValueError(f"Method {method} doesn't exist.")

And we could do the same with the MI. What do you think?

chrisferreyra13 commented 3 months ago

Thank you both! @EtienneCmb yes! For HOI it could be nice to benchmark multivariate dense/sparse interactions, a test that seems to be hard for estimators like KSG when sparsity increases. Add custom entropy/mi methods could be nice because neural based estimators could be of interest for HOI. We should defined input-output expected shapes, dtypes, etc. but the rest should be straight forward.

EtienneCmb commented 3 months ago

Exactly. Regarding the expected shapes and dtypes, we could follow the format of the methods already implemented in hoi. Are you interested in proposing a PR?

chrisferreyra13 commented 3 months ago

Yes, I can work on those ideas. 🚀