Closed by chrisferreyra13 3 months ago
Excellent, thank you!
Wow, well spotted Christian! Great job!
I attach some simulations with distributions from https://github.com/cbg-ethz/bmi/tree/main. BTW, we should check those distributions to create some nice tests for HOI.
Good idea @chrisferreyra13, you can propose unit tests to benchmark our estimators. Also, since they implemented their estimators with JAX, we could allow HOI objects to accept external estimators. Something like this in the get_entropy function:
    if method == "gcmi":
        return partial(entropy_gcmi, **kwargs)
    elif method == "binning":
        return partial(entropy_bin, **kwargs)
    elif method == "knn":
        return partial(entropy_knn, **kwargs)
    elif method == "kernel":
        return partial(entropy_kernel, **kwargs)
    elif callable(method):
        # small test that the function returns a single entropy value
        assert method(np.random.rand(...))
        return partial(method)
    else:
        raise ValueError(f"Method {method} doesn't exist.")
And we could do the same with the MI. What do you think?
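To make the callable branch a bit more concrete, here is a minimal sketch of how a custom estimator could be validated before it is accepted. It is not HOI code: `resolve_entropy` and `entropy_gaussian` are hypothetical names, and the `(n_features, n_samples)` input layout and the "single finite value" contract are assumptions that would have to be checked against the estimators already in hoi.

```python
from functools import partial

import numpy as np


def entropy_gaussian(x):
    """Hypothetical user-supplied estimator: entropy (in nats) of a Gaussian
    fitted to x, where x is assumed to have shape (n_features, n_samples)."""
    x = np.atleast_2d(x)
    n_features = x.shape[0]
    cov = np.atleast_2d(np.cov(x))  # rows are treated as variables
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n_features * np.log(2 * np.pi * np.e) + logdet)


def resolve_entropy(method, **kwargs):
    """Accept a callable estimator after a small smoke test (sketch only)."""
    if not callable(method):
        raise ValueError(f"Method {method} doesn't exist.")
    fcn = partial(method, **kwargs)
    # Smoke test on random data: the estimator must return one finite value.
    rng = np.random.default_rng(0)
    out = np.asarray(fcn(rng.standard_normal((2, 50))))
    if out.size != 1 or not np.isfinite(out).all():
        raise ValueError("A custom entropy must return a single finite value.")
    return fcn


entropy_fcn = resolve_entropy(entropy_gaussian)
h = entropy_fcn(np.random.default_rng(1).standard_normal((3, 2000)))
```

Raising a `ValueError` instead of using `assert` keeps the check active when Python runs with `-O`, and it also rejects estimators that return an array instead of a scalar.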
Thank you both! @EtienneCmb yes! For HOI it could be nice to benchmark multivariate dense/sparse interactions, a test that seems to be hard for estimators like KSG as sparsity increases. Adding custom entropy/MI methods could be nice because neural-based estimators could be of interest for HOI. We should define the expected input and output shapes, dtypes, etc., but the rest should be straightforward.
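As a rough illustration of what a dense/sparse benchmark could look like, the sketch below builds two Gaussian covariance matrices and computes the analytic MI between the two blocks, which an estimator's output could then be compared against. The helper names (`make_cov`, `gaussian_mi`, `sample`), the correlation values, and the `(n_features, n_samples)` float32 output are assumptions for illustration, not the tasks defined in the bmi repository.

```python
import numpy as np


def make_cov(dim_x, dim_y, rho, sparse):
    """Unit-variance covariance; X and Y interact through one pair of
    coordinates (sparse) or through every pair (dense)."""
    cov = np.eye(dim_x + dim_y)
    if sparse:
        cov[0, dim_x] = cov[dim_x, 0] = rho
    else:
        cov[:dim_x, dim_x:] = rho
        cov[dim_x:, :dim_x] = rho
    if np.linalg.eigvalsh(cov).min() <= 0:
        raise ValueError("rho is too large: covariance is not positive definite.")
    return cov


def gaussian_mi(cov, dim_x):
    """Analytic MI in nats between the first dim_x coordinates and the rest:
    0.5 * (log det(cov_x) + log det(cov_y) - log det(cov))."""
    _, ld_x = np.linalg.slogdet(cov[:dim_x, :dim_x])
    _, ld_y = np.linalg.slogdet(cov[dim_x:, dim_x:])
    _, ld_xy = np.linalg.slogdet(cov)
    return 0.5 * (ld_x + ld_y - ld_xy)


def sample(cov, n_samples, seed=0):
    """Draw samples with the assumed layout (n_features, n_samples), float32."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_samples)
    return z.T.astype(np.float32)


cov_sparse = make_cov(3, 3, rho=0.8, sparse=True)
cov_dense = make_cov(3, 3, rho=0.2, sparse=False)
mi_sparse = gaussian_mi(cov_sparse, dim_x=3)  # ground truth for a unit test
mi_dense = gaussian_mi(cov_dense, dim_x=3)
data = sample(cov_sparse, n_samples=1000)
```

A unit test would then split `data` into its X and Y rows, run the estimator under test, and compare the result with `mi_sparse` or `mi_dense` within a tolerance chosen per estimator.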
Exactly. Regarding the expected shapes and dtypes, we could follow the format of the methods already implemented in hoi. Are you interested in proposing a PR?
Yes, I can work on those ideas. 🚀
As I discussed with @EtienneCmb, I found an issue when using the KNN estimator to compute MI.
The problem arises from computing MI as I(X,Y)=H(X)+H(Y)-H(X,Y) with the KNN estimator. The KNN entropy estimator itself is fine, but, as the authors point out, for a fixed k the distance to the k-th neighbor is not the same in the joint and marginal spaces. The biases caused by the nonuniformity of the density therefore differ between the three entropy terms and do not cancel out. See the discussion on p. 4, section 2.B, of the original paper (Kraskov et al., 2004).
I propose adding a compute_mi_knn function for this special case. There would then be a generic MI function (the current one) and a dedicated one for the KNN estimator (KSG). Based on other implementations I have seen, I added my own in JAX, which could probably be improved.
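For readers who want to see the idea without opening the diff, here is a minimal NumPy sketch of the first KSG algorithm from Kraskov et al. (2004): I(X,Y) ≈ ψ(k) + ψ(N) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩. It is not the JAX code from this PR: `mi_ksg` and `digamma_int` are hypothetical names, the `(n_features, n_samples)` layout is an assumption, and the brute-force O(N²) distance matrix is only meant for small examples.

```python
import numpy as np


def digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    n = np.asarray(n)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max()))))
    return -np.euler_gamma + harmonic[n - 1]


def mi_ksg(x, y, k=3):
    """KSG estimate of I(X;Y) in nats (algorithm 1, brute-force sketch).

    x, y : arrays of shape (n_features, n_samples), continuous-valued.
    """
    x = np.atleast_2d(x).T  # (n_samples, dim_x)
    y = np.atleast_2d(y).T  # (n_samples, dim_y)
    n_samples = x.shape[0]
    # Pairwise max-norm distances in each marginal space.
    dx = np.abs(x[:, None, :] - x[None, :, :]).max(axis=-1)
    dy = np.abs(y[:, None, :] - y[None, :, :]).max(axis=-1)
    # The joint-space max-norm distance is the larger of the two.
    dz = np.maximum(dx, dy)
    np.fill_diagonal(dz, np.inf)  # a point is not its own neighbor
    # Distance from each point to its k-th neighbor in the joint space.
    eps = np.sort(dz, axis=1)[:, k - 1]
    # Marginal neighbors strictly closer than eps, excluding the point itself.
    nx = (dx < eps[:, None]).sum(axis=1) - 1
    ny = (dy < eps[:, None]).sum(axis=1) - 1
    return (
        digamma_int(k)
        + digamma_int(n_samples)
        - np.mean(digamma_int(nx + 1) + digamma_int(ny + 1))
    )


# Correlated Gaussian pair; the analytic MI is -0.5 * log(1 - rho**2).
rng = np.random.default_rng(0)
rho = 0.8
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1000).T
mi_hat = mi_ksg(xy[:1], xy[1:], k=3)
mi_true = -0.5 * np.log(1 - rho**2)
```

The key point is that the neighbor distance `eps` is found once in the joint space and then reused to count neighbors in each marginal space, instead of running three independent k-nearest-neighbor searches.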
Please let me know if I should add or modify anything before merging. 😄