kahaaga closed this pull request 1 year ago
Merging #232 (17efed6) into main (458933b) will increase coverage by 0.24%. The diff coverage is 100.00%.
```diff
@@            Coverage Diff             @@
##             main     #232      +/-  ##
==========================================
+ Coverage   84.75%   85.00%   +0.24%
==========================================
  Files          47       48       +1
  Lines        1141     1160      +19
==========================================
+ Hits          967      986      +19
  Misses        174      174
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| ..._estimators/nearest_neighbors/nearest_neighbors.jl | 100.00% <ø> (ø) | |
| ...entropies_estimators/nearest_neighbors/GaoNaive.jl | 100.00% <100.00%> (ø) | |
isn't the corrected version just better? why do we need both?
> isn't the corrected version just better? why do we need both?
The corrected version is better. But I think for educational purposes it is nice to have both. There are also potential research questions that can be addressed, e.g. "does it really matter, for conditional independence testing, that my estimator isn't asymptotically unbiased, if I'm using some sort of null-hypothesis test where this biased estimator is applied everywhere, i.e. not only to my original data, but also to the surrogate ensemble?"
Perhaps the docstrings could be flipped: the main docs go to `GaoNaiveCorrected`, and the docs in `GaoNaive` simply state that it is similar, but without bias correction, and is included for educational purposes / completeness.
In principle, we don't need any estimator besides the one that performs the best. But we're building a library, so I think we should include whatever published estimators exist. And then it is up to the user to determine what is useful for them.
I think an argument can be made that the difference between `GaoNaive` and `GaoNaiveCorrected` is analogous to the difference between, say, using `ValueHistogram` to compute entropy in a non-bias-corrected way (as we currently do), and applying bias correction based on the binning (which we will probably offer at some point).
We could also just offer `correct_bias` as a field to `GaoNaive`.
> We could also just offer `correct_bias` as a field to `GaoNaive`.
Yes please, that is the best option. For counting reasons we should also be careful: when we count estimators for the paper and say "we have 50 estimators", we don't want someone to object "well, but 10 of these are the same".
@Datseris I've merged `GaoNaive` and `GaoNaiveCorrected` into a single estimator: `Gao`. It has the keyword `corrected::Bool`, which indicates whether bias correction should be applied. Tests and the documentation have been updated accordingly.
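For intuition about what the `corrected` flag changes, here is a rough Python sketch of the naive vs. bias-corrected k-NN differential entropy estimator in the style of Singh et al. (2003) / Kozachenko–Leonenko. This is illustrative only, not the package's Julia implementation: the function name `knn_shannon_entropy`, the helper `psi_int`, and the exact form of the correction are assumptions for demonstration (published variants differ in small details such as `ψ(N)` vs. `ψ(N-1)`).

```python
import numpy as np
from math import lgamma, log, pi

EULER_GAMMA = 0.5772156649015329

def psi_int(n):
    # digamma at a positive integer n: psi(n) = -gamma + H_{n-1}
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def knn_shannon_entropy(x, k=3, corrected=True):
    """Sketch of a k-NN Shannon differential entropy estimate (nats).

    x: (N, d) array of samples. `corrected` toggles the digamma-based
    bias correction, mirroring the idea discussed in this thread.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # brute-force pairwise Euclidean distances (fine for a sketch)
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude self-distances
    eps = np.sort(dists, axis=1)[:, k - 1]  # distance to k-th neighbour
    # log volume of the unit d-ball: pi^(d/2) / Gamma(d/2 + 1)
    log_c_d = (d / 2) * log(pi) - lgamma(d / 2 + 1)
    mean_log_eps = float(np.mean(np.log(eps)))
    if corrected:
        # digamma terms remove the asymptotic bias of the plug-in form
        return psi_int(n) - psi_int(k) + log_c_d + d * mean_log_eps
    # naive plug-in: log(N) - log(k), asymptotically biased
    return log(n) - log(k) + log_c_d + d * mean_log_eps
```

Note that the two variants differ only by the constant `[ψ(N) - ψ(k)] - [log N - log k]`, which is exactly why a single estimator with a boolean flag is cleaner than two separate types.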
This PR follows up on #230, and introduces two more Shannon differential entropy estimators from the CausalityTools dev branch: `GaoNaive` and `GaoNaiveCorrected` (Gao et al., 2015), which are both based on estimators from Singh et al. (2003).

References
Gao, S., Ver Steeg, G., & Galstyan, A. (2015). Efficient estimation of mutual information for strongly dependent variables. In Artificial Intelligence and Statistics (pp. 277-286). PMLR.
Singh, H., Misra, N., Hnizdo, V., Fedorowicz, A., & Demchuk, E. (2003). Nearest neighbor estimates of entropy. American Journal of Mathematical and Management Sciences, 23(3-4), 301-321.