JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License

WIP: discrete estimators #279

Closed kahaaga closed 11 months ago

kahaaga commented 12 months ago

Fixes #237. WIP. No need to review yet - changes will be made. The docs are here.

I've temporarily introduced frequencies and frequencies_and_outcomes instead of editing the Probabilities struct, just so that I can implement the actual estimators. I'll switch to the agreed-upon interface once all the estimators are done.

Shannon entropy estimators

codecov[bot] commented 11 months ago

Codecov Report

Merging #279 (009470c) into main (68fc7fa) will increase coverage by 0.46%. The diff coverage is 92.45%.

@@            Coverage Diff             @@
##             main     #279      +/-   ##
==========================================
+ Coverage   85.88%   86.34%   +0.46%     
==========================================
  Files          57       64       +7     
  Lines        1438     1567     +129     
==========================================
+ Hits         1235     1353     +118     
- Misses        203      214      +11     
| Files Changed | Coverage Δ |
|---|---|
| src/ComplexityMeasures.jl | 100.00% <ø> (ø) |
| src/core/information_measures.jl | 92.59% <0.00%> (-0.75%) :arrow_down: |
| src/core/probabilities.jl | 87.03% <20.00%> (-6.97%) :arrow_down: |
| src/discrete_info_estimators/schurmann.jl | 85.00% <85.00%> (ø) |
| src/discrete_info_estimators/chao_shen.jl | 88.23% <88.23%> (ø) |
| .../discrete_info_estimators/schurmann_generalized.jl | 92.30% <92.30%> (ø) |
| src/core/information_functions.jl | 100.00% <100.00%> (ø) |
| src/discrete_info_estimators/horvitz_thompson.jl | 100.00% <100.00%> (ø) |
| src/discrete_info_estimators/jackknife.jl | 100.00% <100.00%> (ø) |
| src/discrete_info_estimators/miller_madow.jl | 100.00% <100.00%> (ø) |

... and 9 more


Datseris commented 11 months ago

So should I review this or is it still WIP? Can you resolve the git conflicts?

kahaaga commented 11 months ago

> So should I review this or is it still WIP? Can you resolve the git conflicts?

It's still WIP. There's some nuance around counting frequencies for a few of the estimators: they are designed for small sample sizes and explicitly require actual counts (their corrections are based on the number of singletons, doubletons, and so on, in the data). We therefore can't naively convert arbitrary probabilities into integer counts, because the estimators are sample-size dependent, and introducing some arbitrary conversion factor when transforming probs -> freqs ignores that dependence. As a result, these estimators only work for probabilities obtained through actual counts (histograms, symbol frequencies), not for probabilities obtained through normalization (e.g. wavelet or power spectrum) or some other method (transfer operator).
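
To make the sample-size dependence concrete, here is a minimal sketch in Python. It is not the package's code: the function names `plugin_entropy` and `miller_madow_entropy` are hypothetical, and it assumes the standard Miller-Madow correction (m - 1) / (2N), where m is the number of observed outcomes and N is the sample size. It compares real counts with the same relative frequencies scaled by an arbitrary factor.

```python
import math
from collections import Counter


def plugin_entropy(counts):
    """Plug-in (maximum-likelihood) Shannon entropy in nats, from integer counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def miller_madow_entropy(counts):
    """Plug-in entropy plus the Miller-Madow bias correction (m - 1) / (2N)."""
    n = sum(counts)                      # sample size N
    m = sum(1 for c in counts if c > 0)  # number of observed outcomes
    return plugin_entropy(counts) + (m - 1) / (2 * n)


# Actual counts from a small sample: a=4, b=2, c=1, d=1 (N = 8).
sample = ["a", "a", "b", "c", "a", "b", "d", "a"]
counts = list(Counter(sample).values())

# Same probabilities, but "counts" made up with an arbitrary factor of 10 (N = 80).
scaled = [10 * c for c in counts]

# The plug-in estimate only sees the probabilities, so it is unchanged.
print(plugin_entropy(counts), plugin_entropy(scaled))

# The corrected estimate depends on N, so the arbitrary factor changes the result.
print(miller_madow_entropy(counts), miller_madow_entropy(scaled))

# Singleton counts, which some corrections rely on, are also destroyed by scaling.
print(sum(1 for c in counts if c == 1), sum(1 for c in scaled if c == 1))
```

The plug-in value is the same for both inputs, while the bias correction shrinks from 3/16 to 3/160 and the two singletons disappear. This is why a conversion factor chosen after the fact cannot stand in for real counts.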

I'm working on #280 concurrently, which I believe will provide some deeper insight into how to solve this, and what functionality we actually need (and what to ignore).

kahaaga commented 11 months ago

Superseded by #285