JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems

Entropy normalization for the statistical complexity #266

Closed · kahaaga closed this issue 1 year ago

kahaaga commented 1 year ago

Hey @ikottlarz,

I'm trying to use the entropy_complexity function on some images, and I was unsure whether the entropy values I got out were normalized. According to the docstring, the function is supposed to compute normalized entropies, but when I went to the source code, I couldn't figure out where the (generic) normalization happens.
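
For reference, here's a minimal sketch of the kind of call I mean (the constructor keyword est and the use of entr below are assumptions on my part, based on the field names that appear in the snippet further down; check the docstring for the exact signature):

    using ComplexityMeasures
    x = randn(10_000)   # stand-in for the data; an image would need a spatial estimator
    c = StatisticalComplexity(est = SymbolicPermutation(m = 3))
    h, C = entropy_complexity(c, x)   # is h guaranteed to be normalized to [0, 1]?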

There are a few lines in complexity(c::StatisticalComplexity, p::Probabilities) where a Shannon-entropy-style normalization seems to happen:

    L = total_outcomes(est)
    norm = log(entr.base, L)
    H_q = entropy(entr, p) / norm

However, this normalization is only valid for the Shannon entropy, not generically for any entropy definition. The normalization for other entropy types, e.g. Tsallis, is different, because their maximum values over L outcomes are different. Shouldn't the code for complexity use the entropy_normalized function instead? That would ensure that the normalization is correct. Or am I missing something?
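
To illustrate with plain Julia (independent of the package API): dividing by log(base, L) recovers the Shannon maximum, but e.g. the Tsallis maximum over L outcomes is (L^(1-q) - 1)/(1-q), so the same division uses the wrong maximum:

    # Plain-Julia illustration; not the package's implementation.
    shannon(p; base = 2) = -sum(x -> x > 0 ? x * log(base, x) : 0.0, p)
    tsallis(p; q = 2.0) = (1 - sum(x -> x^q, p)) / (q - 1)

    # Maximum values, attained at the uniform distribution over L outcomes:
    shannon_max(L; base = 2) = log(base, L)
    tsallis_max(L; q = 2.0) = (L^(1 - q) - 1) / (1 - q)

    p = [0.5, 0.25, 0.125, 0.125]; L = length(p)
    shannon(p) / shannon_max(L)   # correctly normalized to [0, 1]
    tsallis(p) / tsallis_max(L)   # correctly normalized to [0, 1]
    tsallis(p) / log(2, L)        # the quoted normalization: wrong maximum for Tsallis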

ikottlarz commented 1 year ago

Hi @kahaaga, you're totally right: this normalization only works for Shannon entropies, so this definitely needs to be changed. I vaguely remember originally wanting to use entropy_normalized but ending up not doing it for some reason I cannot recall right now (could it be that entropy_normalized requires an x already because the alphabet length is not always predetermined?). In any case, it's at least a bug in the documentation for not marking that this only works for Shannon entropies, but ideally we change it to the generalized version. I'm travelling right now and will reply more tomorrow.

kahaaga commented 1 year ago

could it be that entropy_normalized requires an x already because the alphabet length is not always predetermined?

The docstring already says that the function should work for any probabilities estimator that has a known alphabet length, so we can just assume that the user provides a valid estimator (with known alphabet length). One of the nice things about the API is that if the alphabet length isn't known, or a normalization isn't defined, the code falls back to informative error messages.

In any case, it's at least a bug in the documentation for not marking that this only works for Shannon entropies

It isn't documented yet that any entropy definition can be used, so no worries. I was just curious, because I wanted to test what happens with the statistical complexity when using other types of entropy.

I'm travelling right now and will reply more tomorrow

No worries, I am also on vacation, so this is not an urgent issue.

Datseris commented 1 year ago

The docstring already says that the function should work for any probabilities estimator that has a known alphabet length, so we can just assume that the user provides a valid estimator (with known alphabet length).

Recall that we made the conscious choice that only estimators with known alphabet length exist in ComplexityMeasures.jl anyway.

ikottlarz commented 1 year ago

Alright @kahaaga, I went through the code and remembered why I didn't use entropy_normalized back then: entropy_normalized is not defined for an EntropyEstimator together with a Probabilities instance:

Notice that there is no method entropy_normalized(e::DiscreteEntropyEstimator, probs::Probabilities), because there is no way to know the amount of possible events (i.e., the total_outcomes) from probs.

(from the docs).

I could work around this during the estimation of some complexity(c::StatisticalComplexity, x), where x is given anyway, but not for the estimation of the entropy_complexity_curves, which are computed purely by shuffling probability distributions around, without ever creating an x from which these distributions would be derived. Currently, complexity(c::StatisticalComplexity, x) just falls back on complexity(c::StatisticalComplexity, p::Probabilities) after estimating p from x.
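
Schematically, the current structure is something like this (paraphrased, not verbatim source; the field name c.est is an assumption for illustration):

    # Paraphrased call structure, not verbatim source.
    function complexity(c::StatisticalComplexity, x)
        p = probabilities(c.est, x)   # x is available here...
        return complexity(c, p)       # ...but not inside the Probabilities method,
    end                               # which entropy_complexity_curves uses directly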

I think it should be possible to create a version entropy_normalized(e::EntropyEstimator, est::ProbabilitiesEstimator, probs::Probabilities) for estimators with known alphabet length, which would automatically raise an error if total_outcomes(est) is not defined. If you guys agree, I'll do a PR on this. I realize it's kind of redundant outside of this use case, since otherwise you could always estimate probs from x with est.
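
A rough sketch of what I have in mind (hypothetical, not a final implementation; it assumes the maximum-entropy machinery can be queried with an alphabet length L, e.g. via entropy_maximum(e, L), and that entropy(e, probs) is defined for the estimator in question):

    # Hypothetical sketch of the proposed method.
    function entropy_normalized(e::EntropyEstimator, est::ProbabilitiesEstimator,
                                probs::Probabilities)
        L = total_outcomes(est)   # raises an informative error if L is unknown
        return entropy(e, probs) / entropy_maximum(e, L)
    end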

kahaaga commented 1 year ago

Thanks for looking into this, @ikottlarz.

As I understand it, entropy_complexity_curves is just a convenience function that is mainly used for plotting purposes (i.e., outlining the possible values of H and C), right? It seems to me that we should focus on making sure complexity(c::StatisticalComplexity, x) works as generically as possible; making any convenience functions equally powerful is second priority.

I think it should be possible to create a version entropy_normalized(e::EntropyEstimator, est::ProbabilitiesEstimator, probs::Probabilities) for estimators with known alphabet length, which would automatically raise an error if total_outcomes(est) is not defined. If you guys agree, I'll do a PR on this. I realize it's kind of redundant outside of this use case, since otherwise you could always estimate probs from x with est.

If it solves the issue, I'm totally fine with that solution. The method doesn't need to be public, so it doesn't complicate anything for the user. Would defining and using this method then allow non-Shannon entropies both for complexity(c::StatisticalComplexity, x) and for entropy_complexity_curves?

Datseris commented 1 year ago

How much does the inclusion of allprobabilities help here?
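
For context, allprobabilities returns one probability per possible outcome, zeros included, so the alphabet length becomes recoverable from the distribution itself (sketch, assuming its documented behavior):

    # Sketch: with `allprobabilities`, unobserved outcomes get probability zero,
    # so the distribution has one entry per outcome in the alphabet.
    probs = allprobabilities(est, x)
    L = length(probs)   # equals total_outcomes(est); no `x` needed downstream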

kahaaga commented 1 year ago

Closed by #268