kahaaga closed this issue 1 year ago
Hi @kahaaga, you're totally right, this normalization only works for Shannon Entropies, so this definitely needs to be changed. I vaguely remember originally wanting to use `entropy_normalized` but ending up not doing it for some reason I cannot recall right now (could it be that `entropy_normalized` requires an `x` already because the alphabet length is not always predetermined?). In any case, it's at least a bug in the documentation for not marking that this only works for Shannon Entropies, but ideally we change it to the generalized version. I'm travelling right now and will reply more tomorrow.
> could it be that `entropy_normalized` requires an `x` already because the alphabet length is not always predetermined?
The docstring already says that the function should work for any probabilities estimator that has a known alphabet length, so we can just assume that the user provides a valid estimator (with known alphabet length). One of the nice things about the API is that if the alphabet length isn't known, or a normalization isn't defined, the code will just fall back to the relevant error messages.
> In any case, it's at least a bug in the documentation for not marking that this only works for Shannon Entropies
It isn't documented yet that any entropy definition can be used, so no worries. I was just curious, because I wanted to test what happens with the statistical complexity when using other types of entropy.
> I'm travelling right now and will reply more tomorrow
No worries, I am also on vacation, so this is not an urgent issue.
> The docstring already says that the function should work for any probabilities estimator that has a known alphabet length, so we can just assume that the user provides a valid estimator (with known alphabet length).
Recall that we made the conscious choice that only estimators with known alphabet length exist in ComplexityMeasures.jl anyways.
Alright @kahaaga, I went through the code and remembered why I didn't use `entropy_normalized` back then: `entropy_normalized` is not defined for an `EntropyEstimator` and `Probabilities`:

> Notice that there is no method `entropy_normalized(e::DiscreteEntropyEstimator, probs::Probabilities)`, because there is no way to know the amount of possible events (i.e., the `total_outcomes`) from `probs`.

(from the docs).
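To make the quoted limitation concrete, here is a small language-agnostic sketch (plain Python, not the ComplexityMeasures.jl code) of why a bare probability vector cannot be normalized on its own: the normalizer depends on the total alphabet size, and unobserved outcomes leave no trace in the probabilities.

```python
import math

def shannon(probs):
    """Plain Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# The same observed probability vector...
probs = [0.5, 0.25, 0.25]
h = shannon(probs)

# ...normalizes differently depending on the assumed alphabet size N,
# because the maximum entropy is log(N). The vector alone can't tell us
# whether it came from an alphabet of 3 outcomes or 8 (with 5 unobserved).
for n_outcomes in (3, 8):
    print(n_outcomes, h / math.log(n_outcomes))
```

Here `h / log(3)` ≈ 0.95 while `h / log(8)` = 0.5, so the estimator (which knows the alphabet) has to supply `N`.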
I could work around this during the estimation of some `complexity(c::StatisticalComplexity, x)`, where `x` is given anyways, but not for the estimation of the `entropy_complexity_curves`, which are estimated purely by shuffling probability distributions around, without ever creating an `x` from which these distributions would be derived. Currently, `complexity(c::StatisticalComplexity, x)` just falls back on `complexity(c::StatisticalComplexity, p::Probabilities)` after estimating `p` from `x`.
I think it should be possible to create a version `entropy_normalized(e::EntropyEstimator, est::ProbabilitiesEstimator, probs::Probabilities)` for estimators with known alphabet length, that would automatically raise an error if `total_outcomes(est)` is not defined. If you guys agree I'll do a PR on this. I realize it's kind of redundant outside of this use case, since otherwise you could always estimate `probs` from `x` with `est`.
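The shape of the proposed method can be sketched as follows (a hypothetical Python mock-up of the idea, with made-up names; the real implementation would be a Julia method dispatching on the estimator types): normalize the entropy by its maximum for the estimator's alphabet size, and fail loudly when the estimator does not expose an outcome count.

```python
import math

class FixedAlphabetEstimator:
    """Toy stand-in for a probabilities estimator with a known alphabet
    size. Hypothetical; mirrors the proposed Julia method, not the real API."""
    def __init__(self, total_outcomes):
        self.total_outcomes = total_outcomes

def shannon(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_normalized(entropy_fn, max_entropy_fn, est, probs):
    """Divide entropy_fn(probs) by its maximum for est's alphabet size.
    Raises if the estimator does not define a total outcome count."""
    n = getattr(est, "total_outcomes", None)
    if n is None:
        raise ValueError("total_outcomes is not defined for this estimator")
    return entropy_fn(probs) / max_entropy_fn(n)

est = FixedAlphabetEstimator(total_outcomes=4)
# Uniform distribution over the full alphabet -> normalized entropy 1.0.
print(entropy_normalized(shannon, math.log, est, [0.25] * 4))
```

The point is that `probs` never has to answer "how many outcomes exist?"; that question is delegated to the estimator, which is exactly what makes the three-argument signature work where `entropy_normalized(e, probs)` cannot.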
Thanks for looking into this, @ikottlarz.

As I understand it, `entropy_complexity_curves` is just a convenience function that is mainly used for plotting purposes, right? (i.e. outlining the possible values of H and C?) It seems to me that we should focus on making sure `complexity(c::StatisticalComplexity, x)` works as generically as possible; second priority is that any convenience functions are equally powerful.
> I think it should be possible to create a version `entropy_normalized(e::EntropyEstimator, est::ProbabilitiesEstimator, probs::Probabilities)` for estimators with known alphabet length, that would automatically raise an error if `total_outcomes(est)` is not defined. If you guys agree I'll do a PR on this. I realize it's kind of redundant outside of this use case, since otherwise you could always estimate `probs` from `x` with `est`
If it solves the issue, I'm totally fine with that solution. The method doesn't need to be public, so it doesn't complicate anything for the user. Would defining and using this method then allow non-Shannon entropies both for `complexity(c::StatisticalComplexity, x)` and for `entropy_complexity_curves`?
How much does the inclusion of `allprobabilities` help here?
Closed by #268
@ikottlarz Hey,

I'm trying to use the `entropy_complexity` function on some images. I was unsure whether the entropy values I got out were normalized. According to the docstrings, it is supposed to compute normalized entropies, but I went to the source code and couldn't figure out where in the code the (generic) normalization happens. There are a few lines in `complexity(c::StatisticalComplexity, p::Probabilities)` where a Shannon-entropy-type normalization seems to happen. However, this normalization would only be valid for the Shannon entropy, not generically for any entropy definition. The normalization for other entropy types, e.g. `Tsallis`, is different. I think the code for `complexity` should use the `entropy_normalized` function, right? That would ensure that the normalization is correct. Or am I missing something?
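To spell out why the Shannon-style normalization is wrong for other entropy types: each definition has its own maximum value, attained by the uniform distribution, and that maximum is the correct normalizer. A quick numeric check (plain Python, not the package code; Tsallis entropy S_q = (1 − Σ p^q)/(q − 1)):

```python
import math

def shannon(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def tsallis(probs, q):
    """Tsallis entropy S_q = (1 - sum(p^q)) / (q - 1), for q != 1."""
    return (1 - sum(p ** q for p in probs)) / (q - 1)

def tsallis_max(n, q):
    """Maximum of S_q, attained by the uniform distribution over n outcomes."""
    return (1 - n ** (1 - q)) / (q - 1)

n, q = 4, 2.0
uniform = [1 / n] * n

# Shannon normalization h / log(n) correctly gives 1 at the maximum:
print(shannon(uniform) / math.log(n))           # 1.0
# But dividing the Tsallis entropy by log(n) does NOT give 1 there:
print(tsallis(uniform, q) / math.log(n))        # ~0.54, not 1.0
# The correct normalizer is the Tsallis maximum itself:
print(tsallis(uniform, q) / tsallis_max(n, q))  # 1.0
```

So hard-coding division by `log(total_outcomes)` silently mis-scales every non-Shannon entropy, which is exactly what routing through `entropy_normalized` would avoid.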