**Closed** — Datseris closed this 4 months ago
Hey, @Datseris! This looks mostly good, but we're slightly off with the counts: for the StatisticalComplexity measure, there's one multiplier missing — the number of measure estimators. Currently, only the number of ways of estimating probabilities and the number of basic measures (i.e. those estimated with PlugIn) are included. We can simply do:

```julia
n_complexity_measures_statistical_complexity = length(INFO_MEASURES_DISCRETE) * n_probs_count # remove
n_complexity_measures_statistical_complexity = n_discrete_info_est * n_probs_count # add
```

Do we agree?
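For concreteness, the effect of the changed multiplier can be sketched as below; all counts are hypothetical placeholder values, not the real numbers from the counting script:

```julia
# Hypothetical stand-ins for the counting script's inputs (assumed values).
n_probs_count = 7        # assumed: number of probabilities estimators
n_info_measures = 10     # assumed: number of basic measures (PlugIn-estimated only)
n_discrete_info_est = 15 # assumed: number of discrete measure estimators

# Old count: only PlugIn-estimated measures times probabilities estimators.
old_count = n_info_measures * n_probs_count      # 70

# Proposed count: measure *estimators* times probabilities estimators,
# so non-PlugIn estimators (e.g. Jackknife) are included too.
new_count = n_discrete_info_est * n_probs_count  # 105

println((old_count, new_count))
```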
The code of StatisticalComplexity does not allow for estimators; only measures, as far as I can tell.
I also do wonder whether we are pushing it with statistical complexity. Initially we agreed that keywords to various measures don't count as different measures.
> I also do wonder whether we are pushing it with statistical complexity. Initially we agreed that keywords to various measures don't count as different measures.
Good point, I think we should be conservative. Let's treat StatisticalComplexity as a single measure, but maybe put in a cheeky comment that we could have counted its variations as many if we wanted to 😜
> The code of StatisticalComplexity does not allow for estimators; only measures, as far as I can tell.
We've been clever enough with the implementation that we can actually use estimators 👍
```julia
julia> c = StatisticalComplexity(hest = Jackknife(definition = Tsallis(q = 2.0)))
StatisticalComplexity, with 4 fields:
dist = Distances.JSDivergence()
hest = Jackknife(definition = Tsallis(q = 2.0, k = 1.0, base = 2))
pest = RelativeAmount()
o = OrdinalPatterns{3}(encoding = OrdinalPatternEncoding(perm = [0, 0, 0], lt = isless_rand), τ = 1)
```
The hest keyword takes an estimator as input, not the measure itself. The measure is a field of the estimator. If just a measure is given, it is automatically passed to the PlugIn estimator.
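A minimal, self-contained sketch of that convenience pattern (not the actual ComplexityMeasures.jl source; all type names here are simplified stand-ins) — estimators pass through untouched, while bare measure definitions get wrapped in a PlugIn-style estimator via dispatch:

```julia
# Simplified stand-in type hierarchy (assumed, not the real package code).
abstract type InformationMeasure end
abstract type InformationEstimator end

struct Tsallis <: InformationMeasure
    q::Float64
end

struct PlugIn <: InformationEstimator
    definition::InformationMeasure
end

struct Jackknife <: InformationEstimator
    definition::InformationMeasure
end

# Normalization via dispatch: estimators pass through,
# bare measures are wrapped in the PlugIn estimator.
normalize_hest(est::InformationEstimator) = est
normalize_hest(measure::InformationMeasure) = PlugIn(measure)

normalize_hest(Tsallis(2.0))             # wrapped: PlugIn(Tsallis(2.0))
normalize_hest(Jackknife(Tsallis(2.0)))  # passes through unchanged
```

This is why `hest = Tsallis(q = 2.0)` in the REPL example above prints back as `PlugIn(definition = Tsallis(...))`.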
```julia
julia> c = StatisticalComplexity(hest = Tsallis(q = 2.0))
StatisticalComplexity, with 4 fields:
dist = Distances.JSDivergence()
hest = PlugIn(definition = Tsallis(q = 2.0, k = 1.0, base = 2))
pest = RelativeAmount()
o = OrdinalPatterns{3}(encoding = OrdinalPatternEncoding(perm = [0, 0, 0], lt = isless_rand), τ = 1)
```
Also, we don't count spatial measures as something special, because estimating something with spatial data is just a variation on the input data?
I decided to stop counting spatial data as special, because at the end of the day the multiplicity of measures depends only on whether an outcome space is count-based or not. The input data doesn't matter. In the paper they are still a separate category in the comparison table!
I'll quickly fix the docstring of statistical complexity and adjust the counting.
> I decided to stop counting spatial data as special, because at the end of the day the multiplicity of measures depends only on whether an outcome space is count-based or not. The input data doesn't matter. In the paper they are still a separate category in the comparison table!
Sounds reasonable.
> I'll quickly fix the docstring of statistical complexity and adjust the counting.
👍
Right, in the end I think a fair middle ground is to only count measure definitions and outcome spaces in statistical complexity, but ignore estimators for either. However, I may have counted wrong: does statistical complexity work with only count-based outcome spaces?
> Right, in the end I think a fair middle ground is to only count measure definitions and outcome spaces in statistical complexity, but ignore estimators for either. However, I may have counted wrong: does statistical complexity work with only count-based outcome spaces?
Quick testing: StatisticalComplexity works with some of the non-count-based outcome spaces, but not all.
```julia
julia> complexity(StatisticalComplexity(o = AmplitudeAwareOrdinalPatterns()), rand(100))
0.007386037471343663

julia> complexity(StatisticalComplexity(o = WeightedOrdinalPatterns()), rand(100))
0.011453503465147306
```
For NaiveKernel, PowerSpectrum, TransferOperator and WaveletOverlap it doesn't work.
> Right, in the end I think a fair middle ground is to only count measure definitions and outcome spaces in statistical complexity, but ignore estimators for either. However, I may have counted wrong: does statistical complexity work with only count-based outcome spaces?
I'm not sure what the best approach is here. Rosso et al. (2013) generalize the statistical complexity to different information measures (defined as characterizing some probability distribution, like we do here — perhaps worth mentioning in the paper that our definition is not unique?). They do not mention any information measure estimator beyond the PlugIn estimator, which I assume they implicitly use. They also do not consider different outcome spaces.
To be fair to both them and us, I think we should either count all possibilities (outcome spaces, probabilities estimators, information measures and their estimators) or none (count StatisticalComplexity as one measure). If we only count selected combinations, we're kind of shooting ourselves in the foot, because the generalization we did is actually a significant improvement to the complexity quantification toolbox.
My point is: if you look at the numerical values of the statistical complexity, they would be approximately the same if I changed the probabilities estimator and entropy estimator. However, the numerical values would be drastically different if I changed the outcome space. They would also change if I changed the entropy measure, but less so. In the end, I do not know what the best way forward is.
"To be fair to both them and us": we only care about being fair to other software packages, not to papers defining complexity measures, since we don't do that here.
> if you look at the numerical values of the statistical complexity, they would be approximately the same if I changed the probabilities estimator and entropy estimator. However, the numerical values would be drastically different if I changed the outcome space. They would also change if I changed the entropy measure, but less so. In the end, I do not know what the best way forward is.
Ok, let's just say that in the counting script: "For StatisticalComplexity, counting all possible combinations of outcome spaces, probabilities estimators, information measure definitions, information measure estimators, and distance measures as unique measures would over-inflate the measure count. For practicality, we here count different versions of StatisticalComplexity by considering the number of statistical complexity measures resulting from counting unique outcome spaces and information measures, since these are the largest contributors to changes in the computed numerical value of the measure."
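The agreed rule reduces to a single product; a sketch with assumed placeholder counts (not the real numbers from the counting script):

```julia
# Assumed placeholder counts, not the counting script's real values.
n_outcome_spaces = 4  # assumed: outcome spaces usable with StatisticalComplexity
n_info_measures = 10  # assumed: information measure definitions

# Agreed rule: count variants as outcome spaces × information measures,
# ignoring probabilities estimators, information estimators, and distances.
n_statcomplexity = n_outcome_spaces * n_info_measures
println(n_statcomplexity)
```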
@kahaaga shall we merge this now?
The page is hidden by default. Let me know if the estimation is accurate. We get about 1,600 measures at the moment.