JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License

Add page that estimates total number of measures in the docs #403

Closed: Datseris closed this issue 4 months ago

Datseris commented 4 months ago

The page is hidden by default. Let me know if the estimation is accurate. We get about 1,600 measures at the moment.

kahaaga commented 4 months ago

Hey, @Datseris! This looks mostly good, but we're slightly off with the counts:

```diff
- n_complexity_measures_statistical_complexity = length(INFO_MEASURES_DISCRETE) * n_probs_count
+ n_complexity_measures_statistical_complexity = n_discrete_info_est * n_probs_count
```
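For illustration, the corrected formula is just a product of two independent counts. A minimal sketch with placeholder numbers (the values and variable names here are assumptions for illustration, not the package's real counts):

```julia
# Hypothetical sketch of the counting logic (placeholder numbers, not the
# actual counts derived by the docs script).
n_discrete_info_est = 20  # assumed: number of discrete information estimators
n_probs_count = 5         # assumed: number of count-based probabilities estimators

# Each (information estimator, probabilities estimator) pair gives one variant:
n_complexity_measures_statistical_complexity = n_discrete_info_est * n_probs_count
println(n_complexity_measures_statistical_complexity)  # prints 100
```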

Do we agree?

Datseris commented 4 months ago

The code of StatisticalComplexity does not allow for estimators; only measures, as far as I can tell.

Datseris commented 4 months ago

I also do wonder whether we are pushing it with statistical complexity. Initially we agreed that keywords to various measures don't count as different measures.

kahaaga commented 4 months ago

> I also do wonder whether we are pushing it with statistical complexity. Initially we agreed that keywords to various measures don't count as different measures.

Good point, I think we should be conservative. Let's treat StatisticalComplexity as a single measure, but maybe put in a cheeky comment that we could have counted its variations as many if we wanted to 😜

kahaaga commented 4 months ago

> The code of StatisticalComplexity does not allow for estimators; only measures, as far as I can tell.

We've been clever enough with the implementation that we can actually use estimators 👍

```julia
julia> c = StatisticalComplexity(hest = Jackknife(definition = Tsallis(q = 2.0)))
StatisticalComplexity, with 4 fields:
 dist = Distances.JSDivergence()
 hest = Jackknife(definition = Tsallis(q = 2.0, k = 1.0, base = 2))
 pest = RelativeAmount()
 o = OrdinalPatterns{3}(encoding = OrdinalPatternEncoding(perm = [0, 0, 0], lt = isless_rand), τ = 1)
```

`hest` takes an estimator as input, not the measure itself; the measure is a field of the estimator. If just a measure is given, it is automatically wrapped in the PlugIn estimator.

```julia
julia> c = StatisticalComplexity(hest = Tsallis(q = 2.0))
StatisticalComplexity, with 4 fields:
 dist = Distances.JSDivergence()
 hest = PlugIn(definition = Tsallis(q = 2.0, k = 1.0, base = 2))
 pest = RelativeAmount()
 o = OrdinalPatterns{3}(encoding = OrdinalPatternEncoding(perm = [0, 0, 0], lt = isless_rand), τ = 1)
```

kahaaga commented 4 months ago

Also, we shouldn't count spatial measures as something special, because estimating something with spatial data is just a variation on the input data, right?

Datseris commented 4 months ago

I decided to stop counting spatial data as special, because the multiplicity of measures depends, at the end of the day, only on whether an outcome space is count-based. The input data doesn't matter. In the paper they are still a separate category in the comparison table!

I'll quickly fix the docstring of statistical complexity and adjust the counting.

kahaaga commented 4 months ago

> I decided to stop counting spatial data as special, because the multiplicity of measures depends, at the end of the day, only on whether an outcome space is count-based. The input data doesn't matter. In the paper they are still a separate category in the comparison table!

Sounds reasonable.

> I'll quickly fix the docstring of statistical complexity and adjust the counting.

👍

Datseris commented 4 months ago

Right, in the end I think a fair middle ground is to count only measure definitions and outcome spaces in statistical complexity, but ignore estimators for either. However, I may have counted wrong: does statistical complexity work only with count-based outcome spaces?

kahaaga commented 4 months ago

> Right, in the end I think a fair middle ground is to count only measure definitions and outcome spaces in statistical complexity, but ignore estimators for either. However, I may have counted wrong: does statistical complexity work only with count-based outcome spaces?

Quick testing: StatisticalComplexity works with some of the non-count-based outcome spaces, but not all.


```julia
julia> complexity(StatisticalComplexity(o = AmplitudeAwareOrdinalPatterns()), rand(100))
0.007386037471343663

julia> complexity(StatisticalComplexity(o = WeightedOrdinalPatterns()), rand(100))
0.011453503465147306
```

For NaiveKernel, PowerSpectrum, TransferOperator and WaveletOverlap it doesn't work.

kahaaga commented 4 months ago

> Right, in the end I think a fair middle ground is to count only measure definitions and outcome spaces in statistical complexity, but ignore estimators for either. However, I may have counted wrong: does statistical complexity work only with count-based outcome spaces?

I'm not sure what the best approach is here. Rosso et al. (2013) generalize the statistical complexity to different information measures (defined as characterizing some probability distribution, like we do here; perhaps worth mentioning in the paper that our definition is not unique?). They do not mention anything about information measure estimators beyond the PlugIn estimator, which I assume they implicitly use. They also do not consider different outcome spaces.

To be fair to both them and us, I think we should either count all possibilities (outcome spaces, probabilities estimators, information measures and their estimators) or none (count StatisticalComplexity as one measure). If we only count selected measures, we're kind of shooting ourselves in the foot, because the generalization we did is actually a significant improvement to the complexity quantification toolbox.

Datseris commented 4 months ago

My point is:

- If you look at the numerical values of the statistical complexity, they would be approximately the same if I changed the probabilities estimator and the entropy estimator.
- However, the numerical values would be drastically different if I changed the outcome space. They would also change if I changed the entropy measure, but less so.

In the end, I do not know what the best way forward is.

"To be fair to both them and us, ": we only care about being fair to other softwares, not to papers defining complexity measures, as we don't do that here.

kahaaga commented 4 months ago

> If you look at the numerical values of the statistical complexity, they would be approximately the same if I changed the probabilities estimator and entropy estimator. However, the numerical values would be drastically different if I changed the outcome space. They would also change if I changed the entropy measure, but less so. In the end, I do not know what the best way forward is.

Ok, let's just say that in the counting script: "For StatisticalComplexity, counting all possible combinations of outcome spaces, probabilities estimators, information measure definitions, information measure estimators, and distance measures as unique measures would over-inflate the measure count. For practicality, we count different versions of StatisticalComplexity by considering the number of statistical complexity measures resulting from counting unique outcome spaces and information measures, since these are the largest contributors to changes in the computed numerical value of the measure."
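Concretely, that rule might look like this in the counting script. This is only a sketch: the variable names and numbers below are assumed placeholders, not the actual values the script derives from the package.

```julia
# Sketch of the agreed counting rule for StatisticalComplexity:
# count only unique (outcome space, information measure definition) pairs;
# ignore probabilities estimators, information estimators, and distances.
# Placeholder numbers; the real script derives these from the package.
n_count_based_outcome_spaces = 10  # assumed
n_info_measure_definitions = 8     # assumed

n_statistical_complexity_measures =
    n_count_based_outcome_spaces * n_info_measure_definitions
println(n_statistical_complexity_measures)  # prints 80
```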

Datseris commented 4 months ago

@kahaaga shall we merge this now?