JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License
48 stars 11 forks source link

Feature: bubble entropy (description is WIP) #384

Closed kahaaga closed 5 months ago

kahaaga commented 5 months ago

The "bubble entropy" is a method that has seen quite some usage. Framed in terms of our API, the authors do two things: 1) define a new outcome space, 2) define a ComplexityEstimator based on that outcome space.

The method works by 1) embedding the input time series with dimension m, 2) for every state vector x_i, count the number of steps n_i necessary to sort a state vector in ascending order using the naive bubble sort algorithm, 3) estimate a histogram from the distribution of n_is, 4) plug it into the Shannon entropy formula.

This is repeat for dimension m+1, and the bubble entropy measure is a scaled difference between the entropies computed for dimensions m+1 and m. Therefore, it falls in the category of "entropy-like" measures, since it does not directly quantify an entropy. However, since it is compatible with our OutcomeSpace API, we should implement it as such, and define a ComplexityEstimator based on that.

A BubbleSortCount{m} <: OutcomeSpace type can be defined, where m is the embedding dimension. Since, there's always a minimum 0 and a maximum n(n−1)/2 number of swaps require to sort a vector of length n, the outcome space is finite. We can implement encode, but not decode (it isn't well-defined ... I think).

HOWEVER: in the public docs, we nowhere state that decode must be implemented. So I'm kind of in favor of just implementing it as an encoding too, but just throwing a warning for decode. The method will still be useful, and can not only be used with the API here, but also iimmediately be used to define new multivariate measures in CausalityTools.jl if we implement codify(::BubbleSortCount).

Algorithm

Skjermbilde 2024-01-15 kl  11 42 26
Datseris commented 5 months ago

encode(::BubbleSortCountEncoder, x::AbstractVector) converts x into an integer in the range 0:n(n−1)/2 (the number of swaps required to sort x).

I don't see how this is possible. encode should give the same encoding irrespectively of the order of the vectors in the state space set (embedded set). That's also why the encoding doesn't depend on the input data, but only one 1 element of the input data. For sorting this not the case. If the original dataset is already sorted, encode(e, x) would always return 0 irrespectively of x.

I think for this feature it makes sense to have the outcome space, but not the encoding. I find this more similar to the spectral entropy and the PowerSpectrum outcome space.

kahaaga commented 5 months ago

encode(::BubbleSortCountEncoder, x::AbstractVector) converts x into an integer in the range 0:n(n−1)/2 (the number of swaps required to sort x). don't see how this is possible. encode should give the same encoding irrespectively of the order of the vectors in the state space set (embedded set).

encode works strictly on single state vectors, not the entire embedding. Each vector has a certain amplitude distribution, which maps directly onto an integer in the range 0:n(n−1)/2, because 0 is the minimum number of swaps for the bubble sort algorithm to render the state vector sorted, and 0:n(n−1)/2 is the maximum number of swaps.

Many state vectors in the same embedding can map to the same integer, because the number of sorting operations required to sort them is the same, even though the vectors are different. The order of the embedding does matter here, because the input to encode is a state vector, not an embedding.

I've implemented my suggestion in #390. This should be clear from the source code there

Datseris commented 5 months ago

Oooooooooooioh now I understand what you mean by "vector to be sorted". I thought you meant: sort the vectors wtihin the embedded set. You mean: sort the elements of each statespace point, i.e., the same as getting the sortperm for the patterns.

OKay, all good, I'll look at the PR now.

kahaaga commented 5 months ago

Closed by #390