JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License

Subdivision of permutation patterns for multivariate permutation probabilities/entropy #206

Open kahaaga opened 1 year ago

kahaaga commented 1 year ago

Describe the feature you'd like to have

Probabilities for multivariate permutation entropy are currently computed by calling probabilities(SymbolicPermutation(), x::AbstractDataset{D}). This works by simply finding the permutation pattern of each xᵢ ∈ x and computing the probability of each unique symbol as its relative frequency of occurrence.
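For context, the current behaviour can be sketched in a few lines (this is an illustrative stand-in, not the actual library code; plain tuples stand in for `AbstractDataset` points):

```julia
# Minimal sketch: map each D-dimensional point to its ordinal (permutation)
# pattern via sortperm, then take relative frequencies of the unique patterns.
function permutation_probabilities(x::Vector{<:NTuple{D, <:Real}}) where {D}
    counts = Dict{NTuple{D, Int}, Int}()
    for xi in x
        pattern = Tuple(sortperm(collect(xi)))  # ordinal pattern of the point
        counts[pattern] = get(counts, pattern, 0) + 1
    end
    n = length(x)
    return Dict(p => c / n for (p, c) in counts)
end

probs = permutation_probabilities([(1.0, 2.0, 3.0), (3.0, 1.0, 2.0), (1.0, 2.0, 3.0)])
# e.g. pattern (1, 2, 3) occurs in 2 of 3 points
```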

In He et al. (2016), they don't simply find permutation patterns of the raw data. They also subdivide the range of the data (after min-max-normalising each variable) into three domains, so that each original permutation pattern turns into multiple possible permutation patterns. See attached screenshot:

[Screenshot: figure from He et al. (2016) illustrating the subdivision of permutation patterns]
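One plausible reading of the subdivision idea is that each symbol becomes a (region, pattern) pair rather than a bare ordinal pattern. The region rule below (which third of [0, 1] the point's mean falls in, labelled Bottom/Middle/Top) is an assumption for illustration only, not necessarily He et al.'s exact definition:

```julia
using Statistics: mean

# Hypothetical sketch: tag each point's ordinal pattern with a region label,
# so factorial(D) patterns become multiple (region, pattern) symbols.
# Assumes xi has already been min-max-normalised to [0, 1].
function subdivided_symbol(xi::NTuple{D, <:Real}) where {D}
    pattern = Tuple(sortperm(collect(xi)))
    m = mean(xi)  # assumed region rule; the paper may subdivide differently
    region = m < 1/3 ? :Bottom : (m < 2/3 ? :Middle : :Top)
    return (region, pattern)
end

subdivided_symbol((0.1, 0.2, 0.3))  # (:Bottom, (1, 2, 3))
```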

Cite scientific papers related to the feature/algorithm

If possible, sketch out an implementation strategy

He et al. (2016)'s procedure is very similar to what Dispersion does.

I'm thinking that there should be a unified way of doing this. For example, if we demand that input data are always normalized to [0, 1] (as is already the case for Dispersion), then the user could specify "equidistant binning" or "quantile-based binning" or "whatever-based binning" of the normalized range to obtain any type of subdivision for the permutation patterns. The obvious way to do so would be to use Encoding here too.
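A hedged sketch of what such a unified approach could look like: compose the ordinal pattern with a user-chosen binning of the normalised range. All names here are hypothetical, not the ComplexityMeasures.jl Encoding API:

```julia
# `binning` is any function mapping a value in [0, 1] to a bin index —
# equidistant here, but quantile-based etc. would slot in the same way.
equidistant_binning(n) = v -> min(floor(Int, v * n) + 1, n)

# Hypothetical combined symbol: per-coordinate bins plus the ordinal pattern.
function encode_with_subdivision(xi::NTuple{D, <:Real}, binning) where {D}
    pattern = Tuple(sortperm(collect(xi)))
    bins = map(binning, xi)  # subdivide each (normalised) coordinate
    return (bins, pattern)
end

encode_with_subdivision((0.1, 0.5, 0.9), equidistant_binning(3))
# ((1, 2, 3), (1, 2, 3))
```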

Datseris commented 1 year ago

Seems to me like this should be a different estimator. Also, where do "Top" and "Bottom" come from? How does this scale for 4th-order permutations? In any case this exact case presented here seems low priority because it is of limited use with experimental / noise-contaminated data.

kahaaga commented 1 year ago

Seems to me like this should be a different estimator.

Maybe. I'm not sure yet.

In any case this exact case presented here seems low priority because it is of limited use with experimental / noise-contaminated data.

See my answer directly below. The reason for having this extension is precisely to work with noisy data, as I understand it.

Also, where do "Top" and "Bottom" come from?

From my very quick read of the paper, it seems completely arbitrary. It is just a way to generate more discrete categories from an initial set of factorial(m) permutation patterns. You can see why this is useful in the devdocs spatial example, where I compare SpatialDispersion to SpatialSymbolicPermutation on a progressively noisier image. The SpatialDispersion estimator is much more sensitive to the noise (i.e. has a larger range), presumably because it has more categories. I assume He et al. try to achieve something similar, although it is not entirely clear from the paper (which, admittedly, I haven't dedicated too much time to understanding in detail).

How does this scale for 4th-order permutations?

The number of possible states increases exponentially with D for AbstractDataset{D}, just like the number of states increases exponentially with c for SpatialDispersion.
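A back-of-envelope count, under the assumption that each of the factorial(D) ordinal patterns is split into s region-variants (s = 3 for Top/Middle/Bottom); illustrative only:

```julia
# Rough size of the symbol alphabet if every ordinal pattern gets s variants.
n_states(D, s) = factorial(D) * s

n_states(3, 3)  # 18
n_states(4, 3)  # 72
```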

In any case this exact case presented here seems low priority

Yes, I won't prioritize this now, but this feature would be nice to have, since we aim (although not stated explicitly anywhere) to have a library of most/all complexity quantifiers from the literature.