Open kahaaga opened 1 year ago
Seems to me like this should be a different estimator. Also, where do "Top" and "Bottom" come from? How does this scale for 4th-order permutations? In any case this exact case presented here seems low priority because it is of limited usage with experimental / noise-contaminated data.
Seems to me like this should be a different estimator.
Maybe. I'm not sure yet.
In any case this exact case presented here seems low priority because it is of limited usage with experimental / noise-contaminated data.
See my answer directly below. The reason for having this extension is precisely to work with noisy data, as I understand it.
Also, where do "Top" and "Bottom" come from?
From my very quick read of the paper, it seems completely arbitrary. It is just a way to generate more discrete categories from an initial set of `factorial(m)` permutation patterns. You can see why this is useful in the devdocs spatial example, where I compare `SpatialDispersion` to `SpatialSymbolicPermutation` on a progressively more noisy image. The `SpatialDispersion` estimator is much more sensitive to the noise (i.e. has a larger range), presumably because it has more categories. I assume He et al. try to achieve something similar, although it is not entirely clear from the paper (which, admittedly, I haven't dedicated too much time to understanding in detail).
How does this scale for 4th-order permutations?
The number of possible states increases factorially with `D` for `AbstractDataset{D}` (there are `factorial(D)` permutation patterns), just like the number of states increases with `c` for `SpatialDispersion` (there are `c^m` dispersion patterns).
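A quick back-of-the-envelope sketch of the state-count growth (Python for illustration; the function names are mine, not the library's):

```python
from math import factorial

def n_permutation_states(D):
    # One ordinal pattern per ordering of the D coordinates of a point,
    # i.e. factorial(D) possible patterns.
    return factorial(D)

def n_dispersion_states(c, m):
    # A dispersion-style estimator with c categories and embedding
    # dimension m has c**m possible dispersion patterns.
    return c ** m

for D in (3, 4, 5, 6):
    print(D, n_permutation_states(D))
# Already at D = 4 there are 24 patterns; at D = 6 there are 720.
```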
In any case this exact case presented here seems low priority
Yes, I won't prioritize this now, but this feature would be nice to have, since we aim (although not stated explicitly anywhere) to have a library of most/all complexity quantifiers from the literature.
Describe the feature you'd like to have
Probabilities for multivariate permutation entropy are now computed by doing `probabilities(SymbolicPermutation(), x::AbstractDataset{D})`. This works by simply finding the permutation pattern for each `xᵢ ∈ x` and computing the probability of each unique symbol as its relative frequency of occurrence.

In He et al. (2016), they don't simply find permutation patterns of the raw data. They also subdivide the range of the data (after min-max-normalising each variable) into three domains, so that each original permutation pattern turns into multiple possible permutation patterns. See attached screenshot:
Cite scientific papers related to the feature/algorithm
If possible, sketch out an implementation strategy
He et al. (2016)'s procedure is very similar to what `Dispersion` does. I'm thinking that there should be a unified way of doing this. For example, if demanding that input data are always normalized to `[0, 1]` (as they already are for `Dispersion`), then the user could specify "equidistant binning" or "quantile-based binning" or "whatever-based binning" of the normalized range to obtain any type of subdivision for the permutation patterns. The obvious way to do so would be to use `Encoding` here too.
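A minimal sketch of the pluggable-binning idea (Python for illustration; all names here are hypothetical, not the library's `Encoding` API — any callable mapping normalized values to integer bins could be composed with the ordinal pattern):

```python
import numpy as np

def equidistant_binning(n_bins):
    # Equal-width bins over the normalized range [0, 1].
    def encode(values):
        v = np.asarray(values, dtype=float)
        return np.minimum((v * n_bins).astype(int), n_bins - 1)
    return encode

def quantile_binning(n_bins, reference):
    # Bin edges at empirical quantiles of a reference sample, so each bin
    # receives roughly the same number of reference points.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    def encode(values):
        return np.searchsorted(edges, np.asarray(values, dtype=float), side="right")
    return encode

def symbolize(row, encode):
    # Combined symbol: ordinal (permutation) pattern + per-coordinate
    # amplitude bin from whatever binning strategy was supplied.
    row = np.asarray(row, dtype=float)
    return tuple(np.argsort(row)), tuple(encode(row))
```

For example, `symbolize([0.1, 0.9, 0.4], equidistant_binning(3))` gives `((0, 2, 1), (0, 2, 1))`: the ordinal pattern plus which third of `[0, 1]` each coordinate lies in. Swapping in `quantile_binning` changes only the amplitude part of the symbol, which is the kind of unification described above.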