Closed kahaaga closed 5 months ago
encode(::BubbleSortCountEncoder, x::AbstractVector)
convertsx
into an integer in the range0:n(n−1)/2
(the number of swaps required to sortx
).
I don't see how this is possible. encode
should give the same encoding irrespectively of the order of the vectors in the state space set (embedded set). That's also why the encoding doesn't depend on the input data, but only one 1 element of the input data. For sorting this not the case. If the original dataset is already sorted, encode(e, x)
would always return 0 irrespectively of x
.
I think for this feature it makes sense to have the outcome space, but not the encoding. I find this more similar to the spectral entropy and the PowerSpectrum
outcome space.
encode(::BubbleSortCountEncoder, x::AbstractVector) converts x into an integer in the range 0:n(n−1)/2 (the number of swaps required to sort x). don't see how this is possible. encode should give the same encoding irrespectively of the order of the vectors in the state space set (embedded set).
encode
works strictly on single state vectors, not the entire embedding. Each vector has a certain amplitude distribution, which maps directly onto an integer in the range 0:n(n−1)/2
, because 0 is the minimum number of swaps for the bubble sort algorithm to render the state vector sorted, and 0:n(n−1)/2
is the maximum number of swaps.
Many state vectors in the same embedding can map to the same integer, because the number of sorting operations required to sort them is the same, even though the vectors are different. The order of the embedding does matter here, because the input to encode
is a state vector, not an embedding.
I've implemented my suggestion in #390. This should be clear from the source code there
Oooooooooooioh now I understand what you mean by "vector to be sorted". I thought you meant: sort the vectors wtihin the embedded set. You mean: sort the elements of each statespace point, i.e., the same as getting the sortperm
for the patterns.
OKay, all good, I'll look at the PR now.
Closed by #390
The "bubble entropy" is a method that has seen quite some usage. Framed in terms of our API, the authors do two things: 1) define a new outcome space, 2) define a
ComplexityEstimator
based on that outcome space.The method works by 1) embedding the input time series with dimension
m
, 2) for every state vectorx_i
, count the number of stepsn_i
necessary to sort a state vector in ascending order using the naive bubble sort algorithm, 3) estimate a histogram from the distribution ofn_i
s, 4) plug it into the Shannon entropy formula.This is repeat for dimension
m+1
, and the bubble entropy measure is a scaled difference between the entropies computed for dimensionsm+1
andm
. Therefore, it falls in the category of "entropy-like" measures, since it does not directly quantify an entropy. However, since it is compatible with ourOutcomeSpace
API, we should implement it as such, and define aComplexityEstimator
based on that.A
BubbleSortCount{m} <: OutcomeSpace
type can be defined, wherem
is the embedding dimension. Since, there's always a minimum0
and a maximumn(n−1)/2
number of swaps require to sort a vector of lengthn
, the outcome space is finite. We can implementencode
, but notdecode
(it isn't well-defined ... I think).encode(::BubbleSortCountEncoder, x::AbstractVector)
convertsx
into an integer in the range0:n(n−1)/2
(the number of swaps required to sortx
).decode(::BubbleSortCountEncoder, i::Integer)
does... what?HOWEVER: in the public docs, we nowhere state that
decode
must be implemented. So I'm kind of in favor of just implementing it as an encoding too, but just throwing a warning fordecode
. The method will still be useful, and can not only be used with the API here, but also iimmediately be used to define new multivariate measures in CausalityTools.jl if we implementcodify(::BubbleSortCount)
.Algorithm