Closed kahaaga closed 5 months ago
To make this an outcome space you'd need to feed in a fixed 1D binning encoding for the distance histogram. And the matrix is not necessary; one just needs the vector of distances. Due to the symmetry of the Chebyshev distance, one only needs to iterate over:
```julia
using Distances  # for `chebyshev`

dst = T[]  # T = element type of the distances, e.g. Float64
for i in 1:length(x)-1
    for j in (i+1):length(x)
        push!(dst, chebyshev(x[i], x[j]))
    end
end
```
I don't see a useful way to make this encoding-based. And in fact, I am not sure it is useful to have it as an outcome space at all. We always try to be as general as possible, but sometimes it is good to just keep things simple.
> I don't see a useful way to make this encoding-based.
Ah! It would actually be trivial to use an encoding and an outcome space here. We map pairs of state vectors into distances, which are then mapped into the histogram bins. I think this should be completely analogous to what we do for `CosineSimilarityBinning`.
- `DistanceDistribution <: OutcomeSpace` would be responsible for creating an iterator of combinations of points `(x_i, x_j)` (not permutations, since we use the symmetry of the distance measure). It would need the input data in the constructor, since we need the minimum and maximum distances to ensure bin coverage.
- `DistancePairEncoding(encoder::RectangularBinEncoding)`, which is a fixed rectangular binning over e.g. the interval `[0, 1]` (this normalization ensures that comparisons between different input data sets are valid).
- `encode(::DistancePairEncoding, x::Tuple{<:AbstractVector, <:AbstractVector})`, which maps a pair `(x_i, x_j)` onto its distance bin (which of course corresponds to a unique integer).
- `decode(::DistancePairEncoding, i::Int)`, which maps the `i`-th outcome onto the boundaries of the corresponding bin.

I may be missing something, but I think this should cover it. I'll try to sketch a PR, and if it turns out that I misunderstood something that makes it non-general, we can just implement it as a `ComplexityEstimator` instead.
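For concreteness, here is a rough sketch of what the encoder half could look like. The struct fields and the hand-rolled bin arithmetic are my assumptions for illustration, not existing ComplexityMeasures.jl API; in a real implementation the binning would be delegated to `RectangularBinEncoding`:

```julia
# Hypothetical sketch; names and fields are assumptions, not existing API.
struct DistancePairEncoding
    binmin::Float64   # minimum distance observed in the input data
    binmax::Float64   # maximum distance observed in the input data
    n::Int            # number of bins
end

chebyshev(a, b) = maximum(abs.(a .- b))

# Map a pair (x_i, x_j) onto the integer index of its distance bin.
function encode(e::DistancePairEncoding, x::Tuple{<:AbstractVector, <:AbstractVector})
    d = chebyshev(x[1], x[2])
    # Normalize the distance into [0, 1], then into bins 1..n; clamp so that
    # the maximum distance falls into the last bin rather than bin n + 1.
    t = (d - e.binmin) / (e.binmax - e.binmin)
    return clamp(floor(Int, t * e.n) + 1, 1, e.n)
end

# Map the i-th outcome onto the boundaries of the corresponding bin.
function decode(e::DistancePairEncoding, i::Int)
    w = (e.binmax - e.binmin) / e.n
    lo = e.binmin + (i - 1) * w
    return (lo, lo + w)
end
```

Usage: `encode(DistancePairEncoding(0.0, 1.0, 10), ([0.0], [0.35]))` returns the bin index of the pair's Chebyshev distance, and `decode` recovers that bin's interval.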
Yeah, seems fine to me.
The "distribution entropy" is a method where an input time series is first embedded. Then one computes the (Chebyshev) distance matrix between all state vectors. A normalized histogram over these distances is then computed. Finally, these normalized counts are fed into the Shannon entropy formula.
We should support this method.
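As a reference point, the whole method as described above can be sketched in a few lines of plain Julia, with no package API assumed; `m`, `tau`, and `nbins` are arbitrary example values, not values prescribed by the method:

```julia
# Distribution entropy of a time series, following the steps above:
# delay embedding -> pairwise Chebyshev distances -> normalized histogram
# -> Shannon entropy of the bin probabilities.

chebyshev(a, b) = maximum(abs.(a .- b))

function distribution_entropy(ts::AbstractVector{<:Real}; m = 3, tau = 1, nbins = 16)
    # Delay embedding: state vectors of length m with lag tau.
    N = length(ts) - (m - 1) * tau
    pts = [ts[i:tau:i+(m-1)*tau] for i in 1:N]

    # Pairwise distances over unordered pairs (using symmetry of the metric).
    d = [chebyshev(pts[i], pts[j]) for i in 1:N-1 for j in i+1:N]

    # Normalized histogram over [minimum, maximum] of the distances.
    lo, hi = extrema(d)
    counts = zeros(Int, nbins)
    for v in d
        k = hi > lo ? clamp(floor(Int, (v - lo) / (hi - lo) * nbins) + 1, 1, nbins) : 1
        counts[k] += 1
    end
    p = counts ./ sum(counts)

    # Shannon entropy; bins with zero probability contribute nothing.
    return -sum(pk * log2(pk) for pk in p if pk > 0)
end
```

The result is bounded by `log2(nbins)`, attained when the distances spread uniformly over the bins.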
However, it is not entirely clear to me how this will fit into the current API. We could easily add it as another `ComplexityEstimator`, but it would be nice to add it as an `OutcomeSpace`, for complete generality.

If framing the method in terms of an `OutcomeSpace`, what will the outcome space be? We'd probably need something like a `DistanceMatrixEncoding`, which maps a pair of vectors onto a discretized distance interval.