JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License

Feature: "distribution entropy" #383

Closed: kahaaga closed 5 months ago

kahaaga commented 5 months ago

The "distribution entropy" is a method where an input time series is first embedded. Then one computes the (Chebyshev) distance matrix between all pairs of state vectors. A normalized histogram over these distances is then computed. Finally, these normalized counts are fed into the Shannon entropy formula.
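To make the four steps concrete, here is a minimal sketch in plain Julia. It is not the ComplexityMeasures.jl API: the names `embed_series`, `chebyshev_dist` and `distribution_entropy_sketch` are hypothetical, and the equal-width binning over the observed distance range, the default `nbins = 16`, and the base-2 logarithm are all assumptions made for illustration. It needs at least two state vectors.

```julia
# Delay embedding: state vectors of dimension m with lag τ (hypothetical helper).
embed_series(x, m, τ) = [x[i:τ:(i + (m - 1) * τ)] for i in 1:(length(x) - (m - 1) * τ)]

# Chebyshev (maximum-norm) distance between two state vectors.
chebyshev_dist(a, b) = maximum(abs, a .- b)

function distribution_entropy_sketch(x; m = 2, τ = 1, nbins = 16)
    pts = embed_series(x, m, τ)
    n = length(pts)
    # Distances over unique pairs i < j only; the distance is symmetric.
    dists = [chebyshev_dist(pts[i], pts[j]) for i in 1:(n - 1) for j in (i + 1):n]
    lo, hi = extrema(dists)
    hi == lo && return 0.0  # all distances identical: a single occupied bin
    counts = zeros(Int, nbins)
    for d in dists
        # Equal-width bins over [lo, hi]; the maximum goes into the last bin.
        k = min(nbins, floor(Int, (d - lo) / (hi - lo) * nbins) + 1)
        counts[k] += 1
    end
    p = counts ./ length(dists)
    # Shannon entropy in bits, skipping empty bins.
    return -sum(pk * log2(pk) for pk in p if pk > 0)
end
```

For example, `distribution_entropy_sketch(sin.(0.1 .* (1:200)))` returns a value between 0 and `log2(16)`, since the histogram has at most 16 occupied bins.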

We should support this method.

However, it is not entirely clear to me how this will fit into the current API. We could easily add it as another ComplexityEstimator, but it would be nice to add it as an OutcomeSpace, for complete generality.

If framing the method in terms of an OutcomeSpace, what will the outcome space be? We'd probably need something like DistanceMatrixEncoding, which maps a pair of vectors onto a discretized distance interval.

Datseris commented 5 months ago

To make this an outcome space you'd need to feed in a fixed 1D binning encoding for the distance histogram. And the matrix is not necessary; one just needs the vector of distances. Due to the symmetry of the Chebyshev distance, one only needs to iterate over the pairs with i < j:

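using Distances: Chebyshev

# x is assumed to be a vector of state vectors (the embedded time series).
metric = Chebyshev()
T = eltype(first(x))
dst = T[]
for i in 1:length(x)-1
    for j in (i+1):length(x)
        push!(dst, metric(x[i], x[j]))  # metric objects are callable
    end
end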

I don't see a useful way to make this encoding-based. In fact, I am not sure it is useful to have it as an outcome space at all. We always try to be as general as possible, but sometimes it is good to just keep things simple.

kahaaga commented 5 months ago

> I don't see a useful way to make this encoding-based.

Ah! It would actually be trivial to use an encoding and an outcome space here. We map pairs of state vectors into distances, which are then mapped into the histogram bins. I think this should be completely analogous to what we do for CosineSimilarityBinning.
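A rough sketch of what such an encoding could look like, in plain Julia. This is only an illustration of the idea, not existing ComplexityMeasures.jl code: the type `PairDistanceBinning` and the functions `encode_pair` and `decode_bin` are hypothetical names, and the fixed range `[dmin, dmax]` with equal-width bins is an assumption.

```julia
# Hypothetical encoding: maps a pair of state vectors to the index of a
# distance bin. Not part of ComplexityMeasures.jl.
struct PairDistanceBinning
    dmin::Float64
    dmax::Float64
    nbins::Int
end

# Chebyshev (maximum-norm) distance between two state vectors.
chebyshev_dist(a, b) = maximum(abs, a .- b)

# Pair of vectors -> integer outcome in 1:nbins.
function encode_pair(e::PairDistanceBinning, a, b)
    # Distances outside the fixed range are clamped to the nearest edge.
    d = clamp(chebyshev_dist(a, b), e.dmin, e.dmax)
    k = floor(Int, (d - e.dmin) / (e.dmax - e.dmin) * e.nbins) + 1
    return min(k, e.nbins)  # d == dmax belongs to the last bin
end

# Integer outcome -> left edge of the corresponding distance interval.
decode_bin(e::PairDistanceBinning, k::Int) = e.dmin + (k - 1) * (e.dmax - e.dmin) / e.nbins
```

With a fixed range and bin count, the set of outcomes is known up front (`1:nbins`), which is what an outcome space would need. Counting the encoded pairs then gives the distance histogram.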

I may be missing something, but I think this should cover it. I'll try to sketch a PR, and if it turns out that I misunderstood something that makes it non-general, we can just implement it as a ComplexityEstimator instead.

Datseris commented 5 months ago

yeah seems fine to me.