Closed kahaaga closed 5 months ago
To make this an outcome space you'd need to feed in a fixed 1D binning encoding for the distance histogram. And the matrix is not necessary; one just needs the vector of distances. Due to the symmetry of the Chebyshev distance, one only needs to iterate over:
```julia
using Distances  # for `chebyshev`

dst = T[]  # T = element type of the distances, e.g. Float64
for i in 1:length(x)-1
    for j in (i+1):length(x)
        push!(dst, chebyshev(x[i], x[j]))
    end
end
```
I don't see a useful way to make this encoding-based. And in fact, I am not sure it is useful to have it as an outcome space at all. We always try to be as general as possible, but sometimes it is good to just keep things simple.
> I don't see a useful way to make this encoding-based.
Ah! It would actually be trivial to use an encoding and an outcome space here. We map pairs of state vectors into distances, which are then mapped into the histogram bins. I think this should be completely analogous to what we do for `CosineSimilarityBinning`.
- `DistanceDistribution <: OutcomeSpace` would be responsible for creating an iterator of combinations of points `(x_i, x_j)` (not permutations, since we use the symmetry of the distance measure). It would need the input data in the constructor, since we need the minimum and maximum distances to ensure bin coverage.
- `DistancePairEncoding(encoder::RectangularBinEncoding)`, which is a fixed rectangular binning over e.g. the interval `[0, 1]` (this normalization ensures that comparisons between different input data sets are valid).
- `encode(::DistancePairEncoding, x::Tuple{<:AbstractVector, <:AbstractVector})`, which maps a pair `(x_i, x_j)` onto its distance bin (which of course corresponds to a unique integer).
- `decode(::DistancePairEncoding, i::Int)`, which maps the `i`-th outcome onto the boundaries of the corresponding bin.

I may be missing something, but I think this should cover it. I'll try to sketch a PR, and if it turns out that I misunderstood something that makes it non-general, we can just implement it as a `ComplexityEstimator` instead.
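For concreteness, here is a rough sketch of what the encoder half could look like. The struct fields and the hand-rolled bin arithmetic are my assumptions for illustration, not existing ComplexityMeasures.jl API; in a real implementation the binning would be delegated to `RectangularBinEncoding`:

```julia
# Hypothetical sketch; names and fields are assumptions, not existing API.
struct DistancePairEncoding
    binmin::Float64   # minimum distance observed in the input data
    binmax::Float64   # maximum distance observed in the input data
    n::Int            # number of bins
end

chebyshev(a, b) = maximum(abs.(a .- b))

# Map a pair (x_i, x_j) onto the integer index of its distance bin.
function encode(e::DistancePairEncoding, x::Tuple{<:AbstractVector, <:AbstractVector})
    d = chebyshev(x[1], x[2])
    # Normalize the distance into [0, 1], then into bins 1..n; clamp so that
    # the maximum distance falls into the last bin rather than bin n + 1.
    t = (d - e.binmin) / (e.binmax - e.binmin)
    return clamp(floor(Int, t * e.n) + 1, 1, e.n)
end

# Map the i-th outcome onto the boundaries of the corresponding bin.
function decode(e::DistancePairEncoding, i::Int)
    w = (e.binmax - e.binmin) / e.n
    lo = e.binmin + (i - 1) * w
    return (lo, lo + w)
end
```

Usage: `encode(DistancePairEncoding(0.0, 1.0, 10), ([0.0], [0.35]))` returns the bin index of the pair's Chebyshev distance, and `decode` recovers that bin's interval.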
Yeah, seems fine to me.
The "distribution entropy" is a method where an input time series is first embedded. Then one computes the (Chebyshev) distance matrix between all state vectors. A normalized histogram over these distances is then computed. Finally, these normalized counts are fed into the Shannon entropy formula.
We should support this method.
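As a reference point, the whole method as described above can be sketched in a few lines of plain Julia, with no package API assumed; `m`, `tau`, and `nbins` are arbitrary example values, not values prescribed by the method:

```julia
# Distribution entropy of a time series, following the steps above:
# delay embedding -> pairwise Chebyshev distances -> normalized histogram
# -> Shannon entropy of the bin probabilities.

chebyshev(a, b) = maximum(abs.(a .- b))

function distribution_entropy(ts::AbstractVector{<:Real}; m = 3, tau = 1, nbins = 16)
    # Delay embedding: state vectors of length m with lag tau.
    N = length(ts) - (m - 1) * tau
    pts = [ts[i:tau:i+(m-1)*tau] for i in 1:N]

    # Pairwise distances over unordered pairs (using symmetry of the metric).
    d = [chebyshev(pts[i], pts[j]) for i in 1:N-1 for j in i+1:N]

    # Normalized histogram over [minimum, maximum] of the distances.
    lo, hi = extrema(d)
    counts = zeros(Int, nbins)
    for v in d
        k = hi > lo ? clamp(floor(Int, (v - lo) / (hi - lo) * nbins) + 1, 1, nbins) : 1
        counts[k] += 1
    end
    p = counts ./ sum(counts)

    # Shannon entropy; bins with zero probability contribute nothing.
    return -sum(pk * log2(pk) for pk in p if pk > 0)
end
```

The result is bounded by `log2(nbins)`, attained when the distances spread uniformly over the bins.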
However, it is not entirely clear to me how this will fit into the current API. We could easily add it as another `ComplexityEstimator`, but it would be nice to add it as an `OutcomeSpace`, for complete generality.

If framing the method in terms of an `OutcomeSpace`, what will the outcome space be? We'd probably need something like a `DistanceMatrixEncoding`, which maps a pair of vectors onto a discretized distance interval.