Closed Datseris closed 1 year ago
At first glance, it seems reasonable to require encodings to return integers and decodings to return elements of the outcome space.
That way, the only tricky part for the various estimators is to decide precisely what the outcome space is/should be, which is partly a matter of taste.
Should we quickly decide what `encoding` should return when invalid data points are given to it? E.g., a data point outside the fixed histogram bounds, or a 5-dimensional data point given to order-4 ordinal patterns.
I am not sure how to proceed, but erroring (which is what currently happens in my branch of the bin encoding) is not useful, at least not for the binnings. Points falling "outside" the histogram should simply not affect the probabilities, but right now they stop the whole computation by throwing an error.
I'm also not sure what the best way to proceed is in general.
If we want to maintain the relation `encode(x[i], ...) -> Int` for an input dataset `x`, then we need a systematic way of handling it. But I'm not sure that's the best way to deal with it.
For ordinal patterns, I guess it would make sense to just throw an `OutOfBoundsError` or something like that, because it is not possible to encode a `D1`-dimensional vector to a `D2`-dimensional ordinal pattern if `D1 != D2`.
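A minimal sketch of the "throw on dimension mismatch" idea for ordinal patterns. The names `OrdinalEncoder` and `encode_pattern` are hypothetical and not part of any existing API. It uses Julia's built-in `DimensionMismatch` in place of the `OutOfBoundsError` mentioned above, and the Lehmer code is just one possible way to map a permutation to an integer:

```julia
# Hypothetical ordinal-pattern encoder of order `m` (illustrative name only).
struct OrdinalEncoder
    m::Int
end

# Encode `x` as an integer in 1:factorial(m) via the Lehmer code of its
# sorting permutation. Throws if `x` does not have exactly `m` elements.
function encode_pattern(x::AbstractVector, e::OrdinalEncoder)
    length(x) == e.m || throw(DimensionMismatch(
        "cannot encode a $(length(x))-dimensional vector as an order-$(e.m) ordinal pattern"))
    perm = sortperm(x)
    code = 0
    for i in 1:e.m
        # Number of later entries of the permutation that are smaller than perm[i].
        smaller = count(j -> perm[j] < perm[i], (i + 1):e.m)
        code += smaller * factorial(e.m - i)
    end
    return code + 1  # shift from 0-based to 1-based
end

encode_pattern([0.3, 0.1, 0.2], OrdinalEncoder(3))  # returns 4
```

Calling `encode_pattern(rand(5), OrdinalEncoder(4))` throws, which matches the case where `D1 != D2`.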
Perhaps throwing an error should be the default behaviour, but we allow exceptions, e.g. for the histograms. In that case, encoding using a `RectangularBinEncoder` simply returns `encodings::Vector{Int}` where `length(encodings) <= length(x)`, i.e. points outside the binning are simply discarded. I think this should be okay if it is documented.
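A rough sketch of the discarding behaviour, assuming a one-dimensional binning with fixed edges. `FixedBinEncoder`, `encode_or_nothing`, and `encode_dataset` are hypothetical names used only for illustration. This is not how `RectangularBinEncoder` is implemented:

```julia
# Hypothetical 1D encoder with `nbins` equal-width bins covering [lo, hi).
struct FixedBinEncoder
    lo::Float64
    hi::Float64
    nbins::Int
end

# Return the bin index in 1:nbins, or `nothing` for a point outside [lo, hi).
function encode_or_nothing(x::Real, e::FixedBinEncoder)
    (e.lo <= x < e.hi) || return nothing
    width = (e.hi - e.lo) / e.nbins
    # `min` guards against floating-point round-off at the right edge.
    return min(floor(Int, (x - e.lo) / width) + 1, e.nbins)
end

# Encode a whole dataset, dropping points that fall outside the binning.
function encode_dataset(xs, e::FixedBinEncoder)
    encodings = Int[]
    for x in xs
        i = encode_or_nothing(x, e)
        i === nothing || push!(encodings, i)
    end
    return encodings
end

e = FixedBinEncoder(0.0, 1.0, 4)
x = [0.1, 0.3, 1.5, -0.2, 0.9]
encodings = encode_dataset(x, e)  # returns [1, 2, 4]
```

Here `length(encodings) <= length(x)` holds because the two out-of-range points are skipped. The trade-off is that the positions of the discarded points are lost, which is why this would need to be documented.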
Alright, let's think here how we want to establish the encodings. From my view, encodings are an intermediate interface used by probabilities estimators. Here is what I propose:
An `Encoding` encodes elements into integers exclusively. If they are not integers, I actually don't see the need for an intermediate encoding. The production interface is:

- `encoding(x, est::ProbabilitiesEstimator) -> e::AbstractEncoding` produces a type that can encode elements of `x` as integers. Not all probabilities estimators have encodings.
- `encode(element, e::Encoding) -> i::Int` encodes the element into an integer.
- `decode(i::Int, e::Encoding) -> ω` decodes the encoding into an outcome from the outcome space of the probabilities estimator used to create the encoding.

For Binnings, I already have this encoding because I actually use it in another project. It uses `CartesianIndices` and `LinearIndices` to go back and forth between the encoding and the decoding.
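To illustrate the `CartesianIndices`/`LinearIndices` round trip, here is a minimal sketch. `GridEncoding` is a hypothetical type, not the encoding from the other project. It also assumes the element has already been mapped to integer bin coordinates, and `decode` returns those coordinates rather than a proper outcome such as a bin corner:

```julia
# Hypothetical encoding for a rectangular grid of bins (illustrative only).
struct GridEncoding{C<:CartesianIndices, L<:LinearIndices}
    ci::C  # linear index -> Cartesian bin coordinates
    li::L  # Cartesian bin coordinates -> linear index
end

# Build both lookup objects from the number of bins along each dimension.
GridEncoding(dims::Dims) = GridEncoding(CartesianIndices(dims), LinearIndices(dims))

# Encode integer bin coordinates as a single integer.
encode(bin::Dims, e::GridEncoding) = e.li[bin...]

# Decode the integer back into bin coordinates.
decode(i::Int, e::GridEncoding) = Tuple(e.ci[i])

e = GridEncoding((3, 4))  # 3 bins along the first axis, 4 along the second
i = encode((2, 3), e)     # returns 8
decode(i, e)              # returns (2, 3)
```

Neither lookup object allocates an array, so the encoding stays cheap even for grids with many bins.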