Open Datseris opened 1 year ago
The encoding could in principle be generalizable to multidimensional CDFs as well, although I'm not sure something like that exists in the literature yet in the context of these "entropy-like" quantities. The function `f` is just the input to `quadgk` (which only handles univariate functions at the moment).

A completely generic version of `CDFEncoding` could be something like
```julia
Base.@kwdef struct CDFEncoding{T<:Real} <: Encoding
    precomputed_stuff::NamedTuple # e.g. (μ = mean(x), σ = std(x))
    # Integrand; or something else for another CDF. Later defaults may use earlier fields.
    f::Function = xᵢ -> exp(-(xᵢ - precomputed_stuff.μ)^2 / (2 * precomputed_stuff.σ^2))
    lb::T # lower integration bound
    ub::T # upper integration bound
    integrator::Function = quadgk
end
```
Or something along those lines, depending on the call signature of `quadgk` or whatever other integrator one would use for multidimensional input.

EDIT: Alternatively, one could drop the integrator field from `CDFEncoding` and rather have `CDFEncoding{D} <: Encoding`, where `D` is the dimension of the data. Then one could dispatch separately for 1D data (using `quadgk` for integration) and >=2D data (using some other integrator).
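
A minimal sketch of what dispatching on the dimension parameter could look like. Everything here is hypothetical: `Encoding` is a local stand-in for the package's abstract type, and `integration_backend` is an illustrative name that only reports which branch would be taken instead of calling a real integrator.

```julia
# Stand-in for the package's abstract type, so the sketch runs on its own.
abstract type Encoding end

# `D` is the dimension of the data; `F` is the concrete type of the stored function.
struct CDFEncoding{D, F<:Function} <: Encoding
    f::F
end

# Convenience constructor so that only `D` has to be given explicitly.
CDFEncoding{D}(f::F) where {D, F<:Function} = CDFEncoding{D, F}(f)

# 1D data: this is where `quadgk` would be called.
integration_backend(::CDFEncoding{1}) = :quadgk

# Any other dimension: some other (unspecified) integrator.
integration_backend(::CDFEncoding{D}) where {D} = :multidimensional

e1 = CDFEncoding{1}(x -> exp(-x^2))
e3 = CDFEncoding{3}(x -> exp(-sum(abs2, x)))
```

The more specific `CDFEncoding{1}` method wins for univariate data, and every other `D` falls through to the generic method.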

You don't need to have `precomputed_stuff`. Simply make the function `f = xᵢ -> exp((-(xᵢ - μ)^2)/(2σ^2))` by calculating or using `μ, σ`. The closure already stores the numbers. But also, I'm not sure what the use is here of hyper-generalizing: higher dimensions and different integrator functions don't really fit the need of the struct. Univariate cumulative distribution functions still make sense in context, though. Also, there is no need for the integration bounds: integrating from `-Inf` to `x` makes sense because that is by definition what gives you the probability from a CDF.
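
A small sketch of the closure idea, assuming the QuadGK.jl package is installed. The helper names `gaussian_integrand` and `cdf_value` are made up for illustration, and the normalization constant `k` is an addition so that the integral tends to 1; the expression quoted above leaves it out.

```julia
using QuadGK: quadgk
using Statistics: mean, std

# Hypothetical helper: build the integrand as a closure over μ and σ.
function gaussian_integrand(x::AbstractVector{<:Real})
    μ, σ = mean(x), std(x)
    k = 1 / (σ * sqrt(2π))  # normalization (an assumption, not in the comment above)
    return t -> k * exp(-(t - μ)^2 / (2σ^2))
end

# Hypothetical helper: integrate from -Inf to xᵢ.
# `quadgk` returns (integral, error estimate); only the integral is needed.
cdf_value(f, xᵢ::Real) = first(quadgk(f, -Inf, xᵢ))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
f = gaussian_integrand(x)   # μ and σ now live inside the closure
p = cdf_value(f, mean(x))   # close to 0.5, by symmetry of the Gaussian
```

No fields for `μ`, `σ`, or the integration bounds are needed: the closure carries the numbers, and the lower bound is always `-Inf`.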

These are some generality improvements for the current `GaussianCDFEncoding` and `Dispersion`. In general, any CDF could be used in the source code of the encoding; one could store the CDF function in the encoding struct. E.g., given some timeseries `x`, generate the function `f`. Any other univariate function could be generated instead of `f`.

This function is then stored as a field in a new struct `CDFEncoding` that uses the exact source code of `GaussianCDFEncoding`, but using `f` instead of the existing `gaussian` function.

Then, this is super easily propagated into `Dispersion`: that type should initialize a `CDFEncoding` and store the encoding directly as its field. If given only a timeseries, it defaults to getting the mean and std and initializing the Gaussian encoding.
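
A rough sketch of how this could fit together, not the package's actual code. The struct layouts, the field `c` (number of categories), and the keyword names are assumptions for illustration; `Encoding` is a local stand-in for the package's abstract type, and the normalization constant `k` is added so that the integrand is a proper density.

```julia
using Statistics: mean, std

abstract type Encoding end  # stand-in for the package's abstract type

# Hypothetical generic encoding: stores the integrand `f` directly.
struct CDFEncoding{F<:Function} <: Encoding
    f::F    # function whose integral from -Inf to x gives the CDF value
    c::Int  # number of categories (assumed field)
end

# Default when only a timeseries is given: Gaussian built from its mean and std.
function CDFEncoding(x::AbstractVector{<:Real}; c::Int = 3)
    μ, σ = mean(x), std(x)
    k = 1 / (σ * sqrt(2π))
    return CDFEncoding(t -> k * exp(-(t - μ)^2 / (2σ^2)), c)
end

# Hypothetical `Dispersion` that stores the encoding as its field.
struct Dispersion{E<:Encoding}
    encoding::E
end

# Given only a timeseries, fall back to the Gaussian encoding.
Dispersion(x::AbstractVector{<:Real}; c::Int = 3) = Dispersion(CDFEncoding(x; c = c))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
d_default = Dispersion(x)                                  # Gaussian by default
d_laplace = Dispersion(CDFEncoding(t -> exp(-abs(t)) / 2, 4))  # any other univariate density
```

The user either passes a ready-made `CDFEncoding` or just the timeseries, and in both cases `Dispersion` ends up holding an encoding.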