JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License

Less strict subtyping in `ProbabilitiesEstimator`s #313

Closed kahaaga closed 9 months ago

kahaaga commented 10 months ago

In CausalityTools.jl, instead of estimating a vector of counts, we estimate N-dimensional arrays of counts (contingency tables).

I want to use the ProbabilitiesEstimators implemented here to convert these counts to probabilities. The workflow is typically:


using Random
rng = MersenneTwister(1234)  # `rng` was used below without being defined
n = 100
x = rand(rng, ["a", "b", "c", 2], n)
y = StateSpaceSet(rand(rng, n, 2))
z = StateSpaceSet(randn(rng, n, 2))

encodings = PerPointEncoding(
    CategoricalEncoding(x),
    OrdinalPatternEncoding(2),
    GaussianCDFEncoding{2}(μ = 0.0, σ = 0.3; c = 3)
)
# Results in a three-dimensional `ContingencyTable` (a convenience wrapper around an `AbstractArray{Int, N}`)
c = contingency_table(encodings, x, y, z)

What I want to do then is

function probabilities(est::RelativeAmount, c::ContingencyTable)
    return Probabilities(c.cts ./ sum(c.cts))
end 
function probabilities(est::AddConstant, c::ContingencyTable)
    # some more complicated code
    return Probabilities(....)
end 

# and call these on the contingency table constructed in the example above.
probabilities(RelativeAmount(), c)
probabilities(AddConstant(), c)
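
For illustration, here is a minimal, self-contained sketch of what such methods could look like when they operate directly on a plain N-dimensional array of counts. All type and function names (`SketchEstimator`, `SketchRelativeAmount`, `SketchAddConstant`, `sketch_probabilities`) are hypothetical stand-ins, not the definitions in this package, and the add-constant formula is the textbook form `(n_i + c) / (N + c*k)`, assumed here rather than taken from the actual implementation.

```julia
# Hypothetical stand-ins; these are NOT the ComplexityMeasures.jl definitions.
abstract type SketchEstimator end
struct SketchRelativeAmount <: SketchEstimator end
struct SketchAddConstant <: SketchEstimator
    c::Float64
end
SketchAddConstant() = SketchAddConstant(1.0)

# Relative frequencies: each count divided by the total count.
sketch_probabilities(::SketchRelativeAmount, cts::AbstractArray{<:Integer}) =
    cts ./ sum(cts)

# Add-constant smoothing (assumed textbook form): (n_i + c) / (N + c * k),
# where N is the total count and k is the number of cells in the table.
function sketch_probabilities(est::SketchAddConstant, cts::AbstractArray{<:Integer})
    return (cts .+ est.c) ./ (sum(cts) + est.c * length(cts))
end

# A fixed 4x2x3 table of counts standing in for a contingency table.
cts = reshape(collect(0:23), 4, 2, 3)

p_rel = sketch_probabilities(SketchRelativeAmount(), cts)
p_add = sketch_probabilities(SketchAddConstant(), cts)
```

Neither method needs to know which outcome space produced the counts; the array shape is preserved and only the normalization differs.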

However, since every ProbabilitiesEstimator implemented here enforces subtyping on OutcomeSpace and requires an OutcomeSpace as input, the above won't work.

We could solve this by making it possible to construct an "empty" probabilities estimator by default:

Base.@kwdef struct RelativeAmount{O} <: ProbabilitiesEstimator
    outcomemodel::O = nothing
end
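
As a rough sketch of how that default would behave, the snippet below uses hypothetical stand-in names (`SketchProbabilitiesEstimator`, `EmptyableRelativeAmount`) so that it runs without the package; the symbol passed as an outcome model is only a placeholder.

```julia
# Hypothetical stand-in for the package's abstract estimator type.
abstract type SketchProbabilitiesEstimator end

Base.@kwdef struct EmptyableRelativeAmount{O} <: SketchProbabilitiesEstimator
    outcomemodel::O = nothing
end

# No arguments: the field defaults to `nothing`, so `O` is inferred as `Nothing`.
est = EmptyableRelativeAmount()

# An outcome model can still be supplied by keyword (placeholder value here).
est_with = EmptyableRelativeAmount(outcomemodel = :some_outcome_space)
```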

Datseris commented 10 months ago

Well, this means that the design of the probabilities estimators is wrong. They should NOT have outcome spaces as fields in the first place. Why do they? A probabilities estimator is entirely agnostic to the outcome space. It only sees counts. It was a wrong decision to have them as fields. In general, we too often follow an object-oriented approach instead of the more Julian approach based on functions and multiple dispatch. Too often we make things fields of other things, when they should be arguments to functions instead.

In any case, here it is scientifically and conceptually clear that the call signatures must be

probabilities(o::OutcomeSpace, x)
probabilities(est::ProbEst, o::OutcomeSpace, x)
probabilities(est::ProbEst, counts_or_probabilities)

where the first signature uses RelativeAmount and then calls the second signature. The second signature is a generic implementation that does not depend on est or probs; it calls either counts or probabilities depending on whether o is count-based. So only the third method has specific implementations.

The probabilities estimators don't have a reason to reference an outcome space.
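
A minimal sketch of this three-signature layout, under stated assumptions: every name below (`DemoOutcomeSpace`, `DemoProbEst`, `DemoUniqueElements`, `DemoRelativeAmount`, `demo_counts`, `demo_probabilities`) is a hypothetical stand-in rather than the package API, and the generic second method only covers the count-based case.

```julia
# All names below are hypothetical stand-ins, not the ComplexityMeasures.jl API.
abstract type DemoOutcomeSpace end
abstract type DemoProbEst end

struct DemoUniqueElements <: DemoOutcomeSpace end  # a count-based outcome space
struct DemoRelativeAmount <: DemoProbEst end

# Count occurrences of each unique element, in sorted order of the elements.
function demo_counts(::DemoUniqueElements, x)
    outcomes = sort(unique(x))
    return [count(==(ω), x) for ω in outcomes]
end

# Signature 3: the only estimator-specific method; it sees counts and nothing else.
demo_probabilities(::DemoRelativeAmount, cts::AbstractArray{<:Integer}) =
    cts ./ sum(cts)

# Signature 2: generic; maps the data to counts, then defers to signature 3.
demo_probabilities(est::DemoProbEst, o::DemoOutcomeSpace, x) =
    demo_probabilities(est, demo_counts(o, x))

# Signature 1: defaults to relative frequencies and defers to signature 2.
demo_probabilities(o::DemoOutcomeSpace, x) =
    demo_probabilities(DemoRelativeAmount(), o, x)

x = [1, 1, 2, 3, 3, 3]
p = demo_probabilities(DemoUniqueElements(), x)

# Signature 3 also accepts pre-computed counts of any dimension,
# which is the contingency-table use case.
p_table = demo_probabilities(DemoRelativeAmount(), [2 2; 4 0])
```

With this layout, a new estimator only has to add one method for the third signature, and no estimator stores an outcome space as a field.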

kahaaga commented 10 months ago

Agreed. I'll make a PR implementing these changes asap.