Remove DimensionalData.jl dependency

JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems

MIT License

48 stars 11 forks source link

Remove DimensionalData.jl dependency #326

Closed Datseris closed 6 months ago

Datseris commented 9 months ago

In PR #316 we have added DimensionalData.jl as a dependency, in order to be able to pretty-print Probabilities like so:

The only reason we use DimensionalData.jl is for printing. For my perspective, this is unjustifiable. We have anyways have written our own printing code, because DD.jl is too complex printing. And we anyways expand every Base functionality for Probabilities manually.

I think we should have an outcomes field directly into Counts/Probabilities that is a Tuple of AbstractVectors and contains the outcome spaces. We then use this outcomes field for pretty printing and remove the DD.jl dependency.

We don't need the core of DD.jl functionality that is about accessing values by named dimensions, or averaging by named dimensions. So I can't see a way that this "heavy" dependency can be justified and for it to go into a new release of ComplexityMeasures.jl.

Datseris commented 9 months ago

Also, forgot to mention my other comments on why using DD.jl is a bad idea unless absolutely necessary:

But you should also know that I had headaches with ClimateBase.jl with DimensionalData.jl getting breaking releases all the time (version 0.25 means 24 breaking releases if you think about it). I have stopped updating ClimateBase.jl due to this headache.

Datseris commented 9 months ago

@kahaaga I've just noticed another problem.

Previously, it was possible to optimize counts or probabilities so that they do not compute outcomes. E.g., here

function counts(encoding::RectangularBinEncoding, x)
    cts, bins = fasthist(encoding, x) # bins are integers here
    unique!(bins) # `bins` is already sorted from `frequencies!`
    # Here we transform the cartesian coordinate based bins into data unit bins:
    outcomes = map(b -> decode(encoding, b), bins)
    return Counts(cts, (outcomes,))
end

we could avoid computing the outcomes before the move to dimensionaldata.jl was made. Now, it is no longer possible, and this loss of optimization will be forever a burden, and can become a large burden if you have to run 100000s of timeseries.

kahaaga commented 9 months ago

What about having the following?

function counts_and_outcomes(encoding::RectangularBinEncoding, x)
    cts, bins = fasthist(encoding, x) # bins are integers here
    unique!(bins) # `bins` is already sorted from `frequencies!`
    # Here we transform the cartesian coordinate based bins into data unit bins:
    outcomes = map(b -> decode(encoding, b), bins)
    return Counts(cts, (outcomes,))
end

function counts(encoding::RectangularBinEncoding, x)
    cts, bins = fasthist(encoding, x) # bins are integers here
    unique!(bins) # `bins` is already sorted from `frequencies!`
    return Counts(cts)
end

The first signature explicitly computes the outcomes, the second doesn't.

Datseris commented 9 months ago

What would the display of this be? wouldn't it error at display? since it doesn't have a dimension to display in the grayed out marginals...?

Datseris commented 9 months ago

Also this is a breaking change to the call signature, so anyways I would say I am against it. probs_and_outcomes returns a decomposable tuple of the vector of probabilities and the vector of outcomes.

kahaaga commented 9 months ago

What would the display of this be? wouldn't it error at display? since it doesn't have a dimension to display in the grayed out marginals...?

We can just do a quick check to see if marginals are present or not, and print them if they are present.

kahaaga commented 9 months ago

Also this is a breaking change to the call signature, so anyways I would say I am against it. probs_and_outcomes returns a decomposable tuple of the vector of probabilities and the vector of outcomes.

It is possible to do both: return a Counts that has the outcomes on the marginals, and explicitly return it. This has no extra cost, since the outcomes are already computed in counts_and_outcomes/probabilities_and_outcomes

Datseris commented 8 months ago

@kahaaga since you coded the original DimensionalData.jl code, could you please be the one that tackles this issue? It's one of the final 2 blockers for the next release of ComplexityMeasures.jl.

kahaaga commented 7 months ago

since you coded the original DimensionalData.jl code, could you please be the one that tackles this issue? It's one of the final 2 blockers for the next release of ComplexityMeasures.jl.

I'm on it. It'll probably take a few days, because it is a bit involved to get the pretty printing right.