Closed Datseris closed 6 months ago
Also, forgot to mention my other comments on why using DD.jl is a bad idea unless absolutely necessary:
But you should also know that I had headaches with ClimateBase.jl with DimensionalData.jl getting breaking releases all the time (version 0.25 means 24 breaking releases if you think about it). I have stopped updating ClimateBase.jl due to this headache.
@kahaaga I've just noticed another problem.
Previously, it was possible to optimize counts
or probabilities
so that they do not compute outcomes. E.g., here
function counts(encoding::RectangularBinEncoding, x)
cts, bins = fasthist(encoding, x) # bins are integers here
unique!(bins) # `bins` is already sorted from `frequencies!`
# Here we transform the cartesian coordinate based bins into data unit bins:
outcomes = map(b -> decode(encoding, b), bins)
return Counts(cts, (outcomes,))
end
we could avoid computing the outcomes before the move to dimensionaldata.jl was made. Now, it is no longer possible, and this loss of optimization will be forever a burden, and can become a large burden if you have to run 100000s of timeseries.
What about having the following?
function counts_and_outcomes(encoding::RectangularBinEncoding, x)
cts, bins = fasthist(encoding, x) # bins are integers here
unique!(bins) # `bins` is already sorted from `frequencies!`
# Here we transform the cartesian coordinate based bins into data unit bins:
outcomes = map(b -> decode(encoding, b), bins)
return Counts(cts, (outcomes,))
end
function counts(encoding::RectangularBinEncoding, x)
cts, bins = fasthist(encoding, x) # bins are integers here
unique!(bins) # `bins` is already sorted from `frequencies!`
return Counts(cts)
end
The first signature explicitly computes the outcomes, the second doesn't.
What would the display of this be? wouldn't it error at display? since it doesn't have a dimension to display in the grayed out marginals...?
Also this is a breaking change to the call signature, so anyways I would say I am against it. probs_and_outcomes
returns a decomposable tuple of the vector of probabilities and the vector of outcomes.
What would the display of this be? wouldn't it error at display? since it doesn't have a dimension to display in the grayed out marginals...?
We can just do a quick check to see if marginals are present or not, and print them if they are present.
Also this is a breaking change to the call signature, so anyways I would say I am against it. probs_and_outcomes returns a decomposable tuple of the vector of probabilities and the vector of outcomes.
It is possible to do both: return a Counts
that has the outcomes on the marginals, and explicitly return it. This has no extra cost, since the outcomes are already computed in counts_and_outcomes
/probabilities_and_outcomes
@kahaaga since you coded the original DimensionalData.jl code, could you please be the one that tackles this issue? It's one of the final 2 blockers for the next release of ComplexityMeasures.jl.
since you coded the original DimensionalData.jl code, could you please be the one that tackles this issue? It's one of the final 2 blockers for the next release of ComplexityMeasures.jl.
I'm on it. It'll probably take a few days, because it is a bit involved to get the pretty printing right.
In PR #316 we have added DimensionalData.jl as a dependency, in order to be able to pretty-print
Probabilities
like so:The only reason we use DimensionalData.jl is for printing. For my perspective, this is unjustifiable. We have anyways have written our own printing code, because DD.jl is too complex printing. And we anyways expand every Base functionality for
Probabilities
manually.I think we should have an
outcomes
field directly intoCounts/Probabilities
that is a Tuple of AbstractVectors and contains the outcome spaces. We then use thisoutcomes
field for pretty printing and remove the DD.jl dependency.We don't need the core of DD.jl functionality that is about accessing values by named dimensions, or averaging by named dimensions. So I can't see a way that this "heavy" dependency can be justified and for it to go into a new release of ComplexityMeasures.jl.