JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License
56 stars 14 forks source link

`decode` for `GaussianCDFEncoding` and `RectangularBinEncoding` return inconsistent values #300

Closed kahaaga closed 1 year ago

kahaaga commented 1 year ago

While working on #299, I realized that there's an inconsistency in what is returned from decode.

When decode-ing an integer symbol s, both GaussianCDFEncoding and RectangularBinEncoding return a representation some subinterval of some interval [a, b]. With RectangularBinEncoding (also true for both of the new encodings in #299), the leftmost edge of the bin represents the entire bin. With GaussianCDFEncoding, however, both edges of the bin is returned. I think we should be consistent and return only the leftmost edge, or both bin edges.

Any of the two alternatives above would be a breaking change, since the return value changes. What do you think, @Datseris?

We can also just keep it as is, but this became relevant because I wanted to use RectangularBinEncoding internally for GaussianCDFEncoding and dispatch to decode(::RectangularBinEncoding, s::Int) instead of doing the decoding step manually. This is also what happens for the two new encodings, so it is a bit weird that GaussianCDFEncoding is doing it manually.

Datseris commented 1 year ago

Returning only the leftmost edge seems to me the way to go. It would be more expensive to return both edges.

Datseris commented 1 year ago

We don't have to consider this breaking change. I consider this a bugfix actually.

Datseris commented 1 year ago

and yes, using rectangular bin encoding internally also seems like th way to go. reduce code duplication, increase code re-use = best software

kahaaga commented 1 year ago

We don't have to consider this breaking change. I consider this a bugfix actually.

Yes, good point.

and yes, using rectangular bin encoding internally also seems like th way to go. reduce code duplication, increase code re-use = best software

Agreed.

Returning only the leftmost edge seems to me the way to go. It would be more expensive to return both edges.

Ok, then I'll modify GaussianCDFEncoding to return only the left bin edge in #299.