JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License
56 stars 14 forks source link

Deprecate ordinal outcome space constructors #375

Closed kahaaga closed 8 months ago

kahaaga commented 8 months ago

Fixes #372.

I also parameterized the τ field of the structs explicitly. These were previously hard-coded as Int, but there's no reason why we should be so strict.

codecov[bot] commented 8 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (3bbfe0b) 89.08% compared to head (ed76642) 89.25%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #375 +/- ## ========================================== + Coverage 89.08% 89.25% +0.16% ========================================== Files 79 79 Lines 2226 2261 +35 ========================================== + Hits 1983 2018 +35 Misses 243 243 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Datseris commented 8 months ago

These were previously hard-coded as Int, but there's no reason why we should be so strict.

I disagree with that. I would argue, there is no reason for this to be parameterized. Wyy increase source code complexity? I think the correct solution here is to simply annotate τ::Integer in its keyword declaration and τ::Int in its field declaration. This way, any integer could be used as input, but we just use Int in the type, because there is no reason to expect to use anything else.

kahaaga commented 8 months ago

This way, any integer could be used as input, but we just use Int in the type

Int defaults to Int32 or Int64 depending on the system architecture. Depending on the application, a user may want to use something else, e.g. Int16 or Int8 for performance gains when they know the range of possible encoded values. By hardcoding Int, we prevent this from being possible. I argue that we should be as generic as possible. Just like we parameterize on T <: Real and not e.g.T <: Float64, we should parameterize on T <: Integer, and not e.g. T <: Int64. Unless there are very good reasons for not doing so, that is - which I don't see that there is here

EDIT: Example: I'm trying out running Julia on microcontrollers for some experiments, where being able to to things like I describe above is crucial.

Datseris commented 8 months ago

. Depending on the application, a user may want to use something else, e.g. Int16 or Int8 for performance gains when they know the range of possible encoded values.

I think we are overengineering stuff. τ is just the delay time. There aren't any performance gains in giving different types as the usage of τ in code is trivial anyways. I do not understand the argument:

I'm trying out running Julia on microcontrollers for some experiments, where being able to to things like I describe above is crucial.

the other field of the type is a function. Doesn't this go well beyond 32 or 64 bytes? If your concern is microcontrollers, then the whole type has to be re-worked to not have the random is less function at all as a field.

kahaaga commented 8 months ago

τ is just the delay time.

Yep to the latter, and there no reason why a user shouldn't be able to supply any Integer for the delay time, not just Int32 or Int64.

I think we are overengineering stuff. τ is just the delay time. There aren't any performance gains in giving different types as the usage of τ in code is trivial anyways

If my time series are a maximum of 300 points long, and I know I will probably never need to store a τ with a value higher than, say, 50 for the delay . Then I'd use a UInt8 for that. If creating millions of instances of OrdinalPatterns, that would start to matter on systems with limited resources.

My point is that we shouldn't prevent users from optimizing their code by hard-coding which types of numbers they are allowed to use, unless it is necessary to ensure the correctness of the algorithm. This goes not only for the ordinal outcome spaces, but also for all other types we've defined.

Why increase source code complexity?

I don't get the argument that using struct MyStruct{I <: Integer}; a::I; end is so complicated. It's standard Julia syntax for structs, and ensures complete generality of the codebase. We use parameterizations all over the codebase both for efficiency and code readability, so I don't see why we wouldn't do so here.

the other field of the type is a function. Doesn't this go well beyond 32 or 64 bytes? If your concern is microcontrollers, then the whole type has to be re-worked to not have the random is less function at all as a field.

I don't know how to measure function memory footprint in Julia, but you're probably right: The function goes well beyond that in terms of storage. And yes, perhaps the function should not be a field if performance is to be optimal, but that is a different issue than the issue of unecessarily restricting input types.

Datseris commented 8 months ago

okaay, we can keep the type parameterzation

kahaaga commented 8 months ago

This needs an approval before merging.