JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License
53 stars 12 forks source link

Wrap up PR for v2 #227

Closed Datseris closed 1 year ago

Datseris commented 1 year ago
kahaaga commented 1 year ago

Changed Shannon(; base)toShannon(base)`

Why? All other EntropyDefinitions have base as a keyword. This breaks convention, and breaks keyword propagation if someone is using a combination of Shannon and some other EntropyDefinition.

See also #228

Datseris commented 1 year ago

Why? All other EntropyDefinitions have base as a keyword. This breaks convention, and breaks keyword propagation if someone is using a combination of Shannon and some other EntropyDefinition.

The overwhelming majority of functions you use do not have keyword arguments without having positional arguments. It's because it is pointless to type base = when the function accepts a single argument. What will base be confused with if it is the only given argument? Keyword specification is useful when you have a lot of arguments so you don't have to remember the order. here there is no order. Keywords are also useful when you want to separate the "important" arguments from the ones of "minor importance", which again isn't happening here.

I didn't know that the convention is that base is specified by keywords. If you can use positional arguments I think you should. Renyi has a positional argument version. Positional arugments and default values can exist at the same time, it's not like you must have keywords to achieve this.

kahaaga commented 1 year ago

The overwhelming majority of functions you use do not have keyword arguments without having positional arguments. It's because it is pointless to type base = when the function accepts a single argument. What will base be confused with if it is the only given argument? Keyword specification is useful when you have a lot of arguments so you don't have to remember the order. here there is no order. Keywords are also useful when you want to separate the "important" arguments from the ones of "minor importance", which again isn't happening here. I didn't know that the convention is that base is specified by keywords. If you can use positional arguments I think you should. Renyi has a positional argument version. Positional arugments and default values can exist at the same time, it's not like you must have keywords to achieve this.

The convention exists here because use Base.@kwdef for all subtypes of EntropyDefinition, and we document for each of them that base is a keyword. All structs defined by the @kwdef macro automatically also has a positional argument version, so you're free to use that if needed.

But again, if Shannon is the only EntropyDefinition that doesn't have the keyword base, then in any upstream method that accepts an arbitrary log-based EntropyDefinition, I'd have to write an if statement with one clause for Shannon, and one clause for the rest, just because one has a different signature and the remaining don't.

Datseris commented 1 year ago

But again, if Shannon is the only EntropyDefinition that doesn't have the keyword base, then in any upstream method that accepts an arbitrary log-based EntropyDefinition

I am trying to understand how this would happen. Because as far as I can tell, if there is an option to specify the entropy definition, a user would specify this anyways. Like, give me an example where you, as a developer of an upstream package would have to write this if clause, because I can't imagine it so far. You would allow someone to specify entropy by type instead of by instance? Like to speciy Renyi instead of Renyi()? Why? It is well established that you should dispatch on instances of types, not the types themselves. And also, you don't have to care how would Shannon() be instantiated by dispatchin on instances of types.

The convention exists here because use Base.@kwdef for all subtypes of EntropyDefinition, and we document for each of them that base is a keyword. All structs defined by the @kwdef macro automatically also has a positional argument version, so you're free to use that if needed.

This sounds kinda missleading. We decide what we document and we write the docunentation. @kwdef doesn't write any docs. I could just as well argue that "we document the positional version, but since things are defined by the @kwdef macro, you can decide to use the keyword version."

Datseris commented 1 year ago

I've fixed the tests, and added a keyword version to Renyi which didn't have.

codecov[bot] commented 1 year ago

Codecov Report

Merging #227 (c18cc75) into main (f96cd67) will decrease coverage by 0.27%. The diff coverage is 90.47%.

@@            Coverage Diff             @@
##             main     #227      +/-   ##
==========================================
- Coverage   85.04%   84.77%   -0.28%     
==========================================
  Files          45       45              
  Lines        1137     1136       -1     
==========================================
- Hits          967      963       -4     
- Misses        170      173       +3     
Impacted Files Coverage Δ
src/complexity_measures/missing_dispersion.jl 100.00% <ø> (ø)
.../complexity_measures/reverse_dispersion_entropy.jl 100.00% <ø> (ø)
src/complexity_measures/sample_entropy.jl 93.54% <ø> (ø)
src/encoding_implementations/fasthist.jl 100.00% <ø> (ø)
src/encoding_implementations/gaussian_cdf.jl 61.53% <ø> (ø)
src/encoding_implementations/ordinal_pattern.jl 80.00% <ø> (ø)
...rc/encoding_implementations/rectangular_binning.jl 88.52% <ø> (ø)
src/entropies_definitions/curado.jl 100.00% <ø> (ø)
src/entropies_definitions/kaniadakis.jl 83.33% <ø> (ø)
src/entropies_definitions/streched_exponential.jl 100.00% <ø> (ø)
... and 27 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

kahaaga commented 1 year ago

This sounds kinda missleading. We decide what we document and we write the docunentation.

My point was that we already have documented it (therefore it is Entropies.jl convention) not that this is a principle in general. I just want internal consistency for subtypes of EntropyDefinition, that's all.

You would allow someone to specify entropy by type instead of by instance? Like to speciy Renyi instead of Renyi()? Why?

There nothing preventing you from doing either: specify something by type, or specify something by instance. I do both regularly. Specifying by type is super useful for doing things like:

function dosomething(T::Typeof{SomeThreeArgumentAbstractType}; kw = 2)
  for (i, p1) in enumerate(params1)
      for (j, p2) in enumerate(params2)
          for L in Ls
              for k in ks
                  p3 = L / k
                  data = somefunc(p1, L, k) # data are also constructed inside the loop
                  est = T(; p1, p2, p3, kw)
                  f(est, data)
                  # then do something with est
              end
          end
      end
  end
end
mytypes = [T, Q, W, R] # all have the same keyword arguments
dosomething.(mytypes)

If not instantiating on the type, this would have to be split into two almost identical for loops/functions. One for constructing the types, and one for constructing the data. Then combining. That seems unecessary.

It is well established that you should dispatch on instances of types, not the types themselves.

The official style guide says "Avoid confusion about whether something is an instance or a type", and to prefer dispatching on instances of types. It is not a strict requirement.

I've fixed the tests, and added a keyword version to Renyi which didn't have.

Super! As long as we're consistent for all subtypes of EntropyDefinition, I'm happy.

kahaaga commented 1 year ago

@Datseris This looks okay, so just merge when you're done, as long tests pass and the documentation looks good (I did a complete overhaul of the examples yesterday, and I don't want to do it again, so please check the doc preview build before merging; the CI job doesn't fail if example blocks fail to run).

Datseris commented 1 year ago

There nothing preventing you from doing either: specify something by type, or specify something by instance

Saying that nothing prevents you of doing something isn't a convincing argument for doing it though. Besides, we're talking design, and doing things one way is simpler than doing them two ways. We don't want the power of doing things both ways. And while it's true that the style guide says only "prefer" this isn't really what happens in reality: almost every ecosystem with usage of packages in other packages has gone the "dispatch on instance" route. So this is settled by the community as far as I can tell and would definitely recommend you to stop doing things both ways even if you do have the power to do so, and just stick with the "dispatch on instance".

If not instantiating on the type, this would have to be split into two almost identical for loops/functions. One for constructing the types, and one for constructing the data. Then combining. That seems unecessary.

This specialized piece of code you pasted does not appear like a package function. It appears as something you would write in your scientific script. There you could easily define a convenience function that creates types with whatever way you would want, and you would still write a single loop. That's how I would do it at least. I definitely wouldn't have to write two loops or two pieces of code like you said. I would have to write at most an extra line of code for each type.

But oh well, in any case, the keywords are available anyways. EDIT: not that they would help you with the example you pasted, because it is unreasonable that different types accept identical keywords. Why would they be different types in the first place? But I guess you could use this to do a loop over the base keyword that is identical.


@Datseris This looks okay, so just merge when you're done, as long tests pass and the documentation looks good

Ok, I will fix some examples that do not work and then merge.

Datseris commented 1 year ago

examples build correctly online. they didn't locally for me because they weren't using main of chaostools

kahaaga commented 1 year ago

But I guess you could use this to do a loop over the base keyword that is identical.

Yes, precisely.

almost every ecosystem with usage of packages in other packages has gone the "dispatch on instance" route

Yes, this wouldn't go in any public API, for sure. Totally agree on that.

I would have to write at most an extra line of code for each type.

More lines, more hassle. I find it easier when doing very similar things for very similar types (that I know have the same keyword arguments). It wasn't something I used to do before. But after our little advent our dispatch-on-type here, I've become somewhat addicted to it. I find it very easy to have a mental model of what's going on when using this approach. Let's see if its possible to detox :D