Closed Datseris closed 1 year ago
Changed Shannon(; base)
to
Shannon(base)`
Why? All other EntropyDefinition
s have base
as a keyword. This breaks convention, and breaks keyword propagation if someone is using a combination of Shannon
and some other EntropyDefinition
.
See also #228
Why? All other EntropyDefinitions have base as a keyword. This breaks convention, and breaks keyword propagation if someone is using a combination of Shannon and some other EntropyDefinition.
The overwhelming majority of functions you use do not have keyword arguments without having positional arguments. It's because it is pointless to type base =
when the function accepts a single argument. What will base
be confused with if it is the only given argument? Keyword specification is useful when you have a lot of arguments so you don't have to remember the order. here there is no order. Keywords are also useful when you want to separate the "important" arguments from the ones of "minor importance", which again isn't happening here.
I didn't know that the convention is that base is specified by keywords. If you can use positional arguments I think you should. Renyi has a positional argument version. Positional arugments and default values can exist at the same time, it's not like you must have keywords to achieve this.
The overwhelming majority of functions you use do not have keyword arguments without having positional arguments. It's because it is pointless to type base = when the function accepts a single argument. What will base be confused with if it is the only given argument? Keyword specification is useful when you have a lot of arguments so you don't have to remember the order. here there is no order. Keywords are also useful when you want to separate the "important" arguments from the ones of "minor importance", which again isn't happening here. I didn't know that the convention is that base is specified by keywords. If you can use positional arguments I think you should. Renyi has a positional argument version. Positional arugments and default values can exist at the same time, it's not like you must have keywords to achieve this.
The convention exists here because use Base.@kwdef
for all subtypes of EntropyDefinition
, and we document for each of them that base
is a keyword. All structs defined by the @kwdef
macro automatically also has a positional argument version, so you're free to use that if needed.
But again, if Shannon
is the only EntropyDefinition
that doesn't have the keyword base
, then in any upstream method that accepts an arbitrary log-based EntropyDefinition
, I'd have to write an if statement with one clause for Shannon
, and one clause for the rest, just because one has a different signature and the remaining don't.
But again, if Shannon is the only EntropyDefinition that doesn't have the keyword base, then in any upstream method that accepts an arbitrary log-based EntropyDefinition
I am trying to understand how this would happen. Because as far as I can tell, if there is an option to specify the entropy definition, a user would specify this anyways. Like, give me an example where you, as a developer of an upstream package would have to write this if clause, because I can't imagine it so far. You would allow someone to specify entropy by type instead of by instance? Like to speciy Renyi
instead of Renyi()
? Why? It is well established that you should dispatch on instances of types, not the types themselves. And also, you don't have to care how would Shannon()
be instantiated by dispatchin on instances of types.
The convention exists here because use
Base.@kwdef
for all subtypes of EntropyDefinition, and we document for each of them that base is a keyword. All structs defined by the@kwdef
macro automatically also has a positional argument version, so you're free to use that if needed.
This sounds kinda missleading. We decide what we document and we write the docunentation. @kwdef
doesn't write any docs. I could just as well argue that "we document the positional version, but since things are defined by the @kwdef
macro, you can decide to use the keyword version."
I've fixed the tests, and added a keyword version to Renyi which didn't have.
Merging #227 (c18cc75) into main (f96cd67) will decrease coverage by
0.27%
. The diff coverage is90.47%
.
@@ Coverage Diff @@
## main #227 +/- ##
==========================================
- Coverage 85.04% 84.77% -0.28%
==========================================
Files 45 45
Lines 1137 1136 -1
==========================================
- Hits 967 963 -4
- Misses 170 173 +3
Impacted Files | Coverage Δ | |
---|---|---|
src/complexity_measures/missing_dispersion.jl | 100.00% <ø> (ø) |
|
.../complexity_measures/reverse_dispersion_entropy.jl | 100.00% <ø> (ø) |
|
src/complexity_measures/sample_entropy.jl | 93.54% <ø> (ø) |
|
src/encoding_implementations/fasthist.jl | 100.00% <ø> (ø) |
|
src/encoding_implementations/gaussian_cdf.jl | 61.53% <ø> (ø) |
|
src/encoding_implementations/ordinal_pattern.jl | 80.00% <ø> (ø) |
|
...rc/encoding_implementations/rectangular_binning.jl | 88.52% <ø> (ø) |
|
src/entropies_definitions/curado.jl | 100.00% <ø> (ø) |
|
src/entropies_definitions/kaniadakis.jl | 83.33% <ø> (ø) |
|
src/entropies_definitions/streched_exponential.jl | 100.00% <ø> (ø) |
|
... and 27 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
This sounds kinda missleading. We decide what we document and we write the docunentation.
My point was that we already have documented it (therefore it is Entropies.jl convention) not that this is a principle in general. I just want internal consistency for subtypes of EntropyDefinition
, that's all.
You would allow someone to specify entropy by type instead of by instance? Like to speciy Renyi instead of Renyi()? Why?
There nothing preventing you from doing either: specify something by type, or specify something by instance. I do both regularly. Specifying by type is super useful for doing things like:
function dosomething(T::Typeof{SomeThreeArgumentAbstractType}; kw = 2)
for (i, p1) in enumerate(params1)
for (j, p2) in enumerate(params2)
for L in Ls
for k in ks
p3 = L / k
data = somefunc(p1, L, k) # data are also constructed inside the loop
est = T(; p1, p2, p3, kw)
f(est, data)
# then do something with est
end
end
end
end
end
mytypes = [T, Q, W, R] # all have the same keyword arguments
dosomething.(mytypes)
If not instantiating on the type, this would have to be split into two almost identical for loops/functions. One for constructing the types, and one for constructing the data. Then combining. That seems unecessary.
It is well established that you should dispatch on instances of types, not the types themselves.
The official style guide says "Avoid confusion about whether something is an instance or a type", and to prefer dispatching on instances of types. It is not a strict requirement.
I've fixed the tests, and added a keyword version to
Renyi
which didn't have.
Super! As long as we're consistent for all subtypes of EntropyDefinition
, I'm happy.
@Datseris This looks okay, so just merge when you're done, as long tests pass and the documentation looks good (I did a complete overhaul of the examples yesterday, and I don't want to do it again, so please check the doc preview build before merging; the CI job doesn't fail if example blocks fail to run).
There nothing preventing you from doing either: specify something by type, or specify something by instance
Saying that nothing prevents you of doing something isn't a convincing argument for doing it though. Besides, we're talking design, and doing things one way is simpler than doing them two ways. We don't want the power of doing things both ways. And while it's true that the style guide says only "prefer" this isn't really what happens in reality: almost every ecosystem with usage of packages in other packages has gone the "dispatch on instance" route. So this is settled by the community as far as I can tell and would definitely recommend you to stop doing things both ways even if you do have the power to do so, and just stick with the "dispatch on instance".
If not instantiating on the type, this would have to be split into two almost identical for loops/functions. One for constructing the types, and one for constructing the data. Then combining. That seems unecessary.
This specialized piece of code you pasted does not appear like a package function. It appears as something you would write in your scientific script. There you could easily define a convenience function that creates types with whatever way you would want, and you would still write a single loop. That's how I would do it at least. I definitely wouldn't have to write two loops or two pieces of code like you said. I would have to write at most an extra line of code for each type.
But oh well, in any case, the keywords are available anyways. EDIT: not that they would help you with the example you pasted, because it is unreasonable that different types accept identical keywords. Why would they be different types in the first place? But I guess you could use this to do a loop over the base
keyword that is identical.
@Datseris This looks okay, so just merge when you're done, as long tests pass and the documentation looks good
Ok, I will fix some examples that do not work and then merge.
examples build correctly online. they didn't locally for me because they weren't using main
of chaostools
But I guess you could use this to do a loop over the base keyword that is identical.
Yes, precisely.
almost every ecosystem with usage of packages in other packages has gone the "dispatch on instance" route
Yes, this wouldn't go in any public API, for sure. Totally agree on that.
I would have to write at most an extra line of code for each type.
More lines, more hassle. I find it easier when doing very similar things for very similar types (that I know have the same keyword arguments). It wasn't something I used to do before. But after our little advent our dispatch-on-type here, I've become somewhat addicted to it. I find it very easy to have a mental model of what's going on when using this approach. Let's see if its possible to detox :D
Shannon(; base)
toShannon(base)