Closed: Datseris closed this 1 year ago
@kahaaga the source has a method

```julia
entropy(est::DiffEntropyEst, x::AbstractVector) = entropy(est, Dataset(x))
```

which the nearest neighbor methods use. But why does this method exist? If the docstring says that the estimator works for `Dataset` input, that's what it should do. Here it works for both `Dataset` and timeseries.

I solved this by making one more abstract type for NN estimators that has this dispatch.
> The source code has a method `entropy(est::DiffEntropyEst, x::AbstractVector) = entropy(est, Dataset(x))`. Here it works for both `Dataset` and timeseries.

The NN estimators work on N-dimensional data. Time series are 1-D data, so they are automatically converted to a 1-D `Dataset`.

> I solved this by making one more abstract type for NN estimators that has this dispatch.

Makes sense.
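The abstract-type fix described above can be sketched roughly like this. Everything except the names `DiffEntropyEst` and `Dataset` (which appear in the thread) is made up for illustration; the toy `entropy` body is obviously not a real estimator:

```julia
# Sketch of the dispatch fix discussed above (illustrative only).
abstract type DiffEntropyEst end
# Intermediate abstract type: only nearest-neighbor estimators get the
# vector-to-Dataset convenience method.
abstract type NNDiffEntropyEst <: DiffEntropyEst end

# Minimal stand-in for a Dataset of points.
struct Dataset{T}
    points::Vector{T}
end

struct SomeNNEstimator <: NNDiffEntropyEst end

# Timeseries input is converted to a 1-D Dataset, but only for NN estimators.
entropy(est::NNDiffEntropyEst, x::AbstractVector) = entropy(est, Dataset(collect(x)))

# Toy Dataset method, just to show where dispatch lands.
entropy(::SomeNNEstimator, d::Dataset) = float(length(d.points))
```

With this hierarchy, `entropy(SomeNNEstimator(), rand(100))` works, while a non-NN `DiffEntropyEst` subtype only accepts a `Dataset`, matching its docstring.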
I am done here, but oddly, the AlizadehArghami test for base 3 fails. I am trying to figure out what I did wrong, where I mistyped, but I can't see it. Maybe you can?

> I am done here, but oddly, the AlizadehArghami test for base 3 fails. I am trying to figure out what I did wrong, where I mistyped, but I can't see it. Maybe you can?
Let me have a quick look.
On a sidenote, oddly, I also got some weird errors yesterday when comparing the contingency-table based CMI computation vs the `CountOccurrences()` estimator on some discrete data. The `CountOccurrences()` method calls `entropy`, while the contingency table approach just called the regular `log`. For base-2, base-e, and base-10 I was getting identical answers, but for any other base, I was getting different answers. When switching to `ComplexityMeasures.log_with_base`, however, answers were identical. I still don't get why. It might be completely unrelated, though.
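For reference, the identity at play here is just the change-of-base formula, log_b(x) = ln(x)/ln(b). A hand-rolled helper in that spirit (illustrative only, not the actual `ComplexityMeasures.log_with_base` definition) and the identity it relies on:

```julia
# Illustrative helper: returns a log function for the given base.
# (Not the actual ComplexityMeasures.log_with_base implementation.)
log_with_base(base) = x -> log(x) / log(base)

# Julia's two-argument log and the explicit ratio agree up to rounding:
x = 0.3
@assert isapprox(log(3, x), log(x) / log(3); atol = 1e-12)

# Converting an entropy value from nats to base 3 is division by log(3):
h_nats = 1.234
h_base3 = h_nats / log(3)
@assert isapprox(h_base3, log_with_base(3)(exp(h_nats)); atol = 1e-12)
```

Two mathematically equivalent conversion routes can still differ in the last few floating-point bits, which is one plausible source of small base-dependent discrepancies.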
```julia
function entropy(est::AlizadehArghami, x::AbstractVector{<:Real})
    (; m) = est
    n = length(x)
    m < floor(Int, n / 2) || throw(ArgumentError("Need m < length(x)/2."))
    # h = entropy(Shannon(base = ℯ), Vasicek(; m), x) + (2 / n)*(m * log(2))
    # return h / log(e.base, ℯ)
    h = entropy(Vasicek(; m, base = est.base), x) + (2 / n)*(m * log(2))
    return h / log(est.base, ℯ)
end
```
I don't get why the tests would suddenly fail. The commented-out line is exactly equivalent to the new lines. Maybe the allowed error bounds for the tests are too tight (we're using randomly generated data)? Have you checked how far off the tests are?
> I am done here
Please have a look at my comments in #238. I highly recommend that we don't commit to the changes yet. We need to plan this a bit more carefully.
That said, if this will be the way we go, why do you make `base` a property of the `DiffEntEstimator`? The base is a property of the entropy, not the estimator. For discrete entropies, we take the base from the entropy. Why would we do separate things for discrete/differential entropy?
> I am done here, but oddly, the AlizadehArghami test for base 3 fails. I am trying to figure out what I did wrong, where I mistyped, but I can't see it. Maybe you can?
Yeah, the bounds I've defined in the tests are probably just a bit tight on the lower end. I just got the following error while addressing #240.

```
Lord: Test Failed at /Users/work/Code/Repos/Temp/Entropies.jl/test/entropies/estimators/lord.jl:18
  Expression: N_base3 * 0.98 ≤ ea_n3 ≤ N_base3 * 1.02
   Evaluated: 1.5288 ≤ 1.521140237077653 ≤ 1.5912000000000002
```

`Lord` and all the order-statistics based estimators are negatively biased (but we can't determine the magnitude of this bias in general). So I'll just increase the allowed negative deviation from the true entropy value slightly.
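One way to encode that asymmetric tolerance is to widen only the lower bound. A minimal sketch, using the numbers from the failing test above (the 0.95 factor is illustrative, not the value actually committed):

```julia
# From the failing test: N_base3 * 0.98 = 1.5288 and N_base3 * 1.02 = 1.5912,
# so the "true" value is N_base3 ≈ 1.56, and the estimate was ea_n3 ≈ 1.5211.
h_true = 1.56
h_est  = 1.5211

# The old symmetric check fails on the low side:
@assert !(h_true * 0.98 ≤ h_est)

# Order-statistics estimators are negatively biased, so allow a wider
# deviation below the true value than above it (0.95 is illustrative):
@assert h_true * 0.95 ≤ h_est ≤ h_true * 1.02
```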
@kahaaga I've finished here implementing what we've discussed in the video call. Now discrete and differential entropies have the same interface in the following sense: an entropy estimator exists for both. One can only give the entropy estimator to `entropy`. It is not possible to call `entropy(::EntropyDefinition, ::EntropyEstimator, ...)`. If estimators allow for more than one definition, they take as an argument a subtype of `EntropyDefinition`.
For convenience and backwards compatibility, it is allowed to provide an `EntropyDefinition` as a `DiscreteEntropyEstimator`. In this case, the estimator is assumed to be `MLEntropy`. I hope that's a good name.
This is not a breaking change. All code that was working before will work now as well. Code that errored before (e.g. calling differential entropy estimators with a Renyi definition) will error now as well, just with a different error: a method error rather than an argument error.
I still have to polish the documentation and fix tests, but the core is done, I think.
Ok, so I'll try to summarize the changes here:

- `DiscreteEntropyEstimator` is now the supertype of all discrete entropy estimators.
- `MLEntropy <: DiscreteEntropyEstimator`.
- `MillerMadowEntropy <: DiscreteEntropyEstimator`.
- `entropy(e, est, x)` becomes `entropy(est, e)`.
- `entropy(e, est, x) = entropy(MLEntropy(e), x)` to maintain backwards compatibility.

Is this correct?
Yes, with the only exception that the last bullet point is `entropy(e, est, x) = entropy(MLEntropy(e), est, x)` (you forgot `est`).
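The backwards-compatibility layer can be sketched like this. The type names (`EntropyDefinition`, `DiscreteEntropyEstimator`, `MLEntropy`, `Shannon`, `CountOccurrences`) come from the thread, but the bodies below are toy stand-ins, not actual package source:

```julia
abstract type EntropyDefinition end
abstract type DiscreteEntropyEstimator end

# The definition carries the base, as discussed above.
struct Shannon <: EntropyDefinition
    base::Float64
end
Shannon(; base = 2) = Shannon(float(base))

# MLEntropy wraps a definition: the plug-in (maximum likelihood) estimator.
struct MLEntropy{E<:EntropyDefinition} <: DiscreteEntropyEstimator
    definition::E
end

# Toy probabilities estimator: normalized counts of each unique value.
struct CountOccurrences end
probabilities(::CountOccurrences, x) = [count(==(v), x) / length(x) for v in unique(x)]

# New estimator-first interface.
function entropy(est::MLEntropy{Shannon}, pest, x)
    b = est.definition.base
    return -sum(p * log(b, p) for p in probabilities(pest, x) if p > 0)
end

# Backwards compatibility: a bare definition is assumed to mean MLEntropy.
entropy(e::EntropyDefinition, pest, x) = entropy(MLEntropy(e), pest, x)
```

With this, `entropy(Shannon(base = 2), CountOccurrences(), x)` and `entropy(MLEntropy(Shannon(base = 2)), CountOccurrences(), x)` give the same result, so old call sites keep working unchanged.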
I'll finish tests and doc and merge and tag 2.2 then if no other objections arise.
EDIT: To clarify, in the source code the opposite happens: internally the dispatch is still `entropy(e::EntropyDefinition, p::Probabilities)`, and `MLEntropy` reduces to the latter. But that's just because I didn't see a reason at the moment to change these internals. When a new estimator comes, one can anyway define `entropy(e::NewEstimator, ...)`.
> I'll finish tests and doc and merge and tag 2.2 then if no other objections arise.
I don't have any objections to the proposed changes here. However, I can't really say anything about the larger picture when it comes to the more complicated methods in CausalityTools. But as long as this is backwards-compatible, there should be no issue, and everything upstream should work without any changes.
Closes #238