JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems

New DiffEntropyEst interface #239

Closed: Datseris closed 1 year ago

Datseris commented 1 year ago

Closes #238

Datseris commented 1 year ago

@kahaaga the source has a method

entropy(est::DiffEntropyEst, x::AbstractVector) =
    entropy(est, Dataset(x))

which the nearest neighbor methods use. But why does this method exist? If the docstring says that the estimator works for Dataset input, that's what it should do. Here it works for both Dataset and timeseries input.

Datseris commented 1 year ago

I solved this by making one more abstract type for NN estimators that has this dispatch.
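
A minimal sketch of the dispatch pattern this creates (the NN subtype name here is hypothetical; Dataset is the package's N-dimensional data container):

abstract type DiffEntropyEst end
abstract type NNDiffEntropyEst <: DiffEntropyEst end  # hypothetical name

# Only nearest-neighbor estimators get the vector fallback: they operate on
# N-dimensional data, and a timeseries is just a 1-dimensional Dataset.
entropy(est::NNDiffEntropyEst, x::AbstractVector) = entropy(est, Dataset(x))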

kahaaga commented 1 year ago

The source code has a method entropy(est::DiffEntropyEst, x::AbstractVector) = entropy(est, Dataset(x)). Here it works for both Dataset and timeseries.

The NN estimators work on N-dimensional data. Time series are 1D data, so they are automatically converted to a 1-D Dataset.

I solved this by making one more abstract type for NN estimators that has this dispatch.

Makes sense.

Datseris commented 1 year ago

I am done here, but oddly, the AlizadehArghami test for base 3 fails. I am trying to figure out what I did wrong, where I mistyped, but I can't see it. Maybe you can?

kahaaga commented 1 year ago

I am done here, but oddly, the AlizadehArghami test for base 3 fails. I am trying to figure out what I did wrong, where I mistyped, but I can't see it. Maybe you can?

Let me have a quick look.

On a side note: oddly, I also got some weird errors yesterday when comparing the contingency-table based CMI computation against the CountOccurrences() estimator on some discrete data. The CountOccurrences() method calls entropy, while the contingency table approach just calls the regular log. For base-2, base-e, and base-10 I was getting identical answers, but for any other base I was getting different answers. When switching to ComplexityMeasures.log_with_base, however, the answers were identical. I still don't get why. It might be completely unrelated, though.
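
For what it's worth, here is a minimal sketch (not the package's code, and possibly unrelated to the actual cause) of how two mathematically equivalent base conversions can drift apart in floating point:

p = [0.2, 0.3, 0.5]
b = 3
h1 = -sum(pᵢ * log(b, pᵢ) for pᵢ in p)        # convert base term by term
h2 = -sum(pᵢ * log(pᵢ) for pᵢ in p) / log(b)  # convert once at the end
h1 == h2  # not guaranteed; the results can differ by a few eps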

kahaaga commented 1 year ago

function entropy(est::AlizadehArghami, x::AbstractVector{<:Real})
    (; m) = est
    n = length(x)
    m < floor(Int, n / 2) || throw(ArgumentError("Need m < length(x)/2."))
    #h = entropy(Shannon(base = ℯ), Vasicek(; m), x) + (2 / n)*(m * log(2))
    #return h / log(e.base, ℯ)
    h = entropy(Vasicek(; m, base = est.base), x) + (2 / n)*(m * log(2))
    return h / log(est.base, ℯ)
end

I don't get why the tests would suddenly fail. The commented-out lines are exactly equivalent to the new ones. Maybe the allowed error bounds in the tests are too tight (we're using randomly generated data)? Have you checked how far off the tests are?

kahaaga commented 1 year ago

I am done here

Please have a look at my comments in #238. I highly recommend that we don't commit to the changes yet. We need to plan this a bit more carefully.

That said, if this is the way we go, why make base a property of the DiffEntEstimator? The base is a property of the entropy, not the estimator. For discrete entropies, we take the base from the entropy. Why would we do different things for discrete and differential entropies?

kahaaga commented 1 year ago

I am done here, but oddly, the AlizadehArghami test for base 3 fails. I am trying to figure out what I did wrong, where I mistyped, but I can't see it. Maybe you can?

Yeah, the bounds I've defined in the tests are probably just a bit tight on the lower end. I just got the following error while addressing #240.

Lord and all the order-statistics-based estimators are negatively biased (but we can't determine the magnitude of this bias in general), so I'll just increase the allowed negative deviation from the true entropy value slightly.

Lord: Test Failed at /Users/work/Code/Repos/Temp/Entropies.jl/test/entropies/estimators/lord.jl:18
  Expression: N_base3 * 0.98 ≤ ea_n3 ≤ N_base3 * 1.02
   Evaluated: 1.5288 ≤ 1.521140237077653 ≤ 1.5912000000000002
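
Concretely, something like this for the adjusted bound, reusing the names from the failure above (the widened lower factor 0.96 is illustrative, not the final value):

# Allow a larger negative than positive deviation, since the
# order-statistics-based estimators are negatively biased:
@test N_base3 * 0.96 ≤ ea_n3 ≤ N_base3 * 1.02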

Datseris commented 1 year ago

@kahaaga I've finished here implementing what we've discussed in the video call. Now discrete and differential entropies have the same interface in the following sense:

- An entropy estimator exists for both.
- One can only give the entropy estimator to entropy; it is not possible to call entropy(::EntropyDefinition, ::EntropyEstimator, ...).
- If estimators allow for more than one definition, they take a subtype of EntropyDefinition as an argument.

For convenience and backwards compatibility, it is allowed to provide an EntropyDefinition as a DiscreteEntropyEstimator. In this case, the estimator is assumed to be MLEntropy. I hope that's a good name.
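
A sketch of the resulting call patterns (estimator names are from this thread or the package; exact constructor keywords are assumptions):

using ComplexityMeasures

x = randn(10_000)        # continuous data for a differential estimator
xd = rand(1:4, 10_000)   # discrete data for a probabilities estimator

# Differential: the estimator alone goes to `entropy`
# (assuming Kraskov takes a `k` keyword):
h_diff = entropy(Kraskov(k = 4), x)

# Discrete: an EntropyDefinition as the first argument falls back to
# MLEntropy, so these two calls should agree:
h1 = entropy(Shannon(), CountOccurrences(), xd)
h2 = entropy(MLEntropy(Shannon()), CountOccurrences(), xd)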

This is not a breaking change. All code that was working before will work now as well. Code that errored before (e.g. calling differential entropy estimators with a Renyi definition) will error now as well, just with a different error: a MethodError rather than an ArgumentError.

I still have to polish the documentation and fix the tests, but the core is done, I think.

kahaaga commented 1 year ago

Ok, so I'll try to summarize the changes here:

Is this correct?

Datseris commented 1 year ago

Yes, with the only exception that the last bullet point should be entropy(e, est, x) = entropy(MLEntropy(e), est, x) (you forgot est).

I'll finish tests and docs, then merge and tag 2.2 if no other objections arise.

EDIT: To clarify, in the source code the opposite happens: internally, the dispatch is still entropy(e::EntropyDefinition, p::Probabilities), and MLEntropy reduces to the latter. But that's just because I didn't see a reason at the moment to change these internals. When a new estimator comes along, one can anyway define entropy(e::NewEstimator, ...).
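
That is, roughly (a simplified sketch of the described internals, not the exact source):

struct MLEntropy{E <: EntropyDefinition} <: DiscreteEntropyEstimator
    definition::E
end

# The convenience estimator unwraps to the existing definition-based dispatch:
entropy(est::MLEntropy, p::Probabilities) = entropy(est.definition, p)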

kahaaga commented 1 year ago

I'll finish tests and docs, then merge and tag 2.2 if no other objections arise.

I don't have any objections to the proposed changes here. However, I can't really say anything about the larger picture when it comes to the more complicated methods in CausalityTools. But as long as this is backwards-compatible, there should be no issue, and everything upstream should work without any changes.