JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems

Remove method `entropy([e::Entropy,] est::EntropyEstimator, x)` #219

Closed: Datseris closed this issue 1 year ago

Datseris commented 1 year ago

I am really unhappy about this method. There are so many things wrong with it:

  1. It is exposed at the same level as the more-likely-to-be-used entropy([e::Entropy,] est::ProbabilitiesEstimator, x).
  2. It is genuinely a different thing than the method above.
  3. Most "entropy estimators" do not work with any entropy definition, by their own definition.

From the table, pretty much none of them work with any entropy, which makes the table itself useless:

(even after the correction that Kraskov works with a couple more)

(screenshot of the estimator table from the docs)

Lastly, why do we even care about allowing these rather specialized estimators to work with "any" entropy definition?

I propose to completely remove this method in favor of a new function

diffentropy(a::EntropyEstimator, x)

Notice the absence of e::Entropy altogether. Why? e::Entropy becomes a field of the few EntropyEstimators that can actually work with a different kind of entropy. This also fixes the problem that we imply, at the top-level API point, that this generality exists. It doesn't even exist scientifically, let alone whether we could implement it in software.
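Roughly, a sketch of what I mean (type and function names here are purely illustrative, not the final API):

```julia
# Purely illustrative sketch: the entropy definition is a field of the few
# estimators that genuinely support more than one definition; the top-level
# call takes only the estimator and the data.
abstract type EntropyDef end
struct Shannon <: EntropyDef end
struct Renyi <: EntropyDef
    q::Float64
end

abstract type EntropyEstimator end

# An estimator that only ever computes Shannon differential entropy
# needs no definition field at all.
struct NeighborBased <: EntropyEstimator
    k::Int   # number of nearest neighbors
end

# An estimator that supports several definitions carries the definition it
# should compute as a field.
struct SpacingBased{E<:EntropyDef} <: EntropyEstimator
    definition::E
    m::Int   # some estimator-specific parameter
end

# The proposed entry point: no `e::Entropy` argument at all.
diffentropy(est::EntropyEstimator, x) =
    error("diffentropy not implemented for $(typeof(est))")
```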

Datseris commented 1 year ago

Also, a question: is this really big distinction you make here between continuous/differential and discrete entropy also made in the literature, in the papers you've used to implement all these differential entropy methods?

kahaaga commented 1 year ago

Also, a question: is this really big distinction you make here between continuous/differential and discrete entropy also made in the literature, in the papers you've used to implement all these differential entropy methods?

Yes, they are different quantities altogether, but are related in the limit of infinitely fine partitions. See for example chapter 8 in Elements of Information Theory (Cover & Thomas).

Discrete entropy is the entropy of a discrete random variable. Differential entropy is the entropy of a continuous random variable. The first is a sum over probabilities; the second is an integral of a density function. Discrete entropy is guaranteed to be >= 0, differential entropy is not.
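For reference, the two standard (Shannon) definitions:

```math
H(X) = -\sum_{x} p(x)\,\log p(x) \qquad \text{(discrete; } H(X) \ge 0)
```

```math
h(X) = -\int f(x)\,\log f(x)\,dx \qquad \text{(differential; can be negative)}
```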

All the currently implemented EntropyEstimators deal explicitly with differential entropy, as is clearly stated in the papers where they are presented. Some papers may discuss a connection to discrete entropy, while others, for example those on the order-statistics-based estimators, make no mention of discrete entropy at all, only differential entropy.

There is also a body of literature on corrections to the discrete entropy for it to approximate the differential entropy (which we haven't touched on yet at all here).

It is exposed at the same level as the more-likely-to-be-used ...

I don't agree with this statement. It depends on the context. Differential entropy estimators are at the core of many transfer entropy estimators, for example. Another example: I recently got a paper rejected on a certain entropy-based information estimator, partly on the grounds of not showing the convergence of the estimator to the true (differential) measure.

lastly, why do we even care about allowing these rather specialized estimators to work with "any" entropy definition.

I've answered this in a separate issue/PR somewhere. Certain EntropyEstimators can compute different differential entropies (none of those implemented here at the moment, but I have working implementations already).

It is genuinely a different thing than the method above.

Yes, the quantity which a ProbabilitiesEstimator estimates (discrete entropy) is different from the quantity which an EntropyEstimator estimates (differential entropy). But both differential entropy and discrete entropy are entropies which are computed using an estimator.

Therefore entropy(e::Entropy, est::Union{ProbabilitiesEstimator, EntropyEstimator}, x) is very natural. A ProbabilitiesEstimator estimates the discrete version, while an EntropyEstimator estimates the continuous version.

There are even estimators that compute discrete-continuous entropy, if the data are multivariate and a mix of continuous and discrete variables.

EDIT: Then we'd have:

I don't think this makes sense at all. The one we have is already super-intuitive, and simply dispatches on different estimators whose types make it very clear what is computed.

A deeper insight: density estimation API

Actually, what we now call EntropyEstimators at their core estimate density functionals. Thus, they could in principle be called DensityEstimators (an un-normalized KernelDensity would also be a density estimator). Each estimator estimates densities in a particular way, with a particular bias correction.

One solution could therefore be to make a completely new density estimation API, just as we do for probabilities. However, at the moment I have neither the insight to extract and isolate the density estimation part of all the EntropyEstimators we have, nor the time to obtain that insight, nor the time to start working on something completely new now.

A compromise: renaming EntropyEstimator to DifferentialEntropyEstimator

This one is obvious, and perfectly aligns with the existing entropy(e::Entropy, est, x) signature. If est is a ProbabilitiesEstimator, then (non-bias corrected, at the moment) discrete entropy is computed. If est is a DifferentialEntropyEstimator, then (possibly bias corrected) differential entropy is computed. Would this make sense?
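Schematically (a toy sketch with placeholder types and string return values just to illustrate the dispatch, not the actual package code):

```julia
# The estimator type decides which quantity the single user-facing function
# computes: discrete (plug-in) entropy or differential entropy.
abstract type Entropy end
abstract type ProbabilitiesEstimator end
abstract type DifferentialEntropyEstimator end

struct Shannon <: Entropy end
struct Histogram <: ProbabilitiesEstimator end              # stand-in for ValueHistogram
struct NearestNeighbor <: DifferentialEntropyEstimator end  # stand-in for e.g. Kraskov

entropy(e::Entropy, est::ProbabilitiesEstimator, x) =
    "discrete (plug-in) $(typeof(e)) entropy of x"
entropy(e::Entropy, est::DifferentialEntropyEstimator, x) =
    "differential $(typeof(e)) entropy of x"

entropy(Shannon(), Histogram(), rand(100))        # -> discrete branch
entropy(Shannon(), NearestNeighbor(), rand(100))  # -> differential branch
```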

kahaaga commented 1 year ago

This also fixes the problem that we imply, at the top level API point, that this generality exists. It doesn't even exist scientifically,

Which generality, specifically, are you referring to?

kahaaga commented 1 year ago

From the table, pretty much none of them work with any entropy, which makes the table itself useless:

I disagree. The table is still useful, because it not only shows which estimator is compatible with which type of differential entropy, but also what type of input data is used and which concepts the estimator is based on.

This argument could also be made about the probabilities estimator table, just with the opposite sign: since probabilities estimators work with all Entropy types, we could just drop that table, since it doesn't give any new information.

The table is there for a quick overview for new users, not as a reflection of the deeper philosophical implications of our particular choice of API/implementation, so I don't think this argument should count against the entropy(e::Entropy, est::EntropyEstimator, x) signature. Currently, we only have Shannon differential entropy estimators, so the table seems a bit over the top. But that will change.

EDIT: The tables are also a way for users to get a literature overview. If a user sees that the Kraskov estimator hasn't been used for Renyi differential entropy estimation yet, then perhaps "aha! I can make a contribution here!".

kahaaga commented 1 year ago

To summarize my rambling:

In a dream world (perhaps Entropy v3), we could make a density estimation API to replace EntropyEstimator. But I'd rather keep everything exactly as is, just perhaps renaming EntropyEstimator to DifferentialEntropyEstimator.

Again, the functions and signatures we define here cascade to the upstream methods. Assuming we want a consistent ecosystem, I'd need separate signatures for transferentropy_discrete, transferentropy_differential, transferentropy_discretediff, mutualinfo_discrete, mutualinfo_differential, mutualinfo_discretediff. The list goes on. This many-methods-for-related-concepts approach increases complexity for reasons that, in my opinion, do not outweigh the advantages of the estimator-controls-concept approach.

In my opinion, it is so much easier to just use one word to describe a concept (here: "entropy"), and let the different estimators control which variant is computed.

And a meta-summary: besides all these points, it also boils down to time and resources. I don't have time to do multiple iterations of development and API design discussion at the moment, and this change would massively complicate upstream functions in CausalityTools, which I don't have time to rewrite yet again.

We should stick to the current version, release and see how it works in practice. Any major changes now should go into a future v3.

kahaaga commented 1 year ago

Lastly, why do we even care about allowing these rather specialized estimators to work with "any" entropy definition?

A concrete example of why we should allow these specialized estimators to work with any entropy: the Vasicek, Correa, and other order-statistics estimators are straightforwardly extendable to Renyi entropy, for example.

We just haven't implemented all these versions yet. That shouldn't count as an argument against allowing flexibility in the first place.

EDIT: the linked paper is also a good example of exclusively estimating differential entropy, not discrete entropy.

Datseris commented 1 year ago

Yes, they are different quantities altogether, but are related in the limit of infinitely fine partitions. See for example chapter 8 in Elements of Information Theory (Cover & Thomas).

I know what the difference between discrete and differential entropies is; this wasn't my question. My question was whether anyone makes a big deal out of separating discrete and differential entropies in the NLD literature, given that both approximate the same thing. For example, take the Vasicek estimator that I just saw in the docs.

(screenshot of the Vasicek estimator formula from the docs)

Well, when I call entropy(ValueHistogram, x) I also approximate the (in reality continuous) density on the attractor and then get the entropy of that continuous density, now discretized because we do not have infinite data. And the approach above very obviously also discretizes things and then sums things up. So, what makes my ValueHistogram-based entropy a discrete one, and your Vasicek entropy a "continuous or differential" one? Aren't both discrete things in practice?
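To make the comparison concrete, here is a standalone toy sketch of both computations (my own throwaway implementations, not the package code; the Vasicek part uses the standard m-spacings formula from Vasicek (1976), as I recall it):

```julia
# Standalone sketch: both computations are discrete operations on finite data,
# but they estimate different theoretical targets.

# 1) Histogram-style plug-in estimate of the *discrete* Shannon entropy of the
#    binned distribution (what a ValueHistogram-like estimator does in spirit).
function histogram_entropy(x::AbstractVector{<:Real}; nbins::Int = 32)
    lo, hi = extrema(x)
    counts = zeros(Int, nbins)
    for xi in x
        bin = clamp(1 + floor(Int, (xi - lo) / (hi - lo) * nbins), 1, nbins)
        counts[bin] += 1
    end
    p = counts[counts .> 0] ./ length(x)
    return -sum(p .* log.(p))   # nats
end

# 2) Vasicek-style m-spacings estimate of the *differential* Shannon entropy:
#    average of the log of scaled order-statistic spacings, with boundary
#    indices clamped to 1 and n.
function vasicek_entropy(x::AbstractVector{<:Real}; m::Int = 3)
    xs = sort(x)
    n = length(xs)
    s = 0.0
    for i in 1:n
        upper = xs[min(i + m, n)]
        lower = xs[max(i - m, 1)]
        s += log(n / (2m) * (upper - lower))
    end
    return s / n   # nats
end

# For standard normal data the true differential entropy is 0.5*log(2π*ℯ) ≈ 1.42
# nats; the histogram value instead depends on the bin count and is always ≥ 0.
x = randn(10_000)
histogram_entropy(x), vasicek_entropy(x)
```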


In my opinion, it is so much easier to just use one word to describe a concept (here: "entropy"), and let the different estimators control which variant is computed.

Definitely disagree here, because this simple concept is actually a very difficult concept; otherwise this issue wouldn't have a discussion several A4 pages of text long. But, from what I take, the reason you want to have a single function is for downstream packages, right? You want to have something like entropy(definition, estimator, x) and to use this everywhere and forever, regardless of whether the return value is a discrete entropy or a differential entropy.

The solution I can agree on is to keep entropy as is now, but completely separate the method entropy(definition, EntropyEstimator, x). Definitely not in the same docstring, nor the same docs section. Dedicated section for differential entropy. Furthermore, to rename the type Entropy to EntropyDefinition. Because in fact the structs like Renyi() aren't entropies in the sense that they give you the quantity; they are definitions of how to get the quantity from a distribution.
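I.e., something along these lines (illustrative only; the fields are just a guess at what Renyi would carry):

```julia
# Illustrative only: the structs are *definitions* (they carry the parameters
# of a formula), not computed quantities.
abstract type EntropyDefinition end

Base.@kwdef struct Renyi <: EntropyDefinition
    q::Float64 = 1.0      # order of the entropy
    base::Float64 = 2.0   # logarithm base
end

# Renyi(q = 2.0) describes *how* to turn a distribution into a number;
# entropy(Renyi(q = 2.0), est, x) would produce the actual quantity.
```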


I've answered this in a separate issue/PR somewhere. Certain EntropyEstimators can compute different differential entropies (none of those implemented here at the moment, but I have working implementations already).

Okay, convinced for allowing this. The table can stay.


rename to DifferentialEntropyEstimator

Oh yes I have already done that in the PR.

kahaaga commented 1 year ago

My question was whether anyone makes a big deal out of separating discrete and differential entropies in the NLD literature, given that both approximate the same thing.

I can't answer off the top of my head what proportion of the literature distinguishes between discrete and continuous entropy like that. That is a question worth investigating in itself. However, all the EntropyEstimators we have, and the ones that I've got pending, explicitly only talk about differential entropy.

So, what makes my ValueHistogram-based entropy a discrete one, and your Vasicek entropy a "continuous or differential" one?

The theoretical concepts they represent have different properties. One is a sum over a probability mass function, the other an integral over a density function.

Aren't both discrete things in practice?

Any quantity defined in terms of a continuous-valued function will necessarily have to be estimated discretely. We're estimating the theoretical concept "continuous" with finite-precision data on a computer, which can't represent "continuous".

However, it goes beyond that. The theoretical properties of the discrete entropy and the differential entropy are different. These properties can be used to prove convergence of an estimator to the theoretical value, for example. But these properties are not identical for the discrete and continuous entropy.

We say on the front page:

In Entropies.jl, we provide the generic function entropy that tries to both clarify the disparate "entropy concepts", while unifying them under a common interface that highlights the modular nature of the word "entropy".

If we split entropy into different functions, then we no longer unify different entropy concepts under a common interface.

But, from what I take, the reason you want to have a single function is for downstream packages, right? You want to have something like entropy(definition, estimator, x) and to use this everywhere and forever, regardless of whether the return value is a discrete entropy or a differential entropy.

Yes, precisely. Upstream methods don't care about whether the result is theoretically a differential entropy or a discrete entropy, or a discrete-differential entropy. It just uses the returned value for something.

The solution I can agree on is to keep entropy as is now, but completely separate the method entropy(definition, EntropyEstimator, x). Definitely not in the same docstring, nor the same docs section. Dedicated section for differential entropy.

I agree. Separating the documentation completely for differential entropy is a good idea.

Furthermore, to rename the type Entropy to EntropyDefinition. Because in fact the structs like Renyi() aren't entropies in the sense that they give you the quantity; they are definitions of how to get the quantity from a distribution.

I agree. Dispatching on definitions of information measures is the approach I've taken in CausalityTools too, so no major changes would be needed upstream.

Datseris commented 1 year ago

Perfect, I have done all of this already. I will return to the theoretical discussion of what is discrete and what is continuous later. For now it is of no importance: if the papers explicitly say they compute differential entropy, we say the same, no doubt here.

kahaaga commented 1 year ago

I will return to the theoretical discussion of what is discrete and what is continuous later. For now it is of no importance: if the papers explicitly say they compute differential entropy, we say the same, no doubt here.

Yes, we should definitely keep this in mind. I think it will be of huge educational value to provide some unified explanation of discrete/differential entropy, or clearly explain how they differ, if/how/when they do in the context of practical estimation.

kahaaga commented 1 year ago

Perfect, I have done all of this already. I will return to the theoretical discussion of what is discrete and what is continuous later. For now it is of no importance: if the papers explicitly say they compute differential entropy, we say the same, no doubt here.

Ok, sweet! Are you done with all that you wanted to address for #209?

Datseris commented 1 year ago

By the way, have you read https://en.wikipedia.org/wiki/Differential_entropy, which says that

Differential entropy (described here) is commonly encountered in the literature, but it is a limiting case of the LDDP, and one that loses its fundamental association with discrete entropy.

I saw this many years ago and hoped that it doesn't matter in practice.
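For reference, the LDDP result (as I remember Jaynes' formulation) is that for N discretization points with limiting density m(x),

```math
H_N(X) = -\sum_i p_i \log p_i \;\longrightarrow\; \log N - \int p(x)\,\log\frac{p(x)}{m(x)}\,dx \qquad (N \to \infty),
```

so differential entropy corresponds to taking m(x) uniform and dropping the diverging log N term, which is exactly where the association with the discrete entropy is lost.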

kahaaga commented 1 year ago

By the way have you read this ...

Yes, I've seen that.

The issue is further complicated by the fact that the other differential-entropy-based quantities (e.g. Tsallis-based) are not necessarily "well-founded", in the sense that they may not share the same properties as the discrete version of the entropy. That is, the relationship between the discrete/differential entropy for e.g. Tsallis entropy is not necessarily the same as for Shannon entropy. Or it might be. I don't know off the top of my head.

We'll need to consider this thoroughly.

I'm not sure for which differential entropy definitions there are robust connections, and where there are no such connections, but the issue is prevalent for higher-level measures such as mutual information. For example, there are at least three versions of Rényi mutual information, and it is a mess to keep track of them. That's why I want the base package to be as simple as possible (EDIT: syntax-wise), so it becomes easier to manage the cluster**** that is the family of other generalized entropy-based information measures.

Anyway, I opened a separate issue for this discussion (referencing this one), so we can continue there and comments don't end up all over the place.