JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License

Release announcement draft #250

Closed Datseris closed 1 year ago

Datseris commented 1 year ago

I'm drafting a release announcement here. Comments posted will be incorporated into this top-level post.

ComplexityMeasures.jl (Entropies.jl successor)

I'm incredibly proud to announce ComplexityMeasures.jl, which I believe is one of the most well-thought-out packages in JuliaDynamics, and one of the most well-thought-out packages in the whole of nonlinear dynamics. (wow, big statements!)

https://juliadynamics.github.io/ComplexityMeasures.jl/stable/

Intro

ComplexityMeasures.jl contains estimators for probabilities, entropies, and other complexity measures derived from observations in the context of nonlinear dynamics and complex systems. It is the successor to Entropies.jl (which was never formally announced). We believe that ComplexityMeasures.jl is the "best" (most featureful, most extendable, most tested, fastest) open source code base out there for computing entropies and/or complexity measures. We won't offer concrete proof for this statement yet, but we are writing a paper on it, and once we have a preprint I will link it here.

Content

ComplexityMeasures.jl is a practical attempt at unifying the concepts of probabilities, entropies, and complexity measures. We (@kahaaga and @datseris) have spent several months designing a composable, modular, extendable interface that is capable of computing as many different variants of "entropy" or "complexity" as one can find in the literature.

The package first defines a generic interface for estimating probabilities from input data. Each probabilities estimator (a ProbabilitiesEstimator subtype) also defines an outcome space, and functions exist to compute the probabilities and their outcomes, as well as other convenience quantities like the size of the outcome space or the missing outcomes. There is already a plethora of probabilities estimators:

| Estimator | Principle | Input data |
| --- | --- | --- |
| CountOccurrences | Count of unique elements | Any |
| ValueHistogram | Binning (histogram) | Vector, StateSpaceSet |
| TransferOperator | Binning (transfer operator) | Vector, StateSpaceSet |
| NaiveKernel | Kernel density estimation | StateSpaceSet |
| SymbolicPermutation | Ordinal patterns | Vector, StateSpaceSet |
| SymbolicWeightedPermutation | Ordinal patterns | Vector, StateSpaceSet |
| SymbolicAmplitudeAwarePermutation | Ordinal patterns | Vector, StateSpaceSet |
| SpatialSymbolicPermutation | Ordinal patterns in space | Array |
| Dispersion | Dispersion patterns | Vector |
| SpatialDispersion | Dispersion patterns in space | Array |
| Diversity | Cosine similarity | Vector |
| WaveletOverlap | Wavelet transform | Vector |
| PowerSpectrum | Fourier transform | Vector |
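As a sketch of how this interface is typically used (function and type names are taken from the package documentation, but the exact call signatures may differ between versions):

```julia
using ComplexityMeasures

x = randn(10_000)  # some example timeseries

# Estimate probabilities over the ordinal-pattern outcome space
est = SymbolicPermutation(m = 3)
probs = probabilities(est, x)

# Probabilities together with the outcomes they correspond to
probs, outs = probabilities_and_outcomes(est, x)

# Convenience functions on the outcome space
total_outcomes(est)       # size of the outcome space (3! = 6 here)
missing_outcomes(est, x)  # outcomes that never occur in `x`
```

Swapping `est` for any other estimator in the table above requires no other changes to the calling code.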

An intermediate representation used by some probabilities estimators is the Encoding, which encodes elements of the input data into the positive integers. Encodings allow for a large amount of code reuse, as well as more possible output measures from the same code.
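For illustration, a minimal sketch of the encode/decode round trip, assuming the `OrdinalPatternEncoding`, `encode`, and `decode` names from the documentation (the exact constructor may differ between versions):

```julia
using ComplexityMeasures

enc = OrdinalPatternEncoding(3)    # encodes length-3 state vectors
i = encode(enc, [1.2, -0.5, 3.1])  # an integer in 1:factorial(3)
χ = decode(enc, i)                 # the corresponding ordinal pattern
```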

These probabilities can be used to compute any of the entropies already defined in the library. The entropies themselves support an interface for different entropy estimators: it turns out that defining an entropy is one thing, but there may be several ways to estimate it. Several entropy definitions are included: Shannon, Renyi, Tsallis, Kaniadakis, Curado, and StretchedExponential.
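A sketch of how definitions and probabilities estimators combine, assuming the `entropy(definition, est, x)` call signature from the documentation:

```julia
using ComplexityMeasures

x = randn(10_000)
est = ValueHistogram(RectangularBinning(10))  # histogram-based probabilities

# Same probabilities estimator, different entropy definitions
entropy(Shannon(), est, x)
entropy(Renyi(q = 2.0), est, x)
entropy(Tsallis(q = 2.0), est, x)

# Normalized variant: divides by the maximum possible entropy
entropy_normalized(Shannon(), est, x)
```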

The package also has a generic interface for computing differential entropies instead of discrete ones.
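For example (a sketch; `Kraskov` is one of the differential estimators listed in the documentation, and Shannon is assumed to be the default definition):

```julia
using ComplexityMeasures

x = randn(10_000)
# Differential Shannon entropy via the Kraskov nearest-neighbor estimator
entropy(Kraskov(k = 3), x)
```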

On top of all that there is one more path: to compute "complexity measures", quantities related to entropies but not entropies in the formal mathematical sense.

Content in numbers

Counting everything above, there are 158 different complexity measures available out of the box.

Interface design

Perhaps the biggest victory of this package is its design, something that no other similar code base for computing entropy-related quantities has achieved.

Closing remarks

There's more functionality in progress, like the multiscale API, which will give access to multiscale variants of all the discrete measures, some of which have been explored in the literature, and many of which have not!

We sincerely believe this package will accelerate scientific research that uses complexity measures to classify or analyze timeseries, and we welcome feature requests and pull requests on the GitHub repo!

Datseris commented 1 year ago

@kahaaga in this post we should add the "total number of measures"

kahaaga commented 1 year ago

Sorry for the late reply. This already looks good! I'll post some comments on this tomorrow, and provide a count of the number of available methods before we publish the release announcement.

Datseris commented 1 year ago

@kahaaga I think I'll release DynamicalSystems.jl v3 on Sunday or Monday, and I think it makes conceptual sense to announce this package before v3, because I intend to link this to the v3 release. If you can, post some comments here; otherwise we can update the release announcement later on.

kahaaga commented 1 year ago

> The above are discrete entropies. If these are not your cup of tea, the package also has a generic interface for computing differential entropies.

I think this statement can be reduced to "The package also has a generic interface for computing differential entropies".

> Each probability (defined by a "ProbabilityEstimator")

This should be ProbabilitiesEstimator.

> ... a count of number of available methods before we publish the release announcement.

With a manual count, I get:

In summary:

This number would be even higher if counting multiscale variations of these measures. But what has happened to the multiscale API, @Datseris? I can't find it in the most recent documentation. Is this intentional, or has it just slipped out during restructuring? I thought we agreed to keep the multiscale API as is, and internally transition to a separate package for the coarse-graining/sliding-window functionality later.

kahaaga commented 1 year ago

Ah, I see that you added a comment in the source code about the multiscale stuff not being part of the public API yet. Then we must have discussed this at length. We should probably resolve this before finalizing the paper on the software.

kahaaga commented 1 year ago

> One can add a new type of probabilities estimator by extending a couple of simple functions (see dev docs) they immediately gain access to a plethora of functions for the corresponding entropies to the missing dispersion patterns complexity measure.

I'm not sure what the message in this last sentence is. I think there are a few words missing. I think what we want to say is

kahaaga commented 1 year ago

At the end of the listing of the number of available measures, I think we should also mention that there's more functionality in progress, like the multiscale API, which will give access to multiscale variants of all the discrete measures, some of which have been explored in the literature, and many of which have not.

Datseris commented 1 year ago

Hi @kahaaga !

> 78 (136) ways of estimating discrete ...

What does the number in the parentheses mean?


> But what has happened to the multiscale API, @Datseris?

Yeah, just give me one and a half weeks! By then I promise I will have initialized the "WindowedViewer.jl" package, which offers various kinds of views of a timeseries. I'm a bit overwhelmed right now with finishing DynamicalSystems.jl v3.0 and also preparing a workshop on it at the MPI for Evolutionary Biology. After I'm done with that (3rd of March), I'll come back to the multiscale stuff.

For now, let's keep the numbers without the multiscale, it's okay. For the paper of course we will have the multiscale in!


I've added all your other comments into the post; as soon as you give the ok I'll post it on Discourse!

kahaaga commented 1 year ago

> Yeah, just give me one and a half weeks! For now, let's keep the numbers without the multiscale, it's okay. For the paper of course we will have the multiscale in!

No worries! No need to rush this. I am also swamped until around the 10th of March, so I won't have any time to deal with this until then.

> What does the number in the parentheses mean?

My bad! I don't know what happened to the formatting. It should be 78 (13 estimators * 6 entropy definitions) for the non-normalized discrete entropies, and 65 (13 estimators * 5 entropy definitions for which entropy_maximum is defined) for the normalized discrete entropies.
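For the record, the arithmetic behind those counts:

```julia
13 * 6  # = 78 non-normalized discrete entropies
13 * 5  # = 65 normalized (entropy_maximum defined for 5 definitions)
```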

kahaaga commented 1 year ago

> I've added all your other comments into the post; as soon as you give the ok I'll post it on Discourse!

When you've added the numbers in my previous comment, feel free to post. I also just created a Discourse user (the same username as I have here), so feel free to tag me there too!

Datseris commented 1 year ago

Thanks. Can I ask a favor, BTW? Can you please update your GitHub profile with a picture and your affiliation? EDIT: just a picture; affiliation is there.

kahaaga commented 1 year ago

> Thanks. Can I ask a favor, BTW? Can you please update your GitHub profile with a picture and your affiliation? EDIT: just a picture; affiliation is there.

Sure! Just give me a few minutes.