JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems

Announcement post draft #363

Closed by Datseris 6 months ago

Datseris commented 6 months ago

ComplexityMeasures.jl v3 - a mathematically rigorous software for probability, entropy, and complexity

We (@kahaaga and @datseris) are very proud to announce v3 of ComplexityMeasures.jl. This release is the result of a year of intensive thinking, redesigning, reimplementing, and much back-and-forth, all in order to build software for estimating "complexity measures" (entropies and similar quantities) from data. In typical Julia fashion we were greedy, and hence wanted the software to satisfy the following:

ComplexityMeasures.jl v3 satisfies these points and gives you even more. The best way to get an overview of the software is via its brand-new overarching tutorial.

What we want to highlight in this release is that we based the software on a mathematically rigorous formulation of estimating a complexity/information measure. This process proceeds as follows:

  1. Given input data, decide how to extract probabilities from them. This means "discretizing" the data, which requires an "outcome space". OutcomeSpace is now a formal and extendable part of the library.
  2. Estimate the probabilities from the data according to the discretization. Biases can occur in this process, so one also needs to choose a ProbabilitiesEstimator instance (also an extendable interface). A minimal sketch of steps 1-2 follows this list.
  3. Choose the information/complexity measure to estimate from the data. This requires a definition of the measure (also an extendable interface).
  4. Lastly, there may be bias in the estimation of the information/complexity measure itself (typical, e.g., for Shannon entropy), so one also needs to choose an estimator for the information/complexity measure.
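
To make steps 1-2 concrete, here is a minimal sketch, assuming the OrdinalPatterns outcome space and the RelativeAmount probabilities estimator (plain relative frequencies) that ship with the library:

```julia
using ComplexityMeasures

x = randn(10_000)  # example input timeseries

# Step 1: choose an outcome space (here: ordinal patterns of length 3)
os = OrdinalPatterns(m = 3)

# Step 2: estimate probabilities over that outcome space
# (RelativeAmount simply counts relative frequencies)
probs = probabilities(RelativeAmount(), os, x)
```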

These steps are paralleled exactly in the central function call of the library, to which all other calls reduce:

information(estimator, probability_estimator, outcome_space, input_data)

where estimator is the estimator for the information measure, which also contains a reference to the definition of that measure.
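
For example, here is a minimal sketch of a complete call, assuming the PlugIn discrete estimator for Shannon entropy, plain relative frequencies for the probabilities, and ordinal patterns as the outcome space (this particular combination yields the well-known permutation entropy):

```julia
using ComplexityMeasures

x = randn(10_000)  # example input timeseries

# outcome space, probabilities estimator, and information estimator
# (which wraps the measure definition), all in a single call
h = information(PlugIn(Shannon()), RelativeAmount(), OrdinalPatterns(m = 3), x)
```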

We hope this design is useful for the wider community, especially the statistics community!


@kahaaga let me know what you think.

kahaaga commented 6 months ago

ComplexityMeasures.jl v3 - a mathematically rigorous software for probability, entropy, and complexity

We (@kahaaga and @datseris) are very proud to announce v3 of ComplexityMeasures.jl. This release is the result of a year of intensive thinking, redesigning, reimplementing, and much back-and-forth, all in order to build software for estimating "complexity measures" (entropies and similar quantities) from data. In typical Julia fashion we were greedy, and hence wanted the software to satisfy the following:

ComplexityMeasures.jl v3 satisfies these points and more. The best way to get an overview of the software is via its brand-new overarching tutorial.

What we want to highlight in this release is that we based the software on a mathematically rigorous formulation of estimating a complexity/information measure. For discrete estimation, the process proceeds as follows:

  1. Given input data, decide how to extract probabilities from them. This means "discretizing" the data, which requires an "outcome space". OutcomeSpace is now a formal and extendable part of the library.
  2. Estimate the probabilities from the data according to the discretization. Biases can occur in this process, so one also needs to choose a ProbabilitiesEstimator instance (also an extendable interface).
  3. Choose the information/complexity measure to estimate from the data. This requires a definition of the measure (also an extendable interface).
  4. Lastly, there may be bias in the estimation of the information/complexity measure itself (typical, e.g., for Shannon entropy), so one also needs to choose an estimator for the information/complexity measure.

These steps are paralleled exactly in the central function call of the library, to which all other calls reduce:

information(info_estimator, probability_estimator, outcome_space, input_data)

where info_estimator is a DiscreteInfoEstimator (which also contains the information measure definition). Additionally, we provide an interface for differential estimation (using DifferentialInfoEstimator), which has widespread use in Shannon entropy estimation.
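
As a sketch of the differential interface, one could estimate differential Shannon entropy with a nearest-neighbor estimator such as Kraskov (the value k = 4 below is just an illustrative choice of neighbor count):

```julia
using ComplexityMeasures

x = randn(10_000)  # example input timeseries

# Differential Shannon entropy via the Kraskov nearest-neighbor estimator.
# No outcome space is involved, since no discretization takes place.
h = information(Kraskov(Shannon(); k = 4), x)
```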

A bonus of this design is that we are not only able to reproduce most of the quantities that have been labelled "complexity measures" in the literature: by combining a specific discretization technique, probability estimator, measure definition, and measure estimator, we can also readily compute any "complexity quantity" that is based on this approach. Most of these combinations have not been explored before!
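
For instance, swapping any single component yields a different, well-defined quantity. A sketch, assuming a binned-value outcome space, Bayesian regularization of the probability estimates, and a Tsallis (rather than Shannon) measure definition:

```julia
using ComplexityMeasures

x = randn(10_000)  # example input timeseries

# A less common combination: plug-in Tsallis entropy (q = 2) over a
# rectangular value binning, with Bayesian-regularized probabilities.
h = information(
    PlugIn(Tsallis(q = 2)),
    BayesianRegularization(),
    ValueBinning(RectangularBinning(5)),
    x,
)
```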

We hope this design is useful for the wider community, especially the statistics community!

kahaaga commented 6 months ago

@Datseris, I tweaked a few bits, added a few lines, and added a paragraph with a bit of perspective at the end. The announcement looks good to me now. Anything else you'd like to tweak?