JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License

Lempel–Ziv complexity measure #258

Closed: Datseris closed this issue 1 year ago

Datseris commented 1 year ago

New complexity measure:

https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv_complexity

The Wikipedia page also includes an implementation. As far as I can tell from the algorithm section at the end, this is easy to implement.
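For reference, the algorithm section of the Wikipedia article counts the phrases of the Lempel–Ziv (1976) parsing using the scheme of Kaspar and Schuster. Below is a minimal Python sketch of that counting. The function name `lempel_ziv_complexity` is hypothetical and not part of ComplexityMeasures.jl (whose version would be written in Julia). The sketch returns the raw phrase count without any normalization.

```python
from typing import Sequence


def lempel_ziv_complexity(sequence: Sequence) -> int:
    """Count the phrases in the LZ76 parsing of `sequence` (Kaspar-Schuster scheme).

    Hypothetical helper for illustration; works on strings or lists of symbols.
    """
    n = len(sequence)
    if n < 2:
        return n  # empty -> 0 phrases, a single symbol -> 1 phrase
    # i: start of a candidate copy in the history
    # k: length of the current match + 1
    # l: start of the phrase currently being built
    i, k, l = 0, 1, 1
    c, k_max = 1, 1  # the first symbol is always its own phrase
    while True:
        if sequence[i + k - 1] == sequence[l + k - 1]:
            # The phrase can still be copied from earlier text, so extend the match.
            k += 1
            if l + k > n:
                # Reached the end while copying: the remainder is the final phrase.
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:
                # No earlier start gives a longer copy: close the phrase (copy + 1 new symbol).
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c
```

For example, `lempel_ziv_complexity("1001111011000010")` returns 6, which corresponds to the parsing 1 · 0 · 01 · 1110 · 1100 · 0010.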

kahaaga commented 1 year ago

If we're including LZ complexity here, perhaps we should do the same for the compression complexity (part of an open PR in CausalityTools.jl)?

If so, I can open a PR that moves the basic compression complexity part here.

Datseris commented 1 year ago

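Here's the distinction I propose:

  • if a measure has one input data argument, it is here
  • if it has two or more, then it can be interpreted as a relational measure and goes into CausalityTools.

What do you think?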

kahaaga commented 1 year ago

Since LZ complexity operates on binary sequences, we should also consider integrating it with the probabilities estimators. Any estimator that internally encodes the input as integers can be used to convert a raw (multivariate) time series into a binary sequence. We just have to restrict the number of encoded symbols to 2.
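One way to read "restrict the number of encoded symbols to 2" is a two-bin, equal-width binning of a univariate series: values in the lower half of the data range become 0 and the rest become 1. The Python sketch below assumes exactly that. `binary_encode_by_bins` is a hypothetical helper, not an existing estimator. Multivariate input would also need a rule for combining dimensions, which is not shown here.

```python
def binary_encode_by_bins(x):
    """Encode a real-valued series as 0/1 using two equal-width bins over [min(x), max(x)].

    Hypothetical helper for illustration only.
    """
    lo, hi = min(x), max(x)
    if hi == lo:
        # A constant series carries no information; put every value in bin 0.
        return [0] * len(x)
    width = (hi - lo) / 2
    # The bin index is floor((v - lo) / width). Clamping sends the maximum value to bin 1, not 2.
    return [min(int((v - lo) // width), 1) for v in x]


symbols = binary_encode_by_bins([0.0, 1.0, 2.0, 3.0, 4.0])
print("".join(map(str, symbols)))  # prints 00111
```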

Datseris commented 1 year ago

We just have to restrict the number of encoded symbols to 2.

I am not sure what this means. None of the encodings we have at the moment can do this. They all have more than 2 symbols (integers).

kahaaga commented 1 year ago

I am not sure what this means. None of the encodings we have at the moment can do this. They all have more than 2 symbols (integers).

Some of the estimators, through keyword arguments, can result in an outcome space with cardinality 2. Any sequence of such outcomes can be interpreted as a binary sequence. For example, one could use SymbolicPermutation(m=2) to convert a real-valued time series into a binary sequence. This is of course not the case for all estimators. Perhaps it's best to let the user encode themselves.
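With `m = 2` and a delay of 1, an ordinal encoding can only tell whether the next value goes up or down, so it produces a binary sequence. The Python sketch below shows that mapping. It assumes that ties count as "up" and that the two patterns are labeled 0 and 1. The package's own integer labels and tie handling may differ, and `ordinal_binary_encode` is a hypothetical name.

```python
def ordinal_binary_encode(x, tau=1):
    """Map each pair (x[t], x[t + tau]) to its length-2 ordinal pattern.

    0 = non-decreasing, 1 = decreasing. Ties count as non-decreasing, an assumption
    that mirrors a stable sort. Hypothetical helper for illustration only.
    """
    return [0 if x[t + tau] >= x[t] else 1 for t in range(len(x) - tau)]


print(ordinal_binary_encode([1.0, 3.0, 2.0, 2.0, 5.0]))  # prints [0, 1, 0, 0]
```

The output has `len(x) - tau` symbols, one per compared pair, so it is one element shorter than the input when `tau = 1`.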

kahaaga commented 1 year ago

here's the distinction I propose:

  • if a measure has one input data argument, it is here
  • if it has two or more, then it can be interpreted as a relational measure and goes into CausalityTools.

What do you think?

Yes, that makes sense.

The compression complexity causality algorithm uses two concepts:

  1. The effort-to-compress compression complexity, which I implemented in the PR as compression_complexity(x, EffortToCompress()), where x can be either univariate or multivariate. With the ComplexityMeasures.jl 2.X API, this becomes complexity(EffortToCompress(), x), which parallels e.g. complexity(LempelZiv(), x).

  2. The joint effort-to-compress compression complexity. This has two inputs, i.e. complexity(EffortToCompress(), x, y), and can, as you say, therefore be considered as some sort of association measure.

I think the former belongs here, and the latter belongs in CausalityTools. Agree, @Datseris?

Datseris commented 1 year ago

Perhaps it's best to let the user encode themselves.

I think this is the best option. You can add a note about this to the docstring.


I think the former belongs here, and the latter belongs in CausalityTools. Agree, @Datseris?

Yep, CausalityTools.jl can simply extend the method.