WIP: Address remaining todos for V2

kahaaga commented 1 year ago

Various fixes for #190

Re-implemented OrdinalPatternEncoding based on the new encoding api. As a consequence, I also had to re-do the symbolic estimators. This has resulted in drastically more readable code, and reduces the number of lines of code probably by half.
Address some todos.
Fixed more tests related to wrong argument order after refactoring.

Datseris commented 1 year ago

@kahaaga the API of encode/decode says that the symbols are the positive integers. And symbol -1 is reserved for "erroring" cases. Can we do this here as well by adding 1 to the output of encode?

kahaaga commented 1 year ago

@kahaaga the API of encode/decode says that the symbols are the positive integers. And symbol -1 is reserved for "erroring" cases. Can we do this here as well by adding 1 to the output of encode?

@Datseris Hm. We haven't released anything officially yet. Is there any reason we have to restrict the encodings to the positive integers? Can't the allowed encoding space be nonnegative integers?

The Lehmer code source for the ordinal pattern encoding is by definition defined for nonnegative integers starting from 0. The extra addition is not very expensive, but it adds to the complexity of the encoding algorithm by a theoretical worst-case factor of ca 50% for dim = 2, 16% for dim = 3, and 8% for dim = 4 (and much less for higher dimensions). Not sure how that will translate into real use, but it feels unnecessary.

EDIT: Ah, no, I think this will be completely negligible compared to overall runtime, because a lot of the time for symbolic estimators is spent sorting to find permutations. I'm fine with adding the 1. But I am also fine with changing the API wording (unless there is good reasons not to)

Datseris commented 1 year ago

Well, the positive integers is the preferred indexing system of Julia, and are also the set of enumerators. When you enumerate elements you use the positive integers. So it feels like the natural choice. The linear indexing of an array is also the positive integers, and this is used directly in the binning encodings.

Lastly, this system is also what we have adopted in Attractors.jl: "labels" of unique things (attractors) are the positive integers. Special label is -1.

Datseris commented 1 year ago

The extra addition is not very expensive, but it adds to the complexity of the encoding algorithm by a theoretical worst-case factor of ca 50% for dim = 2, 16% for dim = 3, and 8% for dim = 4 (and much less for higher dimensions). Not sure how that will translate into real use, but it feels unnecessary.

I dont' fully get all this, but surely there has to be some flaw in the logic if the addition of an integer to another integer would make computing something 50% more expensive.

kahaaga commented 1 year ago

I dont' fully get all this, but surely there has to be some flaw in the logic if the addition of an integer to another integer would make computing something 50% more expensive.

Just in terms of the number of operations done in the actual encoding, not in actual computation time. The number of additions in the forward Lehmer code used here increases with the dimension, and for dimension 2, the number of additions performed is just one. But, again, it should not matter at all, given the other steps that also have to be done that are much slower (sorting etc).

kahaaga commented 1 year ago

Well, the positive integers is the preferred indexing system of Julia, and are also the set of enumerators. When you enumerate elements you use the positive integers. So it feels like the natural choice. The linear indexing of an array is also the positive integers, and this is used directly in the binning encodings. Lastly, this system is also what we have adopted in Attractors.jl: "labels" of unique things (attractors) are the positive integers. Special label is -1.

Ok, let's stick with positive integers then. I'll adjust the code accordingly.

JuliaDynamics / ComplexityMeasures.jl

WIP: Address remaining todos for V2 #195