JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License

Fix some bugs #244

Closed: Datseris closed this 1 year ago

Datseris commented 1 year ago

@kahaaga I need some help here: the test for the non-encoded spatial dispersion patterns fails, but I am not sure why. I fixed a problem in the code with the normalization in the previous PR. But I am not sure what I did wrong to make this fail.

kahaaga commented 1 year ago

> I need some help here: the test for the non-encoded spatial dispersion patterns fails, but I am not sure why. I fixed a problem in the code with the normalization in the previous PR. But I am not sure what I did wrong to make this fail.

I'll have a look!

Datseris commented 1 year ago

@kahaaga Can we please rename the abstract type ComplexityMeasure to ComplexityEstimator, for naming consistency across the library? I'll add a deprecation so nothing will be breaking.
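A rename like this can typically be kept non-breaking with `Base.@deprecate_binding` (an internal but widely used Base macro; this is only a sketch of one possible approach, not necessarily what the PR ends up doing):

```julia
# Inside the package module (illustrative, not the real source layout):
abstract type ComplexityEstimator end

# The old name keeps working but emits a deprecation warning when used:
Base.@deprecate_binding ComplexityMeasure ComplexityEstimator
```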

kahaaga commented 1 year ago

@Datseris This was a bit tricky to track down, but I think I've found the issue.

The test that fails is the following:

using ComplexityMeasures
using Test
using Random: Xoshiro

# The original snippet assumes `stencil` is defined earlier in the test file.
# A hypothetical 3-point stencil (m = 3), consistent with the SVector{3, Int64} outcomes below:
stencil = [1 1; 1 0]
y = rand(0:1, 1000, 1000);
est_y = SpatialDispersion(stencil, y; c = 2)
est_y_presymb = SpatialDispersion(stencil, y; skip_encoding = true, L = 2)

julia> @test 0.99 <= entropy_normalized(est_y, y) <= 1.0
Test Passed

julia> @test 0.99 <= entropy_normalized(est_y_presymb, y) <= 1.0
Test Failed at REPL[57]:1
  Expression: 0.99 <= entropy_normalized(est_y_presymb, y) <= 1.0
   Evaluated: 0.99 <= 0.43067486081396994 <= 1.0
ERROR: There was an error during testing

So the error occurs when the input is pre-encoded (i.e. categorical data with a known number of categories). In that case, specifying L = 2 indicates that there are a total of 2 possible outcomes for each cell in the ND-array. This should be equivalent to discretizing some continuous data with c = 2 categories.

Currently, this does not happen because the implementation of outcome_space does not consider L at all. It just defaults to whatever the value of c is:

outcome_space(est::SpatialDispersion) = outcome_space(Dispersion(; c = est.c, m = est.m))

Therefore, instead of getting the expected L^m = 2^3 = 8 outcomes, we get c^m = 5^3 = 125 outcomes, since c defaults to 5 when it is not specified:

julia> outcome_space(est_y_presymb)
125-element Vector{SVector{3, Int64}}:
 [1, 1, 1]
 [2, 1, 1]
 [3, 1, 1]
 [4, 1, 1]
 [5, 1, 1]
 [1, 2, 1]
 [2, 2, 1]
 [3, 2, 1]
 [4, 2, 1]
 [5, 2, 1]
 [1, 3, 1]
 [2, 3, 1]
 ⋮
 [4, 3, 5]
 [5, 3, 5]
 [1, 4, 5]
 [2, 4, 5]
 [3, 4, 5]
 [4, 4, 5]
 [5, 4, 5]
 [1, 5, 5]
 [2, 5, 5]
 [3, 5, 5]
 [4, 5, 5]
 [5, 5, 5]
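The size mismatch is pure combinatorics: the dispersion outcome space is the set of all length-`m` patterns over the alphabet `1:c`, so it has `c^m` elements. A standalone sketch (plain Julia, no package needed):

```julia
# All length-m dispersion patterns over an alphabet of size c.
dispersion_patterns(c, m) = vec(collect(Iterators.product(ntuple(_ -> 1:c, m)...)))

length(dispersion_patterns(5, 3))  # 125: what the buggy code produces (default c = 5)
length(dispersion_patterns(2, 3))  # 8: what skip_encoding with L = 2 should give
```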

Since entropy_normalized calls total_outcomes, which defaults to length(outcome_space(est)), the error above propagates to entropy_maximum, so the result for entropy_normalized is also wrong.
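This also explains the exact number in the failing test: for near-uniform random binary data the entropy is close to the true maximum `log(2^3)`, but it gets divided by the wrong maximum `log(5^3)`, and that ratio is the reported value:

```julia
log(2^3) / log(5^3)  # ≈ 0.4307, matching the 0.43067... in the failed test above
```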

By defining

function outcome_space(est::SpatialDispersion)
    if est.skip_encoding
        return outcome_space(Dispersion(; c = est.L, m = est.m))
    else
        return outcome_space(Dispersion(; c = est.c, m = est.m))
    end
end

the user-specified L is taken into account, and we get the correct outcome space:

julia> outcome_space(est_y_presymb)
8-element Vector{SVector{3, Int64}}:
 [1, 1, 1]
 [2, 1, 1]
 [1, 2, 1]
 [2, 2, 1]
 [1, 1, 2]
 [2, 1, 2]
 [1, 2, 2]
 [2, 2, 2]

julia> outcome_space(est_y)
8-element Vector{SVector{3, Int64}}:
 [1, 1, 1]
 [2, 1, 1]
 [1, 2, 1]
 [2, 2, 1]
 [1, 1, 2]
 [2, 1, 2]
 [1, 2, 2]
 [2, 2, 2]

and the normalized entropy for user-specified L is correct:

julia> using ComplexityMeasures, Test
julia> using Random: Xoshiro

julia> y = rand(0:1, 1000, 1000);

julia> est_y = SpatialDispersion(stencil, y, c = 2);

julia> est_y_presymb = SpatialDispersion(stencil, y; skip_encoding = true, L = 2);

julia> @test 0.99 <= entropy_normalized(est_y, y) <= 1.0
Test Passed

julia> @test 0.99 <= entropy_normalized(est_y_presymb, y) <= 1.0
Test Passed

I'll just update this PR with a fix.

kahaaga commented 1 year ago

In addition to fixing the above, I also added an explicit test case that would have caught this error.
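Such a regression test could look roughly like this (a sketch only; the test actually added in the PR may differ, and `stencil` here is a hypothetical 3-point stencil):

```julia
using ComplexityMeasures, Test

stencil = [1 1; 1 0]  # hypothetical: 3 selected points, so m = 3
y = rand(0:1, 100, 100)
est = SpatialDispersion(stencil, y; skip_encoding = true, L = 2)

# With L = 2 categories and m = 3 there must be exactly 2^3 = 8 outcomes,
# regardless of the default value of c.
@test total_outcomes(est) == 2^3
```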

Datseris commented 1 year ago

Thanks a lot for fixing this. I made the name change and will merge once tests pass.

codecov[bot] commented 1 year ago

Codecov Report

Merging #244 (dbb709c) into main (458933b) will decrease coverage by 3.35%. The diff coverage is 91.66%.

@@            Coverage Diff             @@
##             main     #244      +/-   ##
==========================================
- Coverage   84.75%   81.39%   -3.36%     
==========================================
  Files          47       49       +2     
  Lines        1141     1220      +79     
==========================================
+ Hits          967      993      +26     
- Misses        174      227      +53     
Impacted Files Coverage Δ
src/complexity_measures/approximate_entropy.jl 95.00% <ø> (ø)
src/complexity_measures/missing_dispersion.jl 100.00% <ø> (ø)
.../complexity_measures/reverse_dispersion_entropy.jl 100.00% <ø> (ø)
src/complexity_measures/sample_entropy.jl 93.54% <ø> (ø)
src/encoding_implementations/gaussian_cdf.jl 61.53% <ø> (ø)
...rc/encoding_implementations/rectangular_binning.jl 92.30% <ø> (+4.61%) :arrow_up:
src/entropies_definitions/kaniadakis.jl 83.33% <ø> (ø)
src/entropies_definitions/tsallis.jl 85.71% <ø> (ø)
src/multiscale.jl 0.00% <ø> (-38.47%) :arrow_down:
src/multiscale/multiscale.jl 0.00% <0.00%> (ø)
... and 31 more
