JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License

Fix some bugs #244

Closed: Datseris closed this 1 year ago

Datseris commented 1 year ago

@kahaaga I need some help here: the test for the non-encoded spatial dispersion patterns fails, but I am not sure why. I fixed a problem in the code with the normalization in the previous PR. But I am not sure what I did wrong to make this fail.

kahaaga commented 1 year ago

> I need some help here: the test for the non-encoded spatial dispersion patterns fails, but I am not sure why. I fixed a problem in the code with the normalization in the previous PR. But I am not sure what I did wrong to make this fail.

I'll have a look!

Datseris commented 1 year ago

@kahaaga Can we please rename the abstract type ComplexityMeasure to ComplexityEstimator, for naming consistency across the library? I'll add a deprecation so nothing will be breaking.
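A rename like this can typically be kept non-breaking with `Base.@deprecate_binding` (an internal but widely used Base macro; this is only a sketch of one possible approach, not necessarily what the PR ends up doing):

```julia
# Inside the package module (illustrative, not the real source layout):
abstract type ComplexityEstimator end

# The old name keeps working but emits a deprecation warning when used:
Base.@deprecate_binding ComplexityMeasure ComplexityEstimator
```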

kahaaga commented 1 year ago

@Datseris This was a bit tricky to track down, but I think I've found the issue.

The test that fails is the following:

using ComplexityMeasures
using Test
using Random: Xoshiro

# The original snippet assumes `stencil` is defined earlier in the test file.
# A hypothetical 3-point stencil (m = 3), consistent with the SVector{3, Int64} outcomes below:
stencil = [1 1; 1 0]
y = rand(0:1, 1000, 1000);
est_y = SpatialDispersion(stencil, y; c = 2)
est_y_presymb = SpatialDispersion(stencil, y; skip_encoding = true, L = 2)

julia> @test 0.99 <= entropy_normalized(est_y, y) <= 1.0
Test Passed

julia> @test 0.99 <= entropy_normalized(est_y_presymb, y) <= 1.0
Test Failed at REPL[57]:1
  Expression: 0.99 <= entropy_normalized(est_y_presymb, y) <= 1.0
   Evaluated: 0.99 <= 0.43067486081396994 <= 1.0
ERROR: There was an error during testing

So the error occurs when the input is pre-encoded (i.e. categorical data with a known number of categories). In that case, specifying L = 2 indicates that there are a total of 2 possible outcomes for each cell in the ND-array. This should be equivalent to discretizing some continuous data with c = 2 categories.

Currently, this does not happen because the implementation of outcome_space does not consider L at all. It just defaults to whatever the value of c is:

outcome_space(est::SpatialDispersion) = outcome_space(Dispersion(; c = est.c, m = est.m))

Therefore, instead of getting the expected L^m = 2^3 = 8 outcomes, we get c^m = 5^3 = 125 outcomes, since c defaults to 5 when it is not specified:

julia> outcome_space(est_y_presymb)
125-element Vector{SVector{3, Int64}}:
 [1, 1, 1]
 [2, 1, 1]
 [3, 1, 1]
 [4, 1, 1]
 [5, 1, 1]
 [1, 2, 1]
 [2, 2, 1]
 [3, 2, 1]
 [4, 2, 1]
 [5, 2, 1]
 [1, 3, 1]
 [2, 3, 1]
 ⋮
 [4, 3, 5]
 [5, 3, 5]
 [1, 4, 5]
 [2, 4, 5]
 [3, 4, 5]
 [4, 4, 5]
 [5, 4, 5]
 [1, 5, 5]
 [2, 5, 5]
 [3, 5, 5]
 [4, 5, 5]
 [5, 5, 5]
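The size mismatch is pure combinatorics: the dispersion outcome space is the set of all length-`m` patterns over the alphabet `1:c`, so it has `c^m` elements. A standalone sketch (plain Julia, no package needed):

```julia
# All length-m dispersion patterns over an alphabet of size c.
dispersion_patterns(c, m) = vec(collect(Iterators.product(ntuple(_ -> 1:c, m)...)))

length(dispersion_patterns(5, 3))  # 125: what the buggy code produces (default c = 5)
length(dispersion_patterns(2, 3))  # 8: what skip_encoding with L = 2 should give
```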

Since entropy_normalized calls total_outcomes, which defaults to length(outcome_space(est)), the error above propagates to entropy_maximum, so the result for entropy_normalized is also wrong.
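This also explains the exact number in the failing test: for near-uniform random binary data the entropy is close to the true maximum `log(2^3)`, but it gets divided by the wrong maximum `log(5^3)`, and that ratio is the reported value:

```julia
log(2^3) / log(5^3)  # ≈ 0.4307, matching the 0.43067... in the failed test above
```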

By defining

function outcome_space(est::SpatialDispersion)
    if est.skip_encoding
        return outcome_space(Dispersion(; c = est.L, m = est.m))
    else
        return outcome_space(Dispersion(; c = est.c, m = est.m))
    end
end

the user-specified L is taken into account, and we get the correct outcome space:

julia> outcome_space(est_y_presymb)
8-element Vector{SVector{3, Int64}}:
 [1, 1, 1]
 [2, 1, 1]
 [1, 2, 1]
 [2, 2, 1]
 [1, 1, 2]
 [2, 1, 2]
 [1, 2, 2]
 [2, 2, 2]

julia> outcome_space(est_y)
8-element Vector{SVector{3, Int64}}:
 [1, 1, 1]
 [2, 1, 1]
 [1, 2, 1]
 [2, 2, 1]
 [1, 1, 2]
 [2, 1, 2]
 [1, 2, 2]
 [2, 2, 2]

and the normalized entropy for user-specified L is correct:

julia> using ComplexityMeasures, Test
julia> using Random: Xoshiro

julia> y = rand(0:1, 1000, 1000);

julia> est_y = SpatialDispersion(stencil, y, c = 2);

julia> est_y_presymb = SpatialDispersion(stencil, y; skip_encoding = true, L = 2);

julia> @test 0.99 <= entropy_normalized(est_y, y) <= 1.0
Test Passed

julia> @test 0.99 <= entropy_normalized(est_y_presymb, y) <= 1.0
Test Passed

I'll just update this PR with a fix.

kahaaga commented 1 year ago

In addition to fixing the above, I also added an explicit test case that would have caught this error.
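Such a regression test could look roughly like this (a sketch only; the test actually added in the PR may differ, and `stencil` here is a hypothetical 3-point stencil):

```julia
using ComplexityMeasures, Test

stencil = [1 1; 1 0]  # hypothetical: 3 selected points, so m = 3
y = rand(0:1, 100, 100)
est = SpatialDispersion(stencil, y; skip_encoding = true, L = 2)

# With L = 2 categories and m = 3 there must be exactly 2^3 = 8 outcomes,
# regardless of the default value of c.
@test total_outcomes(est) == 2^3
```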

Datseris commented 1 year ago

Thanks a lot for fixing this. I made the name change and will merge once tests pass.

codecov[bot] commented 1 year ago

Codecov Report

Merging #244 (dbb709c) into main (458933b) will decrease coverage by 3.35%. The diff coverage is 91.66%.

@@            Coverage Diff             @@
##             main     #244      +/-   ##
==========================================
- Coverage   84.75%   81.39%   -3.36%     
==========================================
  Files          47       49       +2     
  Lines        1141     1220      +79     
==========================================
+ Hits          967      993      +26     
- Misses        174      227      +53     
Impacted Files Coverage Δ
src/complexity_measures/approximate_entropy.jl 95.00% <ø> (ø)
src/complexity_measures/missing_dispersion.jl 100.00% <ø> (ø)
.../complexity_measures/reverse_dispersion_entropy.jl 100.00% <ø> (ø)
src/complexity_measures/sample_entropy.jl 93.54% <ø> (ø)
src/encoding_implementations/gaussian_cdf.jl 61.53% <ø> (ø)
...rc/encoding_implementations/rectangular_binning.jl 92.30% <ø> (+4.61%) :arrow_up:
src/entropies_definitions/kaniadakis.jl 83.33% <ø> (ø)
src/entropies_definitions/tsallis.jl 85.71% <ø> (ø)
src/multiscale.jl 0.00% <ø> (-38.47%) :arrow_down:
src/multiscale/multiscale.jl 0.00% <0.00%> (ø)
... and 31 more
