Closed · Datseris closed this 1 year ago
need some help here: the test for the non-encoded spatial dispersion patterns fails, but I am not sure why. I fixed a problem in the code with the normalization in the previous PR. But I am not sure what I did wrong to make this fail.
I'll have a look!
@kahaaga Can we please rename the abstract type `ComplexityMeasure` to `ComplexityEstimator`, for naming consistency across the library? I'll add a deprecation so nothing will be breaking.
@Datseris This was a bit tricky to track down, but I think I've found the issue.
The test that fails is the following:
```julia
using ComplexityMeasures
using Test
using Random: Xoshiro

# `stencil` is assumed to be defined earlier in the test file.
y = rand(0:1, 1000, 1000);
est_y = SpatialDispersion(stencil, y, c = 2)
est_y_presymb = SpatialDispersion(stencil, y; skip_encoding = true, L = 2)
```
```julia
julia> @test 0.99 <= entropy_normalized(est_y, y) <= 1.0
Test Passed

julia> @test 0.99 <= entropy_normalized(est_y_presymb, y) <= 1.0
Test Failed at REPL[57]:1
  Expression: 0.99 <= entropy_normalized(est_y_presymb, y) <= 1.0
   Evaluated: 0.99 <= 0.43067486081396994 <= 1.0
ERROR: There was an error during testing
```
So the error occurs when the input is pre-encoded (i.e. categorical data with a known number of categories). In that case, specifying `L = 2` indicates that there are a total of 2 possible outcomes for each cell in the ND-array. This should be equivalent to discretizing continuous data into `c = 2` categories.
Currently, this does not happen, because the implementation of `outcome_space` does not consider `L` at all. It just defaults to whatever the value of `c` is:

```julia
outcome_space(est::SpatialDispersion) = outcome_space(Dispersion(; c = est.c, m = est.m))
```

Therefore, instead of getting `2^m` outcomes, we get `c^m` outcomes — here `5^3 = 125`, because `c` was left at its default value of `5` when constructing `est_y_presymb`.
```julia
julia> outcome_space(est_y_presymb)
125-element Vector{SVector{3, Int64}}:
 [1, 1, 1]
 [2, 1, 1]
 [3, 1, 1]
 [4, 1, 1]
 [5, 1, 1]
 [1, 2, 1]
 [2, 2, 1]
 [3, 2, 1]
 [4, 2, 1]
 [5, 2, 1]
 [1, 3, 1]
 [2, 3, 1]
 ⋮
 [4, 3, 5]
 [5, 3, 5]
 [1, 4, 5]
 [2, 4, 5]
 [3, 4, 5]
 [4, 4, 5]
 [5, 4, 5]
 [1, 5, 5]
 [2, 5, 5]
 [3, 5, 5]
 [4, 5, 5]
 [5, 5, 5]
```
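As a sanity check on the outcome counts (a sketch, not the package implementation), the number of dispersion patterns is just the size of the Cartesian product of `c` symbols over the `m` stencil positions:

```julia
# Sketch: enumerate all length-m symbol patterns over c categories.
# This mirrors how the outcome space is constructed, but is not the
# actual package code.
all_patterns(c, m) = vec(collect(Iterators.product(ntuple(_ -> 1:c, m)...)))

length(all_patterns(5, 3))  # 125 patterns, as in the buggy case above
length(all_patterns(2, 3))  # 8 patterns, as expected for L = 2
```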
Since `entropy_normalized` calls `total_outcomes`, which defaults to `length(outcome_space(est))`, the error above propagates to `entropy_maximum`, so the result for `entropy_normalized` is also wrong.
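This also explains the specific failing value. For maximally random binary input, the computed Shannon entropy should be close to `log(8)` (the true maximum over `2^3 = 8` patterns), but it gets divided by `log(125)` instead. Under that assumption:

```julia
# If the computed entropy is ≈ log(8), but the normalization constant is
# taken as log(125) due to the wrong outcome space, the normalized value is
log(8) / log(125)  # ≈ 0.4307, matching the 0.43067... in the failing test
```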
By defining

```julia
function outcome_space(est::SpatialDispersion)
    if est.skip_encoding
        return outcome_space(Dispersion(; c = est.L, m = est.m))
    else
        return outcome_space(Dispersion(; c = est.c, m = est.m))
    end
end
```

the user-specified `L` is taken into consideration, and we get the correct outcome space
```julia
julia> outcome_space(est_y_presymb)
8-element Vector{SVector{3, Int64}}:
 [1, 1, 1]
 [2, 1, 1]
 [1, 2, 1]
 [2, 2, 1]
 [1, 1, 2]
 [2, 1, 2]
 [1, 2, 2]
 [2, 2, 2]

julia> outcome_space(est_y)
8-element Vector{SVector{3, Int64}}:
 [1, 1, 1]
 [2, 1, 1]
 [1, 2, 1]
 [2, 2, 1]
 [1, 1, 2]
 [2, 1, 2]
 [1, 2, 2]
 [2, 2, 2]
```
and the normalized entropy for a user-specified `L` is correct:
```julia
julia> using ComplexityMeasures, Test

julia> using Random: Xoshiro

julia> y = rand(0:1, 1000, 1000);

julia> est_y = SpatialDispersion(stencil, y, c = 2);

julia> est_y_presymb = SpatialDispersion(stencil, y; skip_encoding = true, L = 2);

julia> @test 0.99 <= entropy_normalized(est_y, y) <= 1.0
Test Passed

julia> @test 0.99 <= entropy_normalized(est_y_presymb, y) <= 1.0
Test Passed
```
I'll just update this PR with a fix.
In addition to fixing the above, I also added an explicit test case that would have caught this error.
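Such a test could look roughly like this (a sketch only; the actual test added in the PR may differ, and the stencil and sizes here are assumptions):

```julia
using ComplexityMeasures, Test

# Hypothetical regression test: pre-encoded data with L categories must
# yield the same outcome space as encoding continuous data with c = L.
stencil = [true true; true true]  # assumed 2×2 block stencil (m = 4 points)
x = rand(50, 50)                  # continuous data, encoded with c = 2
y = rand(0:1, 50, 50)             # pre-encoded data with 2 categories
est_c = SpatialDispersion(stencil, x; c = 2)
est_L = SpatialDispersion(stencil, y; skip_encoding = true, L = 2)
@test outcome_space(est_c) == outcome_space(est_L)
@test total_outcomes(est_L) == 2^4  # 2 symbols over 4 stencil points
```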
Thanks a lot for fixing this. I made the name change and will merge once tests pass.
Merging #244 (dbb709c) into main (458933b) will decrease coverage by 3.35%. The diff coverage is 91.66%.
```diff
@@            Coverage Diff             @@
##             main     #244      +/-   ##
==========================================
- Coverage   84.75%   81.39%   -3.36%
==========================================
  Files          47       49       +2
  Lines        1141     1220      +79
==========================================
+ Hits          967      993      +26
- Misses        174      227      +53
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/complexity_measures/approximate_entropy.jl | 95.00% <ø> (ø) | |
| src/complexity_measures/missing_dispersion.jl | 100.00% <ø> (ø) | |
| .../complexity_measures/reverse_dispersion_entropy.jl | 100.00% <ø> (ø) | |
| src/complexity_measures/sample_entropy.jl | 93.54% <ø> (ø) | |
| src/encoding_implementations/gaussian_cdf.jl | 61.53% <ø> (ø) | |
| ...rc/encoding_implementations/rectangular_binning.jl | 92.30% <ø> (+4.61%) | :arrow_up: |
| src/entropies_definitions/kaniadakis.jl | 83.33% <ø> (ø) | |
| src/entropies_definitions/tsallis.jl | 85.71% <ø> (ø) | |
| src/multiscale.jl | 0.00% <ø> (-38.47%) | :arrow_down: |
| src/multiscale/multiscale.jl | 0.00% <0.00%> (ø) | |
| ... and 31 more | | |