Tchanders / InformationMeasures.jl

Entropy, mutual information and higher order measures from information theory, with various estimators and discretisation methods.

Conditional entropy calculated incorrectly #35

Open · zsteve opened this issue 1 year ago

zsteve commented 1 year ago

Potentially serious bug caused by marginalizing along the wrong axes.

Example: https://github.com/Tchanders/InformationMeasures.jl/blob/64810f28917dd015429951ac9722ef2b79559acb/src/Measures.jl#L152

It should instead be sum(probabilities_xy; dims = 1), since we want to integrate out x (corresponding to the first dimension of the joint).
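
For intuition, here is a minimal sketch of why the axis matters (the array is illustrative, not the package's internals): summing over dims = 1 collapses the first axis, i.e. it integrates out x.

using Random
Random.seed!(0)
p_xy = rand(3, 4)                # joint over (x, y); x along dim 1, y along dim 2
p_xy /= sum(p_xy)
p_y = vec(sum(p_xy; dims = 1))   # integrates out x -> marginal of y
p_x = vec(sum(p_xy; dims = 2))   # integrates out y -> marginal of x
# H(X|Y) = H(X,Y) - H(Y) therefore needs the dims = 1 marginal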

The same bug occurs elsewhere, e.g. https://github.com/Tchanders/InformationMeasures.jl/blob/64810f28917dd015429951ac9722ef2b79559acb/src/Measures.jl#L209

This is done correctly here: https://github.com/Tchanders/InformationMeasures.jl/blob/64810f28917dd015429951ac9722ef2b79559acb/src/Measures.jl#L270

I will submit a PR fixing this. Here is an MWE reproducing the issue:

using InformationMeasures
using Random
Random.seed!(0)

# random joint distribution p(x, y, z) on a 3 x 4 x 5 grid
p_xyz = rand(3, 4, 5)
p_xyz /= sum(p_xyz)
@info sum(p_xyz) # sanity check: should be 1

# marginals: sum over the axes being integrated out
p_yz = sum(p_xyz; dims = 1)[1, :, :]  # integrate out x
p_xz = sum(p_xyz; dims = 2)[:, 1, :]  # integrate out y
p_z = vec(sum(p_xyz; dims = (1, 2))) # integrate out x and y

# compute I(X; Y | Z) two ways
# 1) using the package
I_xy_z = get_conditional_mutual_information(p_xyz; probabilities = true, base = exp(1)) # = 0.1066909094805184
# 2) manually, via I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
using LogExpFunctions
H(p) = -sum(xlogx.(p)) # entropy in nats; xlogx(0) == 0
H(p_xz) + H(p_yz) - H(p_xyz) - H(p_z) # 0.1066909094805184 -- agrees

# this, however, seems wrong ...
# H(Y|Z) using the package
get_conditional_entropy(p_yz; probabilities = true, base = exp(1)) # 1.558466614322862
# H(Y|Z) via the chain rule H(Y|Z) = H(Y,Z) - H(Z)
H(p_yz) - H(p_z) # 1.3294926714567774 -- disagrees
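
As a consistency check on the diagnosis (a sketch reusing p_yz and H from the MWE above; p_y is my name for the extra marginal): if the package sums along dims = 2 instead of dims = 1, the value it returns should be H(Z|Y) rather than H(Y|Z).

# H(Z|Y) = H(Y,Z) - H(Y): what marginalizing along the wrong axis would produce
p_y = vec(sum(p_yz; dims = 2))
H(p_yz) - H(p_y) # should reproduce the package's 1.558466614322862 if the diagnosis is right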