ELIFE-ASU / PyInform

A Python Wrapper for the Inform Information Analysis Library
https://elife-asu.github.io/PyInform
MIT License
45 stars 9 forks source link

`block_entropy()` does not match `scipy.stats.entropy()` when k = 1, as expected #41

Closed ciaranmci closed 1 year ago

ciaranmci commented 1 year ago

From documentation:

Taking the example given in the documentation:

a = [0,0,1,1,1,1,0,0,0]
base_a = len(set(a))
pyinform.block_entropy(a, k = 1)
# 0.9910760598382222
scipy.stats.entropy(a, base = base_a ) 
# 2.0

What have I got wrong?

jakehanson commented 1 year ago

Scipy is interpreting a as a probability distribution whereas Pyinform is interpreting a as a time series. The probability distribution corresponding to a is [5/9, 4/9] so if you feed that into Scipy you will get the same entropy.

To convert a time series into a probability distribution using Pyinform you use the Dist class, which is a histogram of your time series observations. Then, you use the dump method to convert your histogram into a probability distribution.

from pyinform.dist import Dist

a = [0,0,1,1,1,1,0,0,0]  # time series 
base_a = len(set(a))

a_dist = dist(base_a) # initialize dist over support of size 2

## Count observations in each bin
for obs in a:
    a_dist.tick(obs)

## View probability distribution
a_dist.dump()
# [0.55556, 0.44444] 

pyinform.block_entropy(a, k = 1)
# 0.9910760598382222
scipy.stats.entropy(a_dist.dump(), base = base_a ) 
# 0.9910760598382222

Pyinform is intentionally built around time series observations because it is designed for ease of use with empirical data, but behind the scenes time series are almost always converted to distributions using the Dist class.

ciaranmci commented 1 year ago

Ah, got it. That is very clear and well explained. Thanks, @jakehanson

ciaranmci commented 1 year ago

Just a small note about a typo for anyone who references this, in the future: the dist(...) call should be Dist(...).

Specifically, this...

from pyinform.dist import Dist
# ...
a_dist = dist(base_a) # initialize dist over support of size 2
# ...

...should be...

from pyinform.dist import Dist
# ...
a_dist = Dist(base_a) # initialize Dist over support of size 2
# ...