Closed ciaranmci closed 1 year ago
Scipy is interpreting a
as a probability distribution whereas Pyinform is interpreting a
as a time series. The probability distribution corresponding to a
is [5/9, 4/9]
so if you feed that into Scipy you will get the same entropy.
To convert a time series into a probability distribution using Pyinform you use the Dist
class, which is a histogram of your time series observations. Then, you use the dump
method to convert your histogram into a probability distribution.
from pyinform.dist import Dist
a = [0,0,1,1,1,1,0,0,0] # time series
base_a = len(set(a))
a_dist = dist(base_a) # initialize dist over support of size 2
## Count observations in each bin
for obs in a:
a_dist.tick(obs)
## View probability distribution
a_dist.dump()
# [0.55556, 0.44444]
pyinform.block_entropy(a, k = 1)
# 0.9910760598382222
scipy.stats.entropy(a_dist.dump(), base = base_a )
# 0.9910760598382222
Pyinform is intentionally built around time series observations because it is designed for ease of use with empirical data, but behind the scenes time series are almost always converted to distributions using the Dist class.
Ah, got it. That is very clear and well explained. Thanks, @jakehanson
Just a small note about a typo for anyone who references this, in the future: the dist(...)
call should be Dist(...)
.
Specifically, this...
from pyinform.dist import Dist
# ...
a_dist = dist(base_a) # initialize dist over support of size 2
# ...
...should be...
from pyinform.dist import Dist
# ...
a_dist = Dist(base_a) # initialize Dist over support of size 2
# ...
From documentation:
Taking the example given in the documentation:
What have I got wrong?