jungla88 closed this issue 1 year ago.
Hi Jungla,
Distributions are built from time series using the Dist class.
If you have a time series with continuous values, you will have to bin it, since mutual information requires a discrete state space. You can bin the time series in several different ways: https://elife-asu.github.io/PyInform/utils.html?highlight=binning#module-pyinform.utils.binning.
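The simplest strategy, uniform binning, can be sketched in pure Python as follows. The helper `bin_uniform` is hypothetical and only illustrates the idea; the linked PyInform module provides its own binning functions with more options.

```python
def bin_uniform(series, b):
    """Map continuous values onto the discrete states 0..b-1 using b equal-width bins.

    Hypothetical helper for illustration; not part of PyInform.
    """
    if b < 1:
        raise ValueError("b must be at least 1")
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries no information: put everything in state 0.
        return [0 for _ in series]
    width = (hi - lo) / b
    states = []
    for x in series:
        s = int((x - lo) / width)  # (x - lo) >= 0, so int() acts as floor here
        # The maximum value lands on the upper edge; keep it in the last bin.
        states.append(min(s, b - 1))
    return states


ts = [-1.5, -0.2, 0.0, 0.7, 2.3, 1.1]
print(bin_uniform(ts, 3))  # → [0, 1, 1, 1, 2, 2]
```

Every output state lies in `0..b-1`, whatever the sign of the input values.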
My recommendation would be to bin the time series into a fixed number of states first, then use `coalesce_series` to get rid of negative values. You will also want to check how sensitive your final result is to the number of bins you choose.
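The relabelling idea behind `coalesce_series` (mapping an integer series onto contiguous non-negative states) can be sketched in pure Python as below. The helper `coalesce` is hypothetical, assumes integer input, and preserves the relative order of the states; PyInform's own function may assign labels differently.

```python
def coalesce(series):
    """Relabel integer states as 0..k-1, preserving their relative order.

    Hypothetical helper for illustration; returns (relabelled series, k).
    """
    distinct = sorted(set(series))
    label = {state: i for i, state in enumerate(distinct)}
    return [label[s] for s in series], len(distinct)


print(coalesce([-3, 7, -3, 0, 7]))  # → ([0, 2, 0, 1, 2], 3)
```

Because information measures depend only on which states occur and co-occur, not on their labels, a one-to-one relabelling like this leaves quantities such as mutual information unchanged.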
Hi Jake,
Thank you so much for your reply.
Actually, I followed your suggestion exactly and binned the series. My only concern is about `coalesce_series`: I am not completely sure, but I think that after binning there should be no negative values left, since binning maps continuous values onto a discrete state space of non-negative integers. Please correct me if I am wrong. Furthermore, should I apply `Dist` to the output of the binning step before feeding it to a method that computes an information-based metric, e.g. `mutual_info`?
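For reference, the empirical distribution that `Dist` represents amounts to the relative frequency of each observed state. A minimal pure-Python sketch of that idea, using a hypothetical helper `empirical_dist` rather than PyInform's `Dist` API:

```python
from collections import Counter
from fractions import Fraction


def empirical_dist(states):
    """Estimate P(state) from observed frequencies in a discrete series.

    Hypothetical helper for illustration; uses exact fractions for readability.
    """
    counts = Counter(states)
    n = len(states)
    return {s: Fraction(c, n) for s, c in sorted(counts.items())}


binned = [0, 1, 1, 1, 2, 2]
print(empirical_dist(binned))
# → {0: Fraction(1, 6), 1: Fraction(1, 2), 2: Fraction(1, 3)}
```

Measures such as mutual information need these frequencies for the joint and marginal states, and both can be counted directly from the series.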
More clearly:
Given a raw time series `ts` of real values, which is the correct procedure for using `pyinform.mutualinfo.mutual_info`?
1) binning (and/or `coalesce_series`) -> `Dist` -> `mutual_info`
2) binning (and/or `coalesce_series`) -> `mutual_info`
Hi Jungla,
Method 2 is correct, as the distribution is built from the time series automatically.
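To illustrate what building the distribution automatically means, here is a pure-Python sketch of the plug-in (frequency-count) estimate of mutual information. It counts joint and marginal state frequencies straight from the two series, so no separate distribution object is passed in. The function `mutual_info_sketch` is hypothetical and reports bits; it is not PyInform's implementation, which may differ in details.

```python
import math
from collections import Counter


def mutual_info_sketch(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two aligned discrete series."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # empirical joint counts
    px = Counter(xs)              # empirical marginal counts of X
    py = Counter(ys)              # empirical marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


xs = [0, 0, 1, 1, 0, 1, 0, 1]
print(mutual_info_sketch(xs, xs))                        # → 1.0 (equals H(X))
print(mutual_info_sketch(xs, [0, 1, 0, 1, 0, 1, 0, 1]))  # ≈ 0.189 bits
```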
Also, it looks like you are correct about not needing to run `coalesce_series` in addition to binning.
This means the correct process is just: binning -> `mutual_info`
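The binning -> mutual information pipeline, together with the earlier advice to check sensitivity to the number of bins, can be sketched end to end in pure Python. The helpers `bin_uniform` and `mutual_info_sketch` are hypothetical stand-ins and the data are synthetic; with PyInform itself they would presumably be replaced by the library's binning function and `pyinform.mutualinfo.mutual_info` (check the linked docs for their exact return values).

```python
import math
import random
from collections import Counter


def bin_uniform(series, b):
    """Map continuous values onto states 0..b-1 using b equal-width bins (hypothetical helper)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / b
    return [min(int((x - lo) / width), b - 1) for x in series]


def mutual_info_sketch(xs, ys):
    """Plug-in estimate of I(X;Y) in bits, computed directly from two discrete series."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


# Synthetic continuous data: y is x plus Gaussian noise (illustrative only).
rng = random.Random(0)
ts_x = [math.sin(0.3 * t) + 0.1 * rng.gauss(0, 1) for t in range(500)]
ts_y = [v + 0.5 * rng.gauss(0, 1) for v in ts_x]

# binning -> mutual information, repeated for several bin counts to check sensitivity.
results = {}
for b in (2, 4, 8, 16):
    xs, ys = bin_uniform(ts_x, b), bin_uniform(ts_y, b)
    results[b] = mutual_info_sketch(xs, ys)
    print(f"b={b:2d}  MI={results[b]:.3f} bits")
```

On finite data, plug-in estimates like this tend to grow as the number of bins increases, which is one reason to look at how the result changes with `b` before trusting a single value.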
Hi,
It is not clear to me whether a method that requires a probability distribution automatically estimates the empirical distribution from the sequences passed as input. For example, mutual information takes two `np.array` inputs, but I could not find where the empirical distribution is estimated. I also looked into the C backend, but again found nothing useful. Could you provide some information about this process? I am asking because I am getting `InformError: an inform error occurred - "negative state in timeseries"`. I read a previous issue about this error, and the answer was to use `coalesce_series`, but I am not sure whether it is correct to apply it to a continuous time series.
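The error quoted above is raised when a series contains a negative state. One plausible way to end up there (an assumption about this case) is passing raw real values: once they are converted to integers, negative readings stay negative. The hypothetical `check_states` helper below reproduces that condition in pure Python so the input can be validated before calling the library; it is not part of PyInform.

```python
def check_states(series):
    """Raise ValueError if a discrete series contains a negative or non-integer state.

    Hypothetical pre-flight check, not part of PyInform.
    """
    for i, s in enumerate(series):
        if s != int(s):
            raise ValueError(f"non-integer value {s!r} at index {i}; bin the series first")
        if s < 0:
            raise ValueError(f"negative state {s} at index {i}")
    return list(series)


raw = [-1.5, -0.2, 0.0, 0.7, 2.3, 1.1]
truncated = [int(x) for x in raw]   # [-1, 0, 0, 0, 2, 1]: the -1 is a negative state
binned = [0, 1, 1, 1, 2, 2]         # the same series after 3-bin uniform binning

for name, series in (("truncated", truncated), ("binned", binned)):
    try:
        check_states(series)
        print(name, "ok")
    except ValueError as err:
        print(name, "rejected:", err)
```

Since uniform binning always yields states in `0..b-1`, a correctly binned series passes this check, which is consistent with the earlier reply that `coalesce_series` is not needed after binning.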