This is a discussion post. Please feel free to comment and contribute to the discussion even if you are not directly involved in the development of inform or its wrapper libraries.
The Problem
The various information measures are really designed around discrete-valued timeseries data. In reality, most data are continuous in nature, and up to this point our go-to approach has been to bin.
At this point we've implemented several binning procedures (see 1355d681ae5a16797783cef8c4af4b51d9c04887). Binning works fine for some problems (e.g. if the system has a natural threshold), but when it is applied artificially it can introduce hefty bias. The problem gets worse when you attempt to compare two different timeseries: should they be binned in the same way, e.g. with uniform bin widths, the same number of bins, etc.?
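To make the bias concrete, here is a minimal sketch (not inform's API; a hypothetical standalone helper using NumPy) showing how the entropy estimate of the *same* continuous series swings with the number of bins:

```python
import numpy as np

def binned_entropy(xs, bins):
    """Shannon entropy (in bits) of a continuous series after binning."""
    counts, _ = np.histogram(xs, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(seed=0)
xs = rng.normal(size=1000)  # one fixed dataset

# The "entropy of the data" depends heavily on an arbitrary binning choice:
for b in (2, 10, 100):
    print(b, binned_entropy(xs, bins=b))
```

Nothing about the system changed between the three estimates; only the binning did, which is exactly the artificial bias described above.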
Possible Solutions
All of the information measures are built around probability distributions. The timeseries measures simply construct empirical probability distributions and evaluate an information measure on them. "All" that must be done to accommodate continuously-valued data is to infer the distribution from it.
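For reference, the discrete pipeline described above amounts to something like the following sketch (hypothetical helper names, not inform's internals):

```python
from collections import Counter
from math import log2

def empirical_dist(series):
    """Empirical probability distribution over the observed states."""
    counts = Counter(series)
    n = len(series)
    return {state: c / n for state, c in counts.items()}

def shannon_entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

series = [0, 1, 1, 0, 1, 0, 0, 1]
print(shannon_entropy(empirical_dist(series)))  # fair-coin series → 1.0 bit
```

The continuous case would replace `empirical_dist` with a density-inference step; the information measure on top stays conceptually the same.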
Machine learning is more or less built around inferring probability distributions and then making decisions from them. Consequently, there are easily dozens of algorithms for inferring distributions from continuously-valued observations. One simple example, kernel density estimation, has been around since the 1950s.
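As a rough illustration of the idea, a Gaussian kernel density estimate places one Gaussian bump on each observation and averages them (a sketch with a hand-picked bandwidth; a real implementation would select the bandwidth from the data):

```python
import numpy as np

def gaussian_kde(data, bandwidth):
    """Return a density estimate p(x): the average of one Gaussian
    bump of width `bandwidth` centred on each observation."""
    data = np.asarray(data, dtype=float)
    norm = len(data) * bandwidth * np.sqrt(2 * np.pi)
    def pdf(x):
        z = (x - data[:, None]) / bandwidth
        return np.exp(-0.5 * z * z).sum(axis=0) / norm
    return pdf

rng = np.random.default_rng(seed=1)
samples = rng.normal(loc=0.0, scale=1.0, size=500)
pdf = gaussian_kde(samples, bandwidth=0.3)

# The estimate is a genuine density: it integrates to roughly 1,
# and entropies etc. can then be computed from it directly.
xs = np.linspace(-5.0, 5.0, 1001)
print(pdf(xs).sum() * (xs[1] - xs[0]))
```

With a density in hand, the information measures can be evaluated by integration (or by plug-in estimators) rather than by counting bins.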
Usefulness
This would likely be useful to @dglmoore and @colemathis, as the systems that we deal with are either continuously valued, or are so sparsely discrete that treating them as continuous is more memory efficient than a discrete treatment. Would this be useful to anyone else? If so, we can prioritize it over some of the other new features that we are considering.
Acknowledgments
The JIDT project, written and maintained by the estimable Joe Lizier, implements such an approach. The work produced a [paper]() describing the three inference algorithms they've implemented.
Also, thank you @hbsmith and @colemathis for pointing out JIDT.