ContextLab / hypertools

A Python toolbox for gaining geometric insights into high-dimensional data
http://hypertools.readthedocs.io/en/latest/
MIT License

Use Kalman filter to fill in missing data #169

Open jeremymanning opened 6 years ago

jeremymanning commented 6 years ago

We currently use PPCA to infer missing values (NaNs). This works when, for the affected timepoints (or observations), at least some of the features are observed. However, when none of the features at a given timepoint are observed, PPCA can't fill in the missing features for that timepoint.

To deal with the scenario where all features at a timepoint are NaNs, we could use a Kalman filter (smooth + predict) to fill in the missing data using the surrounding data. We could also use the Kalman filter to predict future observations, which would allow hypertools to function as a nice wrapper for a multi-dimensional Kalman filter.

The setup would be something like:
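As one illustration of the smooth + predict idea, here is a minimal, dependency-free sketch. It assumes an independent random-walk (local-level) model for each feature, which is a simplification chosen for brevity. The names `kalman_fill` and `fill_missing_rows` and the noise parameters `q` and `r` are hypothetical. They are not part of hypertools or pykalman, and a real implementation would more likely wrap a multi-dimensional filter.

```python
import math


def kalman_fill(series, q=1e-2, r=1e-1, n_ahead=0):
    """Fill NaNs in a 1-D series using a local-level Kalman filter + RTS smoother.

    q is the (assumed) process-noise variance, r the (assumed) observation-noise
    variance. Returns (filled_series, forecast), where forecast has n_ahead values.
    """
    n = len(series)
    # Start the state at the first observed value (0.0 if nothing is observed).
    m = next((y for y in series if not math.isnan(y)), 0.0)
    p = 1.0
    m_pred, p_pred = [0.0] * n, [0.0] * n
    m_filt, p_filt = [0.0] * n, [0.0] * n

    # Forward pass (filter): predict, then update only where data is observed.
    for t, y in enumerate(series):
        if t > 0:
            p = p + q  # random walk: mean carries over, variance grows
        m_pred[t], p_pred[t] = m, p
        if not math.isnan(y):
            k = p / (p + r)  # Kalman gain
            m = m + k * (y - m)
            p = (1.0 - k) * p
        m_filt[t], p_filt[t] = m, p

    # Backward pass (Rauch-Tung-Striebel smoother): use later data as well.
    m_smooth = m_filt[:]
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + g * (m_smooth[t + 1] - m_pred[t + 1])

    # Keep observed values; replace only the missing ones.
    filled = [m_smooth[t] if math.isnan(y) else y for t, y in enumerate(series)]
    # Under a random walk, the forecast mean is the last filtered mean.
    forecast = [m_filt[-1]] * n_ahead if n else []
    return filled, forecast


def fill_missing_rows(data, **kwargs):
    """Apply kalman_fill to each feature (column) of a list-of-rows dataset."""
    columns = [list(col) for col in zip(*data)]
    filled_cols = [kalman_fill(col, **kwargs)[0] for col in columns]
    return [list(row) for row in zip(*filled_cols)]


nan = float("nan")
# The third timepoint has no observed features at all.
data = [[0.0, 10.0], [1.0, 11.0], [nan, nan], [3.0, 13.0], [4.0, 14.0]]
filled = fill_missing_rows(data)
_, forecast = kalman_fill([row[0] for row in data], n_ahead=2)
```

The fully missing row is estimated from the timepoints on both sides of it, and observed values are left untouched. Treating features independently ignores cross-feature covariance, so this is only meant to show the mechanics.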

Another thought: we shouldn't apply Kalman filters by default. If the user passes in non-timeseries data, this approach wouldn't make sense. But we could provide access to prediction via a keyword argument (to plot, reduce, and align).

This implementation looks nice: https://pykalman.github.io/