JuliaStats / Roadmap.jl

A centralized location for planning the direction of JuliaStats

State space models #16

Closed mschauer closed 6 years ago

mschauer commented 9 years ago

[From https://github.com/JuliaStats/TimeModels.jl/issues/35]. By now there are too many implementations of Kalman filters and state space models:

https://github.com/ElOceanografo/StateSpace.jl
https://github.com/JuliaStats/TimeModels.jl
https://github.com/wkearn/Kalman.jl
https://github.com/setzler/JuliaEconomics/blob/master/Tutorials/tutorial_6_noglobals.jl
https://github.com/QuantEcon/QuantEcon.jl/blob/master/src/kalman.jl
and, well, https://gist.github.com/mschauer/3ffebdb8581f56341a9d

Maybe there is room for some coordination ;-)

andreasnoack commented 9 years ago

@mschauer Thanks for filing this issue and tracking down all the implementations. Hopefully, the efforts can be combined in a single, flexible, fast and well-tested version.

mschauer commented 9 years ago

The difficult part of course is to get the abstraction level right. Some of the design issues are

Granularity of abstractions, should there be types for

Is there interest in

tkelman commented 9 years ago

I'm biased because I'm a controls person, but I read this list and the "missing data" item is the only thing on it that fits all that well with JuliaStats in my mind. Linear state space models are the kind of thing the control systems toolbox in Matlab does really well; the rest of it sounds like either observer design or system identification.

Time series modeling frameworks are pretty boring if you don't have any elements in them for sensing, actuation, feedback and control.

mschauer commented 9 years ago

I think I do not quite understand you. If you propose to put what is sketched here, and scattered across the different locations above, under the roof of a different project than JuliaStats, which project would that be? I know https://github.com/JuliaDSP and https://github.com/JuliaControl/Control.jl, but the latter does not look open to the public yet (cc @jcrist ?). I think that estimating the parameters of a partially observed autoregressive process along the lines of the R packages dlm, dse or MARSS is interesting enough, but that is then of course my bias.

http://cran.r-project.org/web/packages/dlm/index.html
http://cran.r-project.org/web/packages/dse/index.html
http://cran.r-project.org/web/packages/MARSS/index.html

tkelman commented 9 years ago

System ID and parameter estimation are the kind of common routines that come up in various "Stats" applications, and with pretty much exactly the same underlying calculations, in DSP and Control applications as well. Partly because the statisticians have been using R and the control people have been using Matlab for so many years, we look at each other's packages/toolboxes/modules/whatever and say, "you have no concept of control input here, this is useless" or "you don't account for missing data, this is useless." We don't have to solve every problem all at once here, but if you say http://en.wikipedia.org/wiki/State-space_representation and you don't have a B or a u anywhere, it instantly drives away an entire community.
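For readers less familiar with the control-side notation, here is a minimal sketch of the discrete-time form x[t+1] = A x[t] + B u[t], y[t] = C x[t] + D u[t], where B is the input matrix and u the control input. It is written in plain Python for illustration only and is not taken from any of the packages in this thread; the helper names `matvec`, `vadd` and `simulate`, and the position/velocity example, are hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def vadd(a, b):
    """Add two vectors elementwise."""
    return [x + y for x, y in zip(a, b)]


def simulate(A, B, C, D, x0, inputs):
    """Noise-free simulation of
        x[t+1] = A x[t] + B u[t]
        y[t]   = C x[t] + D u[t]
    Returns the final state and the list of outputs."""
    x, outputs = list(x0), []
    for u in inputs:
        outputs.append(vadd(matvec(C, x), matvec(D, u)))  # observe current state
        x = vadd(matvec(A, x), matvec(B, u))              # evolve state with input
    return x, outputs


# Hypothetical example: position/velocity state driven by an acceleration input.
dt = 1.0
A = [[1.0, dt], [0.0, 1.0]]   # state transition
B = [[0.5 * dt**2], [dt]]     # how the input u enters the state
C = [[1.0, 0.0]]              # only the position is observed
D = [[0.0]]                   # no direct feedthrough
x_final, ys = simulate(A, B, C, D, [0.0, 0.0], [[1.0], [1.0], [0.0]])
```

A purely statistical model is the special case where B and D are zero (or u is never supplied), so supporting the input terms does not get in the way of users who do not need them.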

jcrist commented 9 years ago

I'm biased the same way @tkelman is - when I hear state space I immediately think control stuff. https://github.com/JuliaControl/Control.jl is partially usable at this point, but I'm currently finishing up my thesis and have no time to work on it (should change in ~a month).

I don't think a stats package needs the same kind of support that a controls package does though. Do you need to merge models in series and parallel, analyze stability and robustness, and design controllers? Probably not (correct me if I'm wrong). I'm guessing that most of what you need is system id, and simulation. Having two full implementations of state space representations seems silly, but I see no problem with a partial, stats specific implementation.

mschauer commented 9 years ago

@tkelman Point taken, luckily I am halfway there https://gist.github.com/mschauer/3ffebdb8581f56341a9d#file-kalman-jl-L13

mschauer commented 9 years ago

@jcrist The motivation here is that there are already too many partial implementations.

ElOceanografo commented 9 years ago

@wkearn and I had talked a few months ago about trying to combine our efforts, but never got around to it. My (still unfinished) package aims for generality, solving the predict-observe-update-smooth problem for any time-evolving state variable. The basic verb functions as I see them are

I think model types are necessary...and it could be worth having separate state-evolution and observation model types, but I'm not sure. In my package, states at the moment are all continuous probability distributions. It might be nice to allow discrete states, or mixed discrete and continuous states, or to let users define their own state types if they want to, but I really like the clarity of having the state explicitly represented as a probability distribution.

Nonlinear dynamics and good parameter estimation are must-haves, I think. Missing data will be easier for the user as Nullables, but it would be nice also to have the capability to use time-dependent observation and/or state-evolution matrices, for instance to accommodate exogenous covariates or inputs.
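One common way to handle missing observations in a linear Kalman filter is to run the predict step at every time point and simply skip the update step when no observation is available. The sketch below shows this for a scalar model, in plain Python for illustration; `None` stands in for a missing value (the role Nullables would play in Julia), and the function name `kalman_filter` and its parameter names are hypothetical, not the API of any package mentioned here.

```python
def kalman_filter(ys, a, q, c, r, m0, p0):
    """Scalar Kalman filter for
        x[t] = a * x[t-1] + w,  w ~ N(0, q)
        y[t] = c * x[t]   + v,  v ~ N(0, r)
    Entries of ys that are None are treated as missing observations.
    Returns the filtered means and variances."""
    m, p = m0, p0
    means, variances = [], []
    for y in ys:
        # Predict: propagate mean and variance through the state equation.
        m = a * m
        p = a * p * a + q
        if y is not None:
            # Update: only when an observation is actually available.
            s = c * p * c + r          # innovation variance
            k = p * c / s              # Kalman gain
            m = m + k * (y - c * m)
            p = (1 - k * c) * p
        means.append(m)
        variances.append(p)
    return means, variances


# The second observation is missing: the mean is carried forward by the
# prediction and the variance grows until the next observation arrives.
means, variances = kalman_filter([1.0, None, 1.0], a=1.0, q=1.0, c=1.0, r=1.0,
                                 m0=0.0, p0=1.0)
```

The same loop would accept time-dependent `a` and `c` by indexing them with the time step, which is one way to bring in exogenous covariates or inputs.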

mschauer commented 9 years ago

I revised my Kalman module, https://gist.github.com/mschauer/3ffebdb8581f56341a9d, and added the fitting procedure from Shumway and Stoffer (http://cran.r-project.org/web/packages/astsa/index.html). Now one could say that it is somewhat useful for something. I compared the output of the filter and smoother procedures with astsa and it is identical; the EM output also agrees up to numerical differences, so this makes a good starting point.

JonnyCBB commented 8 years ago

I'm far from a Kalman Filter expert and you all sound like you're doing great work. I saw this issue and thought I would give a +1 for accommodating missing data. I found this paper which discussed an algorithm for missing observations. I don't know if it's going to be of any use but I thought I would share it anyway.

rob-luke commented 8 years ago

I have taken a look at the different state space implementations and made the following table to summarise the differences and help me decide which to use. I have placed it here in the hope it's useful to others. Due to the differences in notation and limited time I may have made mistakes; please let me know if I have and I will fix them. Happy to add more columns if other issues interest people.

Name Author Input (B) Feedthrough (D) Allow functions in model Missing Data Smoother Tests Module Bugs
TimeModels.jl JuliaStats N N Y N (Y) Y** Y Y
Kalman.jl @mschauer Y N N N Y Y* N N
StateSpace.jl @ElOceanografo N N N Y Y Y** Y N
Kalman.jl @wkearn Y N N Y N Y** Y
Control.jl @jcrist, JuliaControl Y Y N N? Y Y

* tests not available online
** test code runs, but is not checked against data and values from other validated packages

mschauer commented 8 years ago

One thing is that most implementations are a bit overspecified (mine too, but not so much). For example, implementations should work with https://github.com/SimonDanisch/FixedSizeArrays.jl

mschauer commented 8 years ago

You could add a column for implementations of the EM algorithm (at least one Y ;-) )

mschauer commented 6 years ago

I am moving towards publishing a package. See

https://github.com/mschauer/Kalman.jl

See https://github.com/mschauer/Kalman.jl/blob/master/example/Kalman%20filter%20with%20automatic%20differentiation.ipynb for an example showing the potential of the design for interaction with the Julia ecosystem, in this case ForwardDiff.