CUSUM and MOSUM (BFAST/BFASTesque) implementations

ceholden commented 8 years ago

Would be nice to implement CUSUM or MOSUM test statistics in YATSM seeing as we already have the recursive residual calculation implemented and validated against the strucchange package.

We could test and implement the CUSUM or MOSUM on top of recursive residuals or simple OLS residuals to further mirror the strucchange package. From there it shouldn't (?) be too hard to go all the way to a full BFAST / BFAST monitor implementation.

I figure we should at least get to the MOSUM calculation so we can benchmark computation performance in R versus Python. Depending on the results, maybe there'd be some more interest in implementing all the bells and whistles in BFAST.

Thoughts? @verbe039, @dutri001, @bendv, et al

ceholden commented 8 years ago

One assumption I'm making in this is that calculating recursive residuals and then the MOSUM calculation is the most expensive bit of running BFAST. Does anyone have a sense of if this is true from benchmarks?

janverbesselt commented 8 years ago

Thanks Chris, and Eliakim (@hamun001 on github) and Achim Zeileis (not on github - but will pas on this conversation) can comment here. As for bfastmonitor() we can start with just the MOSUM implementation a go from there (see also Eliakim's implementation of bfastmonitor() on the google Earth Engine, javascript). bfastmonitor() is more simple/efficient when compared to bfast() as there no iterative decomposition, and break dating per component done. bfastmonitor() fits as seasonal trend model at once and uses the monitor() function for the strucchange package. Once MOSUM works, we should certainly try to implement the CUSUM based tests.

janverbesselt commented 8 years ago

Feedback from Achim: " As for the costs: Everything based on standard OLS residuals is fairly cheap. Recursive or moving/rolling sums cost a little bit but I presume that there are already canned solutions for this in Python - like cumsum() in R. Recursive residuals and recursive or moving/rolling parameter estimates are more costly and take more time. And if you move to full estimation of breakpoints, this is of order O(n^2)."

ceholden commented 8 years ago

Thanks a lot Jan & Achim! I guess I'm mostly interested in implementing these in Python to forward the ensemble ideas we discussed yesterday and I've seen certain kinds of data work better with other approaches (L-HV timeseries, for example, work pretty well just using CUSUM).

Speed is a secondary concern and at the end of the day most computation is done via LAPACK or BLAS or whatever in R and Python, so maybe there won't be such a speed improvement. Achim's comments about the computation complexity certainly helps as I'm not so great at thinking on that. I'm very interested to mine as many approaches from real statisticians as possible so a connection to him through you is great!

This has to be a lower priority for me, but let's use this ticket to keep track of things. I have recresid implemented already with a test, but I'll switch the test data out to something remotely sensed instead of the airquality dataset. As for a plan,

[x] Convert recresid test to use NDVI timeseries or something
[ ] Implement Rec-MOSUM test and validate against strucchange OLS-MOSUM is easy after this is done.
[x] Implement Rec-CUSUM and OLS-CUSUM and validate against strucchange
[ ] Implement bfastmonitor historic period finding methods like ROC test
[ ] Implement remainder of bfastmonitor

ceholden / yatsm

CUSUM and MOSUM (BFAST/BFASTesque) implementations #62