Q estimated elements and zeros - MARSSkfss stopped

atsa-es / MARSS

Multivariate Autoregressive State-Space Modeling with R

https://atsa-es.github.io/MARSS/

GNU General Public License v3.0

51 stars 28 forks source link

Q estimated elements and zeros - MARSSkfss stopped #104

Closed Carherg closed 3 years ago

Carherg commented 4 years ago

Hello,

I am using MARSS to estimate state space models with the following form for the Q matrix:

diag(Q) <- c("q1", 0,0,0,0,"q2",0,0,0,0, "q3", 0,"q4",0, "q5",0)

That is to say, some elements in the diagonal are estimated while others are fixed to zero. I am able to estimate these models with MARSS 3.10.10. However, after updating to version 3.11.1 I keep getting this error:

Error: Stopped in MARSS() before fitting because MARSSkfss stopped. Something is wrong with the model structure that prevents Kalman filter running. Try using control$trace=-1 and fit=FALSE and then look at the model that MARSS is trying to fit.

It is the same state space models and the same data that work in version 3.10.10, so I do not have any clue about what is wrong with the model structure. Could it be some issue due to changes in version 3.11.1?

I'm not sure if this is the right place to ask this question, but I will appreciate any help. Thank you so much.

eeholmes commented 4 years ago

Hi,

Can you post example code? I am not able to replicate the problem. The examples I ran with 0s on diagonal of Q worked ok.

BTW note that your code does not put 0s on the diagonal. It will put "0" on the diagonal which MARSS will interpret as a parameter named "0". I don't think this is the problem but thought I'd point this out. You need the following to put 0s on the diagonal:

Q <- matrix(list(0),3,3)
diag(Q) <- list("q1", 0, "q2")

or in MARSS 3.11.1 there is now a helper function ldiag()

Q <- ldiag(list("q1", 0, "q2"))

Though still you need list() not c().

Carherg commented 4 years ago

Thank you so much for your answer.

I am using MARSS to nowcast GDP and other economic time series. Each observed economic series ( y(t) ) depend on a common unobserved factor and an idiosyncratic error ( x(t)). Both the common factor and the idiosyncratic errors follow an AR process. Therefore, the state space representation is:

y(t) = Zx(t)

x(t) = Bx(t-1) + w(t) w(t)~N(0,Q)

Therefore, in terms of MARSS, all matrices are zero, except for B,Z and Q. In case it was useful, the state space representation is the same that the one in this paper Appendix (page 18):

https://www.um.es/econometria/Maximo/articulos/Argentina.pdf

I attach a R code with an example:

https://github.com/Carherg/Error_example/blob/master/SS_MARSS311_310R.R

I kept it as simple as possible by using just two series (US quarterly GDP and US monthly employment) and assume the factor and the errors follow an AR(2). When I run the script with MARSS 3.10.10, there is no errors, the estimation converges and this is the result:

MARSS fit is
Estimation method: BFGS 
Estimation converged in 353 iterations. 
Log-likelihood: 4260.4 
AIC: -8498.8   AICc: -8498.572   

       Estimate
Z.z1   4.63e-01
Z.z2   2.48e-01
B.b1   3.90e-01
B.b2   1.68e-01
B.b6  -6.44e-01
B.b7   1.67e-02
B.b11 -3.78e-02
B.b12 -3.13e-02
Q.q1   1.19e-04
Q.q6   1.62e-05
Q.q11  2.43e-05
Initial states (x0) defined at t=0

However, if I run the script with MARSS 3.11 I get this error.

Error: Stopped in MARSS() before fitting because MARSSkfss stopped.  Something is wrong with 
 the model structure that prevents Kalman filter running.
 Try using control$trace=-1 and fit=FALSE and then look at the model that MARSS is trying to fit.

I also tried to define Q using ldiag ,as you suggested, but I get the same error

I think it could be some issue with matrix Q diagonal because if I set Q="diagonal and unequal" there is no error and the model is estimated in MARSS 3.11. This is the result:

MARSS fit is
Estimation method: BFGS 
Estimation converged in 397 iterations. 
Log-likelihood: 4308.237 
AIC: -8576.473   AICc: -8575.745   

             _Estimate
Z.z1         2.83e-01
Z.z2         6.61e-01
B.b1         4.51e-01
B.b2         1.61e-01
B.b6        -6.87e-03
B.b7         5.44e-02
B.b11       -9.78e-02
B.b12       -2.83e-03
Q.(X1,X1)    1.18e-08
Q.(X2,X2)    7.03e-04
Q.(X3,X3)    2.63e-06
Q.(X4,X4)    1.59e-05
Q.(X5,X5)    5.08e-06
Q.(X6,X6)    1.59e-11
Q.(X7,X7)    4.85e-09
Q.(X8,X8)    7.32e-10
Q.(X9,X9)    1.49e-07
Q.(X10,X10)  1.21e-06
Q.(X11,X11)  2.12e-05
Q.(X12,X12)  5.29e-02
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

I appreciate very much your help.

eeholmes commented 4 years ago

MARSSkfss() is used for error-checking and right now, MARSS doesn't let you fully avoid this even if fun.kf=MARSSkfas. The problem is that MARSSkfss() (the classic Kalman filter) involves an inversion of Vtt1, and that becomes unstable (high condition value) for your problem. MARSSkfas() (the Durban & Koopman algorithm) avoids those inversions and is more stable. MARSSkfas() is the default but MARSSkfss() is used for some error-checking steps.

In MARSS version 3.10.10, there was an error in MARSSkfss() regarding V0 that would affect your case, which is what those error-checks passes. Note it didn't affect the fitting, as that used MARSSkfas(). It is just an error-check() step that is now now not passing in version 3.11.1 due to the inversion that MARSSkfss() needs to do.

The solution is for me to clean up the error-check part to allow you to turn off MARSSkfss() calls completely. control$trace = -1 with fun.kf="MARSSkfas" should do this, but it is not. There is a final MARSSkfss() call that is still being called.

I will clean that up and post a patch to the GitHub site this weekend.

Regards,

Eli

eeholmes commented 4 years ago

I have published release 3.11.2 to GitHub that will now allow you to turn off all MARSSkfss() error checking calls. You will need to pass in trace = -1 like so:

kf_ss= MARSS(df_marss, model=model.gen,control= list(trace = -1, maxit = 300),method="BFGS")

This will turn off all error-checking, which turns off calls to MARSSkfss(), used for error-checking because it has verbose info if there are model fitting issues.

To install:

install.packages("devtools")
library(devtools)
install_github("nwfsc-timeseries/MARSS@*release")

I'll push this to CRAN next week once I have a chance to make sure I didn't introduce any errors. I just added some if statements so should be fine. I also may decide to avoid the MARSSkfss() calls by default when `fun.kf="MARSSkfas" since they are not necessary unless fitting runs into problems and you need to debug where the problem lies.

Regards and thanks for bringing this to my attention!

Eli

Carherg commented 4 years ago

Hello, Eli:

Thank you for your answers.

I installed the release version, modified the code as you proposed and it worked! I ran my models and they are properly estimated in the release version 3.11.2 . I understand that, since trace=-1, it is possible to recover the states conditioned on all the data xtT, but it is no possible to obtainxtt1 and xtt.

I apreciate very much your help. Let me also thank you for developing, documenting and updating MARSS. It is an outstanding package and I find it really useful.

Regards,

Carlos

eeholmes commented 4 years ago

To get xtt1, you can use MARSSkfas(kf_ss)$xtt1.

Note, version 3.11.1 (and 3.11.2), now has predict and forecast functions, which will give you xtt1 with the CIs (and PIs). If you are looking for xtt1 and xtt, I expect that you may be forecasting. Look at ?predict.marssMLE and ?plot.marssPredict for examples. The plotting function for predict objects mimics the forecast package plotting functions.

I have not added a chapter on forecasting the User Guide yet. That's next.

Regarding xtt, yes unfortunately that needs MARSSkfss(), which won't run for your model. Also keep in mind that version 3.10.10 had a bug that affects your model structure so its xtt estimates are wrong. I'll have to ponder how to change the code to get xtt (and Vtt) for your class of model. These are in the Kalman filter part of MARSSkfss(), which is running ok, it is the smoother step where the inversion of Vtt1 is running into trouble. It may be a simple case of adding a flag to the function for whether to run the smoother or not.

eeholmes commented 4 years ago

I updated release 3.11.2 on GitHub so that now your original code with trace=1 works and xtt and Vtt will be added. It was just a matter of adding a flag so that the smoother didn't run when it wasn't needed.

As before, to install:

install.packages("devtools")
library(devtools)
install_github("nwfsc-timeseries/MARSS@*release")

Here is an example of forecasting:

kf_ss= MARSS(df_marss, model=model.gen, method="BFGS")
fr <- forecast.marssMLE(kf_ss, h=10)
plot(fr, include=20)

include says how many points of the training data to plot.

Note predict() uses all the data but you can specify xtt or xtt1 also.

Carherg commented 4 years ago

Thank you so much. I ran muy codes and I was able to recovered both xtt1 and xtt.

Yes, you are right, I am focused on forecasting, so I will find predict.marssMLE andplot.marssMLE very useful . Therefore, I appreciate very much your updates, because I am able to use now the version 3.11.

There is another minor issue related to my models that, if you allow me to ask, I would like to comment to you. I estimate the models with centered data (by substracting its mean). As the User Guide points out, it is normal in DFA to use standarized data, by substracting its mean and dividing by its standard deviation, However, when I use standarized data, I get an error:

Error in KFS(kfas.model, simplify = FALSE) : 
  Non-finite values on covariance matrix P.

If I try fun.kf = "MARSSkfss", the error is:

MARSSkfas and MARSSkfss tried to compute log likelihood and encountered numerical problems

So, I understand that with standarized data, one of the variances became ill-conditioned as the User Guide explains. However, I find it counterintuitive: the model is estimated with centered data (also even with non centered data), but there are errors with standarized data. I would have expected just the opposite.

eeholmes commented 4 years ago

Resolved with version 3.11.2 released on GitHub

eeholmes commented 4 years ago

Regarding the new errors with your standardized errors. What's happening is that the B estimate is becoming very unstable. Which you cannot see since the error prevented the model from being printed. When you are doing B estimation, best to set tinitx=1 so the Kalman filter initial condition is constrained by the data at t=1. So in your new code, here,

https://github.com/Carherg/Error_example/blob/master/SS_MARSS311_310R.R

model.gen =list(Z=Z,A=A,R=R,B=B,U=U,Q=Q,x0=x0,V0=V0,tinitx=1)

Though on playing with your new code, I see that tinitx=1 does not solve the problem. I think what is happening is that when you rescaled the variance to 1, the numbers are now spanning -2 to +2 (more or less) and that is driving up the Q estimates and making the estimates unstable. Maybe try only scaling the mean?

Regards,

Eli

Carherg commented 4 years ago

Hello,

Thank you for your answer and your time.

You are right, both B and Q estimates become very large with standarized data. I tried tinitx=1 and kept getting the same error. However, I got it to work by setting tinitx=1and also fixing the common factor variance to 1 q1=1. In this case , my example converges and the parameters are estimated.

I extended the example with one more monthly series and it keeps working if tinitx=1and q1=1.However, with two more monthly series and some missing values in one of them, I got the error again: https://github.com/Carherg/Error_example/blob/master/SS_MARSS_matrixP.R

Unfortuntaley, my models are even larger (up to ten series) and the time series quality is poorer (shorter samples and more missing values) than in these examples with US data, so the error keeps appearing even with tinitx=1and q1=1. As you suggest, I will go on working with centered data.

Again, I appreciate you help and interest.

Regards,

Carlos

eeholmes commented 4 years ago

I was able to get the new code to run (w 4 time series) using de-meaned data and estimating q1. I think you need to estimate q1 since when you only remove mean, the q1 estimate is far from 1.

So these changes:

Q[1,1] <- "q1"
df_marss <- zscore(df_marss, mean.only=TRUE)

I have updated zscore() to add in the mean.only option.

zscore <- function(x, mean.only=FALSE) {
  ismat <- is.matrix(x) # else is vector
  if (ismat) {
    Sigma <- sqrt(apply(x, 1, var, na.rm = TRUE))
    x.bar <- apply(x, 1, mean, na.rm = TRUE)
  } else {
    Sigma <- sqrt(var(x, na.rm = TRUE))
    x.bar <- mean(x, na.rm = TRUE)
  }
  x.z <- (x - x.bar)
  if (!mean.only) x.z <- (x - x.bar) * (1 / Sigma)
  if (ismat) rownames(x.z) <- rownames(x)
  x.z
}

I am running off a new version 3.11.2 (master on GitHub) but I think the 3.11.2 release on GitHub will work the same. I made a change so that a optim fit won't stop with a KFS() error (which is happening after it successfully fit). Rather it will tell you that it fit, show you the fit, but warn your that KFS() wouldn't run. I'll post the update later today. I am working through version tests this week before submitting the update to CRAN.

Definitely keep sending me any examples that don't work. This is helping me improve the fitting and error-reporting for this class of model that tends to produce matrices with high condition values. I still have a few more bugs in the plotting functions to fix since they are not gracefully avoiding MARSSkfss() when it cannot be run.

Here is some code to run forecasts which can be good to see if things look reasonable:

kff <- forecast.marssMLE(kf_ss_2, h=10)
plot(kff, include=50)

blue dots are data; blue with grey ranges are forecasts.

Carherg commented 4 years ago

You are right, with de-meaned data and estimating q1 model 2 converges. Moreover, both models converge with raw data and estimatied q1. Again q1 values are far from 1. However, with raw data I think the model shoud also include matrix A.

On the other hand, once I reduced the sample from 2020 to 2019, model 1 (with 3 series and standarized data) failed to converge again. I think the change you are planning would be very useful: showing the fit but warning that matrices are unestable. It would help to identify the problem without running debug

Regarding forecast, the outcome of plot is totally great. I wonder if it would be useful an option to plot just one of the variables or one of the states. The forecast of model 2 seems to me quite reasonable. The variable of interest in this example is the US GDP (Y1) and its forecast for the next quarter. In the example, model 2 forescasts an annualized rate of growth of 17.33% in 2020Q3 with data up to July 2020. By comparison, as of July 31 the New York Fed nowcast was 16.85%, with a much more detailed model. https://www.newyorkfed.org/research/policy/nowcast.html

Nevertheless, while I was tinkering with forecast and fit and I got an error when I chose type = "yyt". For instance, in model 2, when I run:

kff_tt <- forecast.marssMLE(kf_ss_2, h=12, type="ytt") or

kff_fit_tt <- fitted(kf_ss_2, type="ytt")

I get the same error:

Error in Zt %*% hatxt[, t, drop = FALSE] : 
  requires numeric/complex matrix/vector arguments

Runningdebug(fitted) I see that hatxt is NULL, it seems to me that MARSSkfss in hatxt <- MARSSkfss(MLEobj)[["xtt"]] is not running, but I don't know why, perhaps matrix instability again.

eeholmes commented 4 years ago

Hi,

I posted another release (called 3.11.2-beta) to GitHub that has the fix so that when BFGS successfully fits but the Kalman filter won't run, it will still report the model to you so you can see what is going on. You can give it a try using this:

library(devtools)
install_github("nwfsc-timeseries/MARSS@*release")

I am in the final testing on this and hope to submit to CRAN next week. If you run into anything, let me know. My goal is for MARSS() to exit gracefully and give some helpful troubleshooting info if a model fit fails. As opposed to just a cryptic error message.

Note, I fixed the problem with no 'xtt' and 'Vtt' for your models (when they fit). Now these are added to your fit objects even when MARSSkfss() doesn't run.

Regards,

Eli

P.S. Sometimes your data have all NA at t=1. While probably unimportant, it will generally improve fitting if you don't have all NAs at the start of your data. I was trimming off that with df_marss[, -1].

Carherg commented 4 years ago

Hi,

Thank you so much for the new release. Now, I am de-meaning data with zscore(df_marss, mean.only = TRUE) and it works great. Also the issue with forecasting ytt is no longer present. I understand that it got solved once xtt is included in the kf element. Finally, as you said, the model is fitted even though Kalman filter didn't run and it shows the new warning. The explanation once I run MARSSinfo('optimerror54') is quite usefull. However, I notice that if I try to run forecast.marssMLEon an "wrong model" ( fitted model but no Kalman filter run) I get again the general error:

Error in KFS(kfas.model, simplify = FALSE) : Non-finite values on covariance matrix P

I wonder if it would be useful for other users to include in the MARSS function the same warning than in forecast.marssMLE

I have a doubt regarding versions. With release 11.3.2, I was able to run model 1 with standarized data,tinitx=1 and q1=1. However, with release 11.3.2.beta, model 1 parameters are estimated but the Kalman filter does not run and I get the new warning about successfully fitted model but MARSSkfas error. I tried to go back to release 11.3.2 to compare the results but I was not able to get it back from Github. Do you think is there a reason why model 1 stopped to run?

I am working now on forecasting, using forecast.marssMLE. I will try to improve the estimation by avoiding NA at the beginning of the data. If any new issue come out, I let you know.

eeholmes commented 4 years ago

To install release 3.11.2, use

devtools::install_github("nwfsc-timeseries/MARSS@v3.11.2")

You'll probably want to start with a fresh R session before you do that.

I'll need to futz a bit to figure out what changed in 3.11.2-beta to change the behavior.

Update: I ran with 3.11.2 and got the same behavior as with 3.11.2-beta. This is for with model 1 and Q[1,1]=1, tinitx=1, and data zscored. The model fit but B and Q were large and the Kalman filter would not run.