chjackson / msm

The msm R package for continuous-time multi-state modelling of panel data
https://chjackson.github.io/msm/

numerical overflow in calculating likelihood #5

Open ghost opened 7 years ago

ghost commented 7 years ago

sppb.msm <-msm(state_1 ~ days, subject = symbol, data = data,

chjackson commented 7 years ago

Again, a reproducible example might help (PB isn't in the data you sent me). Have you tried all the tricks in the "convergence failure" section of the PDF manual? Perhaps try rescaling the covariates to have SD around 1, or expressing the time variable in months instead of years?
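For instance, something along these lines (a sketch, using the days time variable and data frame from the truncated call above; x stands in for whichever covariate is being rescaled):

data$months <- data$days / 30             # express time in months rather than days
data$x_sc   <- as.numeric(scale(data$x))  # standardise a covariate to mean 0, SD 1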

The error message is not helpful, so I'll leave the issue open, but this kind of error is common and really difficult to prevent in the general case.

AlexKram commented 6 years ago

I have the same problem with different data sets. You can get a reproducible example even with the provided cav data set:

> cavsex.msm <- msm(state ~ years, subject=PTNUM, data=cav, qmatrix=Q, deathexact=4, covariates = ~age)
Error in Ccall.msm(params, do.what = "lik", ...) : 
  numerical overflow in calculating likelihood
chjackson commented 6 years ago

Try setting control=list(fnscale=4000) in the call to msm. Since the -2 log likelihood is around 4000 in this example, this normalises the -2 log-likelihood during optimisation to a scale of around 1, so that overflow is less likely to occur.
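For example, applied to the cav call above (a sketch; the Q matrix is the one given later in this thread):

library(msm)
Q <- rbind(c(0,     0.25, 0,     0.25),
           c(0.166, 0,    0.166, 0.166),
           c(0,     0.25, 0.25,  0),
           c(0,     0,    0,     0))
cavsex.msm <- msm(state ~ years, subject = PTNUM, data = cav,
                  qmatrix = Q, deathexact = 4, covariates = ~age,
                  control = list(fnscale = 4000))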

chjackson commented 6 years ago

I have experimented with doing this rescaling automatically for all model fits, but it seemed to cause worse convergence for as many examples as it improved.

hbg26 commented 6 years ago

I'm facing the same problem when I use continuous covariates. The fnscale option doesn't seem to work with the cav dataset or with other datasets.

chjackson commented 6 years ago

Could you give a reproducible example please, @hbg26?

hbg26 commented 6 years ago

Using the cav dataset:

cav.age.msm <- msm(state ~ years, subject = PTNUM, data = cav, qmatrix = Q,
                   deathexact = 4, covariates = ~age, obstype = 1,
                   control = list(fnscale = 4000))

I tried different values for fnscale and maxit, but it didn't work.

chjackson commented 6 years ago

That works for me, using

Q <- rbind(c(0,     0.25, 0,     0.25),
           c(0.166, 0,    0.166, 0.166),
           c(0,     0.25, 0.25,  0),
           c(0,     0,    0,     0))

and the latest CRAN msm, R 3.4.4 on Linux. If it hadn't worked I would have tried rescaling the covariate:

age10 <- (cav$age - mean(cav$age))/10
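For example, adding the rescaled variable to the data frame and refitting (a sketch; cav.age10.msm is just an illustrative name):

cav$age10 <- (cav$age - mean(cav$age)) / 10
cav.age10.msm <- msm(state ~ years, subject = PTNUM, data = cav,
                     qmatrix = Q, deathexact = 4, covariates = ~age10,
                     control = list(fnscale = 4000))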

28sophiesworld commented 5 years ago

The code that sets the allowable transitions has proven essential for me. Try playing with those numbers, e.g. change the values within the Q <- rbind(...) call.

seairis commented 5 years ago

Great. Setting control = list(fnscale = 4000) solved my numerical overflow problem, and control = list(reltol = 1e-16) solved a false convergence problem.
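For reference, the two options can go in the same control list (a sketch reusing the cav example from earlier in the thread; fit is just an illustrative name):

fit <- msm(state ~ years, subject = PTNUM, data = cav, qmatrix = Q,
           deathexact = 4, covariates = ~age,
           control = list(fnscale = 4000, reltol = 1e-16))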

Heart-wang commented 5 years ago

I get "numerical overflow in calculating likelihood"... how can I correct it?

Heart-wang commented 5 years ago

wsex.msm <- msm(state ~ years, subject = id, data = w, qmatrix = Q,
                covariates = ~sex, method = "BFGS",
                control = list(fnscale = 4000, maxit = 10000))

Warning message:
In msm(state ~ years, subject = id, data = w, qmatrix = Q, covariates = ~sex, :
  Optimisation has probably not converged to the maximum likelihood - Hessian is not positive definite.

chjackson commented 5 years ago

@Heart-wang A reproducible example would help others to give advice. But please first try some of the tricks above in the thread, and in the "Convergence failure" section of the PDF manual.

Heart-wang commented 4 years ago

When the disease we are studying has many states (for example, the progression of metabolic syndrome, where eight states are defined according to its diagnostic criteria), how should we set Q properly?

joseph-rickert commented 4 years ago

The attached text file repex_msm_0verflow.txt contains code that generates synthetic data to reproduce the msm error:

Error in Ccall.msm(params, do.what = "lik", ...) : numerical overflow in calculating likelihood

chjackson commented 4 years ago

Thanks for the reproducible example. The problem there is that the model is not identifiable. The consequence is that the optimiser is exploring a log-likelihood surface that is a flat function of the parameters, so massive differences in the parameters produce tiny differences in the likelihood. To try to find the maximum likelihood, the optimiser makes massive jumps in the parameter space. At the second iteration it visits log intensity values of around 4000, which makes the intensity overflow and triggers the error message.

The example looks like it's trying to fit a discrete-time Markov model to discrete-time data. I'm not sure msm is generally appropriate for this, because it fits continuous-time models, but if you must, then it's usually more appropriate to use exacttimes=TRUE, which indicates that all transitions occur at the observation times. If this is omitted, msm assumes the data are snapshots of a continuous-time process. That doesn't work here, because there are practically infinitely many continuous-time models that are equally compatible with the data. In particular, if all transitions are permitted in continuous time, then people can move back and forth between the states any number of times between the discrete-time observations.
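A minimal sketch of what that would look like (dat, id and time are placeholder names for the user's data):

fit.exact <- msm(state ~ time, subject = id, data = dat, qmatrix = Q,
                 exacttimes = TRUE)  # all transitions assumed to occur at the observation times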

I could add something to the error message, like "Perhaps the model is not identifiable - check, for example, that the transition structure and observation type are appropriate for the data". But I'm not sure whether this is the only (or the dominant) reason that this message would be triggered - if there are other reasons, then the message may be confusing. A wider range of examples would help to clarify this.

Heart-wang commented 4 years ago

thanks

joseph-rickert commented 4 years ago

Hello Professor Jackson, thank you for the quick reply. Yes, I can see that it is an identifiability problem. In fact, I have observed that constraining the transition matrix is helpful. It is not too hard to find a small, random 3x3 birth-death transition matrix that msm() will fit for small synthetic data sets. For example, here is a TM and Q matrix pair for which, with 200 rows and 5 columns, msm() may produce results or terminate with a warning that the algorithm may not have converged, but for which I have so far not observed the overflow error. Setting r to 2000 and c to 10 generates the overflow error.

This behavior is curious. I did try to look through the msm code to see if I could work out where things start to go south, but this is beyond me.

TM_1 <- matrix(c(0.5367896, 0.4632104, 0.0000000,
                 0.3617935, 0.2929464, 0.3452601,
                 0.0000000, 0.5953339, 0.4046661),
               nrow = 3, byrow = TRUE)

Q <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE)
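For context, a hypothetical sketch of the kind of experiment being described (the helper function and object names are made up, and exacttimes = TRUE follows the advice above rather than the original setup):

set.seed(1)
r <- 200; k <- 5                       # subjects and observations per subject
sim_chain <- function(P, n) {          # simulate a discrete-time chain from transition matrix P
  s <- numeric(n); s[1] <- 1
  for (t in 2:n) s[t] <- sample(nrow(P), 1, prob = P[s[t - 1], ])
  s
}
dat <- do.call(rbind, lapply(seq_len(r), function(i)
  data.frame(id = i, time = 1:k, state = sim_chain(TM_1, k))))
fit <- msm(state ~ time, subject = id, data = dat, qmatrix = Q, exacttimes = TRUE)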

Also, it seems to be even easier to stumble onto small, random transition matrices with absorbing states that msm() can handle.

I did not set exacttimes=TRUE because I convinced myself that I was on to a simple way of simulating the Jump Chain of a continuous time process. I got this notion from a naive leap of faith after noticing that the methods used in the ctmcd package only need the raw transition counts or relative counts to drive them.

My current hypothesis as to why my random data sets are so different from the several real data sets you provide in the package is that so far I have been working with overly simplistic transition times. However, I can find nothing in the theory of continuous-time chains to back this up.

Anyway, thank you for your work. msm is an amazing package and I think a real R treasure.

Best regards, Joe Rickert


alexocampo907 commented 4 years ago

I'm also running into the same issue. It is definitely an identifiability issue, as Prof Jackson eloquently described above. I believe I can handle this in an ad hoc manner by collapsing states and reducing covariates. Also, the control=list(fnscale=#) option has been very helpful. For my problem I used # = 200,000, based on my -2 * loglik scale.

I'm wondering two things: 1) Could the identifiability issue be solved via a Bayesian approach? My hunch is that sparsity is less of an issue there, since we're simply nudging priors if the transitions are rare, rather than trying to find a true maximum likelihood in a sparse parameter space. If so, are there packages available in R for Bayesian Markov modelling? (I would prefer not to go to WinBUGS.)

2) Is this, in part, the motivation for the "misclassification" approach for hidden Markov models? By assuming misclassification, we can reduce the number of parameters, and therefore it is more likely this reduced set of parameters is identifiable. That is, the parameter space is shrunk and the maximum of the likelihood can be found.

chjackson commented 4 years ago

I expect Bayesian approaches would help if the data are weak, but only if you have substantive information to put into the prior. I'm not aware of any R package that does Bayesian continuous-time Markov models, though Stan is better than WinBUGS for implementing general models.

Edited April 2024: there is now the msmbayes package, which implements some Bayesian continuous-time multistate models, but it is not as general as msm.

Misclassification models have more parameters than non-hidden Markov models.

alexocampo907 commented 4 years ago

That makes sense. Also, (facepalm)... of course they have more parameters. Thanks!

HaniyehDanesh commented 1 year ago

Hi Professor Jackson, I am trying to use the msm package on purchasing data to find behaviour patterns. My model fits for 7 states and I want to expand to 20 states, but I am stuck at nstates = 8 and get this error again and again, even when I increase the number of individuals and the number of purchases per person. Also, can you recommend the best way to find initial values? I'm still struggling to find good ones for each state.

nstates <- 8

Q8 <- matrix(rep(1/nstates, nstates^2), byrow = TRUE, nrow = nstates)

hmodel8 <- list(hmmMV(hmmPois(0.04), hmmPois(0.02)),
                hmmMV(hmmPois(0.09), hmmPois(0.02)),
                hmmMV(hmmPois(0.01), hmmPois(0.03)),
                hmmMV(hmmPois(0.05), hmmPois(0.02)),
                hmmMV(hmmPois(0.08), hmmPois(0.05)),
                hmmMV(hmmPois(0.04), hmmPois(0.01)),
                hmmMV(hmmPois(0.07), hmmPois(0.01)),
                hmmMV(hmmPois(0.59), hmmPois(0.04)))

model_poisson8 <- msm(cbind(floor(10 * Vegetables), floor(10 * Snacks)) ~ Duration,
                      subject = id, data = purchases.subset,
                      hmodel = hmodel8, qmatrix = Q8,
                      control = list(fnscale = 1000, maxit = 10000))

Error in Ccall.msm(params, do.what = "lik", ...) : numerical overflow in calculating likelihood

chjackson commented 1 year ago

@HaniyehDanesh This model is almost certainly not identifiable from the data. You are allowing transitions between every pair of states, assuming the state is intermittently observed, and assuming the states are misclassified. I guess you are confusing discrete and continuous time - see the course notes for an explanation of this issue.

It is better to start with small, simple models, ones that are small enough for you to understand how the data inform them, and to check that the results make sense. Then build them up in stages. When you say your model "is fitting" for 7 states, have you checked that the parameter estimates are realistic? Often the optimiser will claim to converge, but the confidence intervals are extremely large, so the estimates are unreliable and the model is not useful.
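For example, one quick check after fitting (a sketch; fit7 stands for a hypothetical fitted 7-state model):

qmatrix.msm(fit7)   # estimated transition intensities with confidence intervals;
                    # extremely wide intervals suggest the parameters are poorly identified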

HaniyehDanesh commented 1 year ago

Thank you for answering professor.

I still don't understand "I guess you are confusing discrete and continuous time". We have purchasing times measured in months, for example 1,...,24 (this is our time variable). Do you mean I should use statetable.msm(), or change my transition matrix and initial values?

chjackson commented 1 year ago

@HaniyehDanesh As I said, please work through the course notes to understand all this better. Consider what transitions in continuous time can happen, and consider what your observations tell you about the state of the process at each point in continuous time.