CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/

lcmm() in R Error: Numerical problem by computing fn value of function is : NaN #247

Closed. bookworm1516 closed this issue 1 month ago.

bookworm1516 commented 8 months ago

My data frame filtered_gmmdf_long (long format) has no missing values. It contains the variables ID, time (taking the values 0, 3, 8, 15 and 25), and wblv (continuous, approximately normally distributed).

> summary(gmmdf_long90$wblv)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
 -1.086  -0.118   0.034   0.000   0.146   0.341    5652
> describe(gmmdf_long90$wblv)
   vars     n mean  sd median trimmed  mad   min  max range  skew kurtosis se
X1    1 10623    0 0.2   0.03    0.02 0.19 -1.09 0.34  1.43 -0.88      0.9  0
> summary(filtered_gmmdf_long$wblv)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-1.08550 -0.10606  0.03237  0.00303  0.13457  0.34062
> describe(filtered_gmmdf_long$wblv)
   vars    n mean   sd median trimmed  mad   min  max range skew kurtosis se
X1    1 4380    0 0.19   0.03    0.02 0.17 -1.09 0.34  1.43 -0.9     1.29  0

Sample data

# 4 subjects, each observed at times 0, 3, 8, 15 and 25
sample <- data.frame(
  ID   = rep(c(1, 2, 3, 4), each = 5),
  time = rep(c(0, 3, 8, 15, 25), times = 4),
  wblv = c(-0.0924870160,  0.0274498752,  0.2398900582,  0.1422529358,  0.1440320863,
           -0.0869990920,  0.1423375351, -0.239953000,  -0.2378607918,  0.1893095329,
            0.0470979475,  0.0681092844,  0.0643890367, -0.1299717236,  0.0284062668,
            0.0249408593,  0.2341043359,  0.0866213575,  0.1003340839,  0.2721473565)
)

I am attempting to fit a simple growth model with lcmm() from the {lcmm} package before fitting growth mixture models with the same function. I can fit models with hlme() without problems, and I can fit models with lcmm() when link = "5-quant-splines", but not when link = "linear":

library(lcmm)

gmm1b <- hlme(wblv ~ time, subject = "ID", var.time = "time", random = ~ 1 + time,
              ng = 1, data = filtered_gmmdf_long)
### This works properly
gmm2b <- gridsearch(rep = 100, maxiter = 10, minit = gmm1b,
                    hlme(wblv ~ time, subject = "ID", random = ~ 1 + time,
                         ng = 2, data = filtered_gmmdf_long, mixture = ~ time))
### This also works properly
gmm2c <- gridsearch(rep = 100, maxiter = 10, minit = gmm1b,
                    hlme(wblv ~ time, subject = "ID", random = ~ 1 + time,
                         ng = 2, nwg = TRUE, data = filtered_gmmdf_long, mixture = ~ time))
### This also works properly
lcmm1 <- lcmm::lcmm(wblv ~ time, random = ~ time, subject = "ID",
                    data = filtered_gmmdf_long, link = "linear", ng = 1)

The last call (lcmm1) results in the error:

Numerical problem by computing fn value of function is : NaN

Originally I tried this with my entire dataset (n = 3255), but after receiving this message I switched to complete responses only (n = 873). I also verified that all variables are stored as either numeric or categorical (with the exception of ID).
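Restricting to complete responses can be done in several ways. A minimal base-R sketch, assuming "complete" means a non-missing wblv at each of the five planned times (0, 3, 8, 15, 25) and reusing the column names from this thread; keep_complete_subjects is a hypothetical helper name and the toy data are made up for illustration.

```r
# Hypothetical helper: keep only subjects with a non-missing wblv at every
# planned visit. Assumes columns named ID, time and wblv, as in this thread.
keep_complete_subjects <- function(df, visits = c(0, 3, 8, 15, 25)) {
  # drop missing outcomes and any time point outside the planned visits
  obs <- df[!is.na(df$wblv) & df$time %in% visits, ]
  # number of distinct planned visits actually observed for each subject
  n_visits <- tapply(obs$time, obs$ID, function(t) length(unique(t)))
  complete_ids <- names(n_visits)[n_visits == length(visits)]
  obs[obs$ID %in% complete_ids, ]
}

# Toy data (made up): subject 2 is missing its time-3 measurement
toy <- data.frame(
  ID   = rep(1:3, each = 5),
  time = rep(c(0, 3, 8, 15, 25), times = 3),
  wblv = c(0.1, 0.2, 0.3, 0.4, 0.5,
           0.1, NA,  0.3, 0.4, 0.5,
           0.1, 0.2, 0.3, 0.4, 0.5)
)
complete <- keep_complete_subjects(toy)
print(unique(complete$ID))
```

Dropping whole subjects (rather than single rows) is only one option; lcmm and hlme themselves accept unbalanced data, so this filtering is a modelling choice, not a requirement of the package.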

I really don't understand this error message or how to troubleshoot based on it. Any help would be very much appreciated! @CecileProust-Lima @VivianePhilipps

VivianePhilipps commented 8 months ago

Hello,

This can happen when the default initial values are not appropriate for your data. You can then specify other initial values in argument B. In your case, since the hlme model works, you can use its estimates as initial values for the lcmm model. But be careful: the parameterizations are different, so you have to rescale the estimates. In your example you can use:

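lcmm(wblv ~ time, random = ~ time, subject = "ID", data = filtered_gmmdf_long,
     link = "linear", ng = 1,
     B = c(gmm1b$best[2] / gmm1b$best[6],        # hlme time effect / residual SD
           gmm1b$best[3:5] / (gmm1b$best[6]^2),  # random-effect (co)variances / residual variance
           gmm1b$best[1],                        # hlme intercept
           gmm1b$best[6]))                       # hlme residual SD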
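The rescaling above can be read off the two model formulations (this reading is inferred from the B vector, not quoted from the lcmm documentation). hlme models wblv = b0 + b1*time + u0 + u1*time + e, with residual SD s = best[6]. With link = "linear", lcmm models a latent process (wblv - eta1)/eta2 with no intercept and unit-variance measurement error, which is why B has no intercept or residual term. Taking eta1 = b0 and eta2 = s gives (b1/s)*time + (u0 + u1*time)/s + e/s, so the time effect is divided by s and the random-effect (co)variances by s^2. A minimal sketch of that mapping as a standalone function; hlme_to_lcmm_linear is a hypothetical name, and it assumes best holds exactly the six estimates of a one-class hlme fit with random = ~ 1 + time, in the order used above.

```r
# Hypothetical helper (not part of lcmm): rescale the six estimates of a
# one-class hlme(y ~ time, random = ~ 1 + time) fit into initial values for
# lcmm(..., link = "linear"), following the B vector in the reply above.
# Assumed order of `best`: intercept, time effect, three random-effect
# (co)variance terms, residual standard deviation.
hlme_to_lcmm_linear <- function(best) {
  stopifnot(is.numeric(best), length(best) == 6, best[6] > 0)
  sigma <- best[6]
  unname(c(
    best[2] / sigma,       # time effect on the latent scale
    best[3:5] / sigma^2,   # random-effect (co)variances on the latent scale
    best[1],               # hlme intercept, reused as the first link parameter
    sigma                  # hlme residual SD, reused as the second link parameter
  ))
}

# Illustrative, made-up estimates (not from this thread)
init <- hlme_to_lcmm_linear(c(0.02, 0.004, 0.05, -0.01, 0.002, 0.5))
print(init)
```

With a fitted model this would be used as B = hlme_to_lcmm_linear(gmm1b$best). Other random-effect structures, ng > 1 or nwg = TRUE change the length and order of best, so the check and the indices would need adapting.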

Best,

Viviane

bookworm1516 commented 8 months ago

@VivianePhilipps,

Thank you! I have a follow-up question. Since opening this issue, I experimented with other link functions and found that I did not run into problems with link = "5-quant-splines" but did with link = "linear":

gmm1spline <- lcmm(wblv_z ~ time, subject = "ID", random = ~ time,
                   data = gmmdf_long90, link = "5-quant-splines", ng = 1)
### This works
lcmm1 <- lcmm(wblv_z ~ time, subject = "ID", random = ~ time,
              data = gmmdf_long90, link = "linear", ng = 1)
## Numerical problem by computing fn value of function is : NaN
lcmm1b <- lcmm(wblv_z ~ time, random = ~ time, subject = "ID", data = gmmdf_long90,
               link = "linear", ng = 1,
               B = c(gmm1b$best[2] / gmm1b$best[6],
                     gmm1b$best[3:5] / (gmm1b$best[6]^2),
                     gmm1b$best[1], gmm1b$best[6]))
## Problem of computation. Verify your function specification...
## Infinite value with finite parameters : b= -0.00700307 1.276333 -0.01768986 0.03810761 -0.00374745 -0.1194758

I want to understand what happens when the model is fitted, but I am still confused despite reading your companion paper (doi: 10.18637/jss.v078.i02). Could you elaborate on why the default initial values are appropriate when fitting a model with a spline link function but not when fitting the same model with the linear link function? Or could you point me to another source that explains what happens with spline versus linear links? Relatedly, I am unsure what the difference is between fitting a spline model with linear links and fitting any other model with spline links; could you briefly explain this?

Again, thank you for your help. I really appreciate the work you have dedicated to creating this package.