cjvanlissa / tidySEM

55 stars 7 forks source link

How do I reimplement the lca model to validation dataset? #90

Closed shinshinbooboo closed 6 months ago

shinshinbooboo commented 6 months ago

Hi,

I runned the following code;

res <- mx_lca(data = data, classes = 1:6)

res_final <- res[[3]]

class_predicted = class_prob(res_final) class_predicted = class_predicted$individual[,"predicted"]

I got the predicted class.

But, predict(res_final, newdata = test_data)

This code didn't work.

Could you tell me to reimplement the lca model to other dataset?

Gootjes commented 6 months ago

Can confirm.

Does executing this line before the predict() fix the issue? attributes(res_final)$tidySEM <- head(attributes(res_final)$tidySEM, 1)

Reproducible example

library(tidySEM)
#> Loading required package: OpenMx
#> Registered S3 method overwritten by 'tidySEM':
#>   method          from  
#>   predict.MxModel OpenMx

data(data_mix_ordinal, package = "tidySEM")

df <- data_mix_ordinal

df[1:4] <- lapply(df, ordered)

data <- df[1:4000,]
test_data <- df[4001:5000,]

mx_lca(data = data,
       classes = 1:2) -> res
#> Running mix1 with 8 parameters
#> 
#> Beginning initial fit attempt
#> Running mix1 with 8 parameters
#> 
#>  Lowest minimum so far:  30980.7563173574
#> 
#> Beginning fit attempt 1 of at maximum 10 extra tries
#> Running mix1 with 8 parameters
#> 
#> Beginning fit attempt 2 of at maximum 10 extra tries
#> Running mix1 with 8 parameters
#> 
#> Beginning fit attempt 3 of at maximum 10 extra tries
#> Running mix1 with 8 parameters
#> 
#> Beginning fit attempt 4 of at maximum 10 extra tries
#> Running mix1 with 8 parameters
#> 
#>  Lowest minimum so far:  30980.7563173571
#> 
#> Beginning fit attempt 5 of at maximum 10 extra tries
#> Running mix1 with 8 parameters
#> 
#> Beginning fit attempt 6 of at maximum 10 extra tries
#> Running mix1 with 8 parameters
#> 
#> Beginning fit attempt 7 of at maximum 10 extra tries
#> Running mix1 with 8 parameters
#> 
#> Beginning fit attempt 8 of at maximum 10 extra tries
#> Running mix1 with 8 parameters
#> 
#> Beginning fit attempt 9 of at maximum 10 extra tries
#> Running mix1 with 8 parameters
#> 
#> Beginning fit attempt 10 of at maximum 10 extra tries
#> Running mix1 with 8 parameters
#> 
#> Retry limit reached
#> 
#> Solution found
#> Final run, for Hessian and/or standard errors and/or confidence intervals
#> Running mix1 with 8 parameters

    #> 
    #>  Solution found!  Final fit=30980.756 (started at 30980.756)  (11 attempt(s): 11 valid, 0 errors)
    #>  Start values from best fit:
    #> -0.0300841692374917,0.327695281357551,-2.30401516829557e-08,0.287146622875563,-0.0476440008896391,0.28871498159232,-0.038862463533985,0.287036143451139
    #> Running mix2 with 17 parameters
    #> 
    #> Beginning initial fit attempt
    #> Running mix2 with 17 parameters
    #> 
    #>  Lowest minimum so far:  30980.7563173574
    #> 
    #> Beginning fit attempt 1 of at maximum 10 extra tries
    #> Running mix2 with 17 parameters
    #> 
    #>  Lowest minimum so far:  30891.2626460191
    #> 
    #> Beginning fit attempt 2 of at maximum 10 extra tries
    #> Running mix2 with 17 parameters
    #> 
    #> Beginning fit attempt 3 of at maximum 10 extra tries
    #> Running mix2 with 17 parameters
    #> 
    #> Beginning fit attempt 4 of at maximum 10 extra tries
    #> Running mix2 with 17 parameters
    #> 
    #> Beginning fit attempt 5 of at maximum 10 extra tries
    #> Running mix2 with 17 parameters
    #> 
    #> Beginning fit attempt 6 of at maximum 10 extra tries
    #> Running mix2 with 17 parameters
    #> 
    #> Beginning fit attempt 7 of at maximum 10 extra tries
    #> Running mix2 with 17 parameters
    #> 
    #> Beginning fit attempt 8 of at maximum 10 extra tries
    #> Running mix2 with 17 parameters
    #> 
    #>  Lowest minimum so far:  30891.2626460079
    #> 
    #> Beginning fit attempt 9 of at maximum 10 extra tries
    #> Running mix2 with 17 parameters
    #> 
    #> Beginning fit attempt 10 of at maximum 10 extra tries
    #> Running mix2 with 17 parameters
    #> 
    #> Retry limit reached
    #> 
    #> Solution found
    #> Final run, for Hessian and/or standard errors and/or confidence intervals
    #> Running mix2 with 17 parameters

    #> 
    #>  Solution found!  Final fit=30891.263 (started at 30980.756)  (11 attempt(s): 11 valid, 0 errors)
    #>  Start values from best fit:
    #> 0.690368347858144,-0.365750208425671,0.384348787569792,-0.0819361070599222,0.302057852441105,0.304046992404753,0.331199810967552,0.148040367099041,0.301777803038847,0.460319911580775,0.297529583395115,0.118832574691775,0.267836821991063,-0.581510358246075,0.300602197667226,-0.313865724309656,0.285091852075128

    res_final <- res[[2]]

    predict(res_final)
    #> Error in switch(attr(object, "tidySEM"), mixture = {: EXPR must be a length 1 vector

    predict(res_final, newdata = test_data)
    #> Error in switch(attr(object, "tidySEM"), mixture = {: EXPR must be a length 1 vector

Created on 2024-03-14 with reprex v2.0.2

cjvanlissa commented 6 months ago

@Gootjes I see the problem with predict.mxmodel(). Does the fix I just pushed help?

Gootjes commented 6 months ago

@cjvanlissa For me it does solve my issue.

Let's hope it works too for @shinshinbooboo !

For reference

You can run this code in R before doing predict(), it solves the issue

predict.MxModel <- function(object,
                            newdata,
                            ...){
    cl <- match.call()
    cl[[1L]] <- str2lang("OpenMx:::predict.MxModel")
    if(!is.null(attr(object, "tidySEM"))){
        if("mixture" %in% attr(object, "tidySEM")){
            cl[[1L]] <- str2lang("tidySEM:::predict_mxmodel_mixture")
        }
    }
    eval.parent(cl)
}

predict_mxmodel_mixture <- function(object, newdata = NULL, ...){
    if(is.null(newdata)){
        return(class_prob(object, "individual")[[1]][,"predicted"])
    } else {
        mod_fix <- object
        for(c in names(object@submodels)){
            for(m in names(object[[c]]@matrices)){
                mod_fix[[c]][[m]]$free[,] <- FALSE
            }
        }
        mod_fix$data$observed <- rbind(newdata, mod_fix$data$observed)
        mod_fix@matrices$weights$free[,] <- FALSE
        mod_fix <- mxRun(mod_fix)
        return(class_prob(mod_fix, "individual")[[1]][1:nrow(newdata), , drop = FALSE])
    }
}
cjvanlissa commented 6 months ago

Your fix will work too. I don't know what I was thinking implementing that function..

shinshinbooboo commented 6 months ago

Thank you for your help!!

I tried this code.

predict(res_final) worked well.

But,I got an error, like predict(res_final, newdata = test_data) Error: In model 'mix4' column 'Gender' is not an ordered factor. Use mxFactor() on this column. Called from: verifyThresholds(flatModel, model, labelsData, data, translatedNames, threshName)

res_final and test_data just had ordered factors.

Could you help me?

Gootjes commented 6 months ago

Are you sure all the columns are of the same type in data and in test_data ? Do the columns show any differences. You can see that easily by running descriptives(data) and descriptives(test_data)

shinshinbooboo commented 6 months ago

Thank you!!

For tutorial, I selected test_data from the first 100 samples of data like this

test_data = data[1:100,] predict(res_final, newdata = test_data)

However, I got the same error.

Can you fix it?

shinshinbooboo commented 6 months ago

I could fix it. Data that was ordered by default as.ordered didn't work. But, data was factored ordinally by mxFactor(). It worked well.

Thank you!!