data-edu / tidyLPA

Easily carry out Latent Profile Analysis (LPA) using open-source or commercial software
https://data-edu.github.io/tidyLPA/

Error with model 6 mclust #125

Closed: aken2 closed this issue 5 years ago

aken2 commented 5 years ago

I am looking to compare solutions for an LPA model with varying covariances (the models = 6 argument), and I am encountering an error. I am able to fit models 1 and 3, but I am unsure how to resolve this issue. Thanks for your help!

library(tidyLPA)
library(dplyr)
m6 <- LPA_PS[, c(1:4)] %>%
    scale() %>%
    single_imputation(., method = "missForest") %>%
    estimate_profiles(n_profiles = 2:5, models = 6)
  missForest iteration 1 in progress...done!
  missForest iteration 2 in progress...done!
  missForest iteration 3 in progress...done!

The 'variances'/'covariances' arguments were ignored in favor of the 'models' argument.
Error in bootLRTS[, 1:g, drop = FALSE] : subscript out of bounds
ebmtnprof commented 5 years ago

Hi, I am running into similar issues (using mclust). I can get solutions for models 1 and 3. For models 4 and 5, tidyLPA says they can't be fit with mclust (so fair enough). But for models 2 and 6 I am getting the same error message as above. Thoughts?? Thanks! Cheers, Emily

cjvanlissa commented 5 years ago

If you can email a fully reproducible syntax and your data (or simulated mock data giving the same error) to c.j.vanlissa@uu.nl I can try to debug this for you!

jrosen48 commented 5 years ago

Wondering if this is a more general issue that others are experiencing, too. @cjvanlissa, any idea if the bootLRTS-related code might have broken in one of our updates?

cjvanlissa commented 5 years ago

It's not triggering any unit tests, so probably not. A reproducible example would be helpful to identify any problems!

aken2 commented 5 years ago

Thank you so much! Sending you some mock data generating the same error now. Any tips are much appreciated!

cjvanlissa commented 5 years ago

Debugged this, and it turns out the error originates in Mclust, which can be verified by running:

library(mclust)
mclustBootstrapLRT(your_data, modelName = "VVV", nboot = 100, maxG = 4)

Only the 1-class model converges (my guess is that the rest are too complex), so when mclustBootstrapLRT tries to compare the 1-class model against a 2-class model, there is nothing to compare it to, and you get this error. I'll see if I can wrap the error message, but that's about all I can do.
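A minimal sketch of the kind of wrapping described here, assuming you call mclust directly. `safe_blrt()` is a hypothetical helper, not part of tidyLPA or mclust: it runs `mclust::mclustBootstrapLRT()` inside `tryCatch()` and turns the bare "subscript out of bounds" failure into a warning that names the model, returning `NULL` instead of stopping the whole analysis.

```r
# Hypothetical helper (not part of tidyLPA or mclust): run mclust's bootstrap
# likelihood ratio test, but convert an estimation failure into a readable warning.
safe_blrt <- function(data, modelName = "VVV", nboot = 100, maxG = 4,
                      blrt_fun = mclust::mclustBootstrapLRT) {
  # blrt_fun defaults to the real mclust function; it is an argument only so
  # the error handling can be exercised without fitting any model.
  tryCatch(
    blrt_fun(data, modelName = modelName, nboot = nboot, maxG = maxG),
    error = function(e) {
      warning(
        sprintf(
          paste("Bootstrap LRT for mclust model '%s' failed (%s).",
                "This can happen when only the 1-class model converges,",
                "leaving nothing to compare it against."),
          modelName, conditionMessage(e)
        ),
        call. = FALSE
      )
      NULL  # signal "no result" instead of aborting
    }
  )
}

# Usage (requires the mclust package); your_data is a placeholder:
# res <- safe_blrt(your_data, modelName = "VVV", nboot = 100, maxG = 4)
```

The design choice is to degrade to a warning plus `NULL`, so a loop over several models or class counts can keep going and report which ones failed.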

cjvanlissa commented 5 years ago

FYI: I wrapped the error, and now your data returns the following output with informative error messages:

Data_imputed %>%
    estimate_profiles(n_profiles = 1:5, models = 6)
The 'variances'/'covariances' arguments were ignored in favor of the 'models' argument.
Warning messages:
1: Mclust could not estimate model 6 with 2 classes. 
2: Mclust could not estimate model 6 with 3 classes. 
3: Mclust could not estimate model 6 with 4 classes. 
4: Mclust could not estimate model 6 with 5 classes. 
5: 
One or more analyses resulted in warnings! Examine these analyses carefully: model_6_class_2, model_6_class_3, model_6_class_4, model_6_class_5 
> tmp
tidyLPA analysis using mclust: 

 Model Classes AIC     BIC     Entropy prob_min prob_max n_min n_max BLRT_p
 6     1       2266.47 2313.85 1.00    1.00     1.00     1.00  1.00        
 6     2                                                                   
 6     3                                                                   
 6     4                                                                   
 6     5                    
ebmtnprof commented 5 years ago

Thanks. Not that anything is actually fixed or explained, but if that's all you can do then I suppose tidyLPA is just a more limited tool than I initially thought. Ah well.... Cheers, Emily


--

Emily A. Butler

Professor & Graduate Director Family Studies and Human Development College of Agriculture & Life Sciences University of Arizona Tucson, AZ, 85721-0033

jrosen48 commented 5 years ago

Something seems confusing to me: mclust can't fit model types 4 and 5, and yet can fit the other four model types (1, 2, 3, and 6), but sometimes doesn't because of an error in the estimation. Should these two sources of a model not being able to be estimated be distinguished in the output?

ebmtnprof commented 5 years ago

And perhaps something could be done, such as increasing the number of iterations? Cheers, Emily
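The thread does not answer this, so here is a hedged sketch for readers fitting the model with mclust directly. As far as I know, `mclust::emControl()` already defaults its iteration cap (`itmax`) to the largest integer, so raising iterations is unlikely to rescue a VVV fit; a commonly used alternative is a conjugate prior via `priorControl()`, which regularizes the class covariance estimates. `iris[, 1:4]` is only stand-in data, since the data sets in this issue were not shared publicly.

```r
library(mclust)

# Stand-in data (assumption): the data discussed in this issue is not public.
x <- scale(iris[, 1:4])

# tidyLPA model 6 corresponds to mclust's "VVV"
# (class-varying variances and covariances).
fit_ml <- Mclust(x, G = 1:5, modelNames = "VVV")

# Same model with mclust's default conjugate prior, which shrinks the class
# covariance matrices and can help when some class counts fail to converge.
fit_prior <- Mclust(x, G = 1:5, modelNames = "VVV", prior = priorControl())

# BIC for each number of classes; NA marks a solution that could not be estimated.
fit_ml$BIC
fit_prior$BIC
summary(fit_prior)
```

Note that a prior changes the model being estimated, so results obtained with it should be reported as such rather than as plain maximum-likelihood fits.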


cjvanlissa commented 5 years ago

> Something seems confusing to me: mclust can't fit model types 4 and 5, and yet can fit the other four model types (1, 2, 3, and 6), but sometimes doesn't because of an error in the estimation. Should these two sources of a model not being able to be estimated be distinguished in the output?

Josh, this IS referenced in the output. If you request model type 4/5 with Mclust, estimate_profiles gives an error (as it should) ;)
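To make the distinction concrete, a sketch of how tidyLPA's model numbers line up with mclust's covariance parameterizations: models 1, 2, 3, and 6 correspond to EEI, VVI, EEE, and VVV, while models 4 and 5 have no mclust counterpart. `to_mclust_model()` is a hypothetical helper, not a tidyLPA function: it errors up front for models mclust cannot fit at all, which keeps that case separate from a fit that merely fails to converge.

```r
# tidyLPA model number -> mclust modelName.
# Model 1: equal variances, covariances fixed to 0   (EEI)
# Model 2: varying variances, covariances fixed to 0 (VVI)
# Model 3: equal variances and equal covariances     (EEE)
# Model 6: varying variances and varying covariances (VVV)
# Models 4 and 5 combine equal and varying parts in a way mclust does not offer.
mclust_model_names <- c("1" = "EEI", "2" = "VVI", "3" = "EEE", "6" = "VVV")

# Hypothetical helper: translate a tidyLPA model number, failing loudly when
# the model is unavailable in mclust (as opposed to failing to converge).
to_mclust_model <- function(model) {
  key <- as.character(model)
  if (!key %in% names(mclust_model_names)) {
    stop(sprintf("Model %s cannot be estimated with mclust; consider Mplus.", key),
         call. = FALSE)
  }
  mclust_model_names[[key]]
}
```

With this split, "not available" is an error raised before any fitting, while convergence failures can be reported as warnings per class count.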

cjvanlissa commented 5 years ago

> Thanks. Not that anything is actually fixed or explained, but if that's all you can do then I suppose tidyLPA is just a more limited tool than I initially thought. Ah well.... Cheers, Emily

The fact that the model is too complex to estimate is a research finding that can be reported, not a bug to be ironed out. In your case, model 6 estimates 19 parameters PER CLASS, with 218 participants. So the two-class solution has fewer than 6 participants per parameter.
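The argument here rests on the ratio of sample size to free parameters. As a hedged worked formula, assuming the standard count for an unconstrained (VVV) Gaussian mixture with $d$ indicators and $G$ classes (class means, a full covariance matrix per class, and $G-1$ free mixing proportions), with $n$ participants:

```latex
K(G) = \underbrace{G\,d}_{\text{means}}
     + \underbrace{G\,\frac{d(d+1)}{2}}_{\text{covariances}}
     + \underbrace{(G-1)}_{\text{mixing proportions}},
\qquad
\text{participants per parameter} = \frac{n}{K(G)}.
```

Because the covariance term grows quadratically in $d$, each additional class adds $d + d(d+1)/2 + 1$ parameters, so $n/K(G)$ shrinks quickly as $G$ increases.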

ebmtnprof commented 5 years ago

Thanks. I wasn't trying to estimate model 6, and that isn't my data you are referencing. That was the other person who submitted an issue; I was interested in model 2. Once I saw there was an issue, I tried each model to see what behavior ensued with my data and whether I had the same issue.

And BTW, I take it you changed the way get_data behaves again? My code using it is broken again and it appears to be due to different behavior for that function. So, I think you've convinced me to learn to use mclust myself. Thanks for the push to quit being so lazy :)

Cheers, Emily


jrosen48 commented 5 years ago

All good @ebmtnprof. We did change that; we are about to push that release to CRAN. A number of folks asked for the data in wide format, hence the change. It is still possible to obtain the data in long form; I'll post how later today.

jrosen48 commented 5 years ago

@ebmtnprof I think the easiest way (to me) would be to use the gather function from the tidyr package, e.g.:

library(tidyLPA)
#> tidyLPA is intended for academic use. We do not make any money on this and only ask that you please cite this in publications when you use the results. You can use the function citation('tidyLPA') to create a citation.Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
m <- pisaUSA15[1:100, ] %>%
    select(broad_interest, enjoyment, self_efficacy) %>%
    single_imputation() %>%
    estimate_profiles(3)

get_data(m) %>% 
    tidyr::gather(Class_prob, Probability, contains("CPROB"))
#> # A tibble: 300 x 8
#>    model_number classes_number broad_interest enjoyment self_efficacy Class
#>           <dbl>          <dbl>          <dbl>     <dbl>         <dbl> <dbl>
#>  1            1              3            3.8       4            1        1
#>  2            1              3            3         3            2.75     3
#>  3            1              3            1.8       2.8          3.38     2
#>  4            1              3            1.4       1            2.75     2
#>  5            1              3            1.8       2.2          2        3
#>  6            1              3            1.6       1.6          1.88     3
#>  7            1              3            3         3.8          2.25     1
#>  8            1              3            2.6       2.2          2        3
#>  9            1              3            1         2.8          2.62     3
#> 10            1              3            2.2       2            1.75     3
#> # … with 290 more rows, and 2 more variables: Class_prob <chr>,
#> #   Probability <dbl>

Created on 2019-07-25 by the reprex package (v0.3.0)
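On tidyr 1.0.0 or later, `gather()` is superseded by `pivot_longer()`. A minimal sketch of the same reshaping on a small hypothetical data frame that mimics the wide layout (one `CPROB` column per class); the values are made up for illustration, and with real output you would pass `get_data(m)` instead of `wide`.

```r
library(tidyr)

# Hypothetical wide-format output: one row per case, one CPROB column per class.
wide <- data.frame(
  broad_interest = c(3.8, 3.0, 1.8),
  Class  = c(1, 3, 2),
  CPROB1 = c(0.90, 0.05, 0.10),
  CPROB2 = c(0.05, 0.15, 0.85),
  CPROB3 = c(0.05, 0.80, 0.05)
)

# Stack the CPROB columns into key/value columns, as gather() does above.
long <- pivot_longer(
  wide,
  cols = contains("CPROB"),
  names_to = "Class_prob",
  values_to = "Probability"
)
long  # 9 rows: 3 cases x 3 class probabilities
```

As with `gather()`, the non-selected columns are repeated once per class probability, so 100 cases with 3 classes give the 300 rows shown in the reprex.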

ebmtnprof commented 5 years ago

Thanks for the response. You'll be glad to hear I won't be bugging you anymore :) Yesterday I took the plunge and got mclust working for what I need, so my package is no longer reliant on a stable version of tidyLPA. Cheers, Emily
