American-Institutes-for-Research / WeMix

WeMix public repository
GNU General Public License v2.0

WeMix does not converge with 3-level model. #14

Closed joaomaroco closed 7 months ago

joaomaroco commented 7 months ago

I am trying to fit this 3-level baseline (null) model:

hlmB0 <- mix(BELONG ~1 + (1|CNTSCHID) + (1|CNT), weights = c('SENWT','W_FSTUWT_SCH_SUM','WTCNT'), data = db)

from the PISA 2022 student files. WTCNT is 1 for every country, so the sum of WTCNT equals the number of countries in the data set. The data are in this zip file (too big to attach here):

https://ispaiu-my.sharepoint.com/:u:/g/personal/jpmaroco_ispa_pt/ERyhhGsJzdxEvkpnrSIVrt0BiIM5wq5c2o_49_bVkAI4HA?e=bnUvGb

My Linux cluster has 32 GB RAM and a Xeon processor... I can't get the results; the run just takes forever...

Thanks!

blue-webb commented 7 months ago

We're trying to fit your model with this call:

library(haven)
library(WeMix)

base <- 'your/path/here'
stu_questionnaire <- read_sav(file.path(base,'STU_QQQ_SPSS/CY08MSP_STU_QQQ.sav'))
sch_questionnaire <- read_sav(file.path(base,'SCH_QQQ_SPSS/CY08MSP_SCH_QQQ.sav'))

stu_dat <- stu_questionnaire[,c('CNT','CNTRYID','CNTSCHID','SENWT','BELONG')]
sch_dat <- sch_questionnaire[,c('W_FSTUWT_SCH_SUM','CNTSCHID')]

db <- merge(stu_dat, sch_dat, by='CNTSCHID')
db$WTCNT <- 1

hlmB0 <- mix(BELONG ~1 + (1|CNTSCHID) + (1|CNT),  weights = c('SENWT','W_FSTUWT_SCH_SUM','WTCNT'),  data = db)

Data files downloaded from here: https://www.oecd.org/pisa/data/2022database/

Will update once the run completes.

joaomaroco commented 7 months ago

Thanks! My file has several student and school variables for the next HLM steps... I hope it can run fast...


pdbailey0 commented 7 months ago

It converged on my Windows laptop with 32 GB of memory. This is a huge model with 561k students, so I think you may need to free up more memory; we limited our inputs to just the relevant data. But do check my note about the weights.

> hlmB0 <- mix(BELONG ~1 + (1|CNTSCHID) + (1|CNT),  weights = c('SENWT','W_FSTUWT_SCH_SUM','WTCNT'),  data = db, verbose=TRUE)
Using lmer to get an approximate (unweighted) estimate and model structure.
Fitting weighted model.
Estimating covariance.
Warning message:
In mix(BELONG ~ 1 + (1 | CNTSCHID) + (1 | CNT), weights = c("SENWT",  :
  There were 52405 rows with missing data. These have been removed.
> summary(hlmB0)
Call:
mix(formula = BELONG ~ 1 + (1 | CNTSCHID) + (1 | CNT), data = db, 
    weights = c("SENWT", "W_FSTUWT_SCH_SUM", "WTCNT"), verbose = TRUE)

Variance terms:
 Level    Group        Name  Variance Std. Error  Std.Dev.
     3      CNT (Intercept) 3.938e-02  5.789e-03 1.984e-01
     2 CNTSCHID (Intercept) 4.711e-17  1.079e-17 6.864e-09
     1 Residual             8.561e-01  2.426e-02 9.252e-01
Groups:
 Level    Group n size  mean wgt  sum wgt
     3      CNT     78    1.0000       78
     2 CNTSCHID  20990 1328.7818 27891130
     1      Obs 561339    0.6476   363526

Fixed Effects:
            Estimate Std. Error t value
(Intercept) -0.09305    0.02267  -4.105

lnl= -487783.37 
Intraclass Correlation= 0.04397 

One thing to notice is that these weights do not make a lot of sense as unconditional weights. The implied number of schools in participating countries is about 28M, while the implied number of students is 363k, suggesting fewer than one student per school. Typically the weights are adjusted for an HLM; we discuss the literature on adjusting weights in the vignette.
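One common adjustment from that literature (Pfeffermann et al., 1998; discussed in the WeMix vignette) rescales the level-1 weights within each cluster so they sum to the cluster's sample size. A minimal sketch, using a toy data frame in place of the PISA file and a made-up column name `SENWT_SC`:

```r
# Toy stand-in for the merged PISA frame used in this thread.
db <- data.frame(CNTSCHID = c(1, 1, 1, 2, 2),
                 SENWT    = c(2, 4, 6, 1, 3))

# "Size" scaling: multiply each student weight by n_j / sum(w_ij) within
# school j, so the scaled weights in each school sum to that school's
# sample size.
db$SENWT_SC <- ave(db$SENWT, db$CNTSCHID,
                   FUN = function(w) w * length(w) / sum(w))

# Within school 1 the scaled weights are 0.5, 1.0, 1.5 (sum = 3 students).
tapply(db$SENWT_SC, db$CNTSCHID, sum)
```

Whether this particular scaling is appropriate for the PISA design is a separate question; it is shown only as an example of the kind of adjustment meant above.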

joaomaroco commented 7 months ago

Thanks Paul et al.

Yes, your comments about the school weights make a lot of sense. I am using the weight in the school data file from OECD:

[screenshot: OECD documentation of the school weights]

I would say that school weights should be the inverse of the school selection probabilities, not the sum of student weights in each selected school. But the student weights also reflect the probability of a student being selected within the selected school?... in that case the school weights should be conditional... I am just not sure which of the 3 school weights that OECD gives should be used as the school weight in a 3-level HLM...

pdbailey0 commented 7 months ago

To be clear, I concluded you probably ran out of memory and that's why it failed. If that's not the case, please reopen this. You can check by running

nohup R CMD BATCH myFile.R &

and then myFile.Rout will record the reason it stopped. It's possible that adding virtual memory would let it finish, but I'd guess that would be very slow. If you are out of memory you'll see Error: cannot allocate vector of size [a size here]
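One way to reduce the memory footprint before calling mix() is to keep only the columns the model uses and drop the incomplete rows that mix() would remove anyway. A sketch, with a tiny toy frame standing in for the merged `db` from earlier in the thread so it runs on its own:

```r
# Toy stand-in for the merged student/school frame built earlier.
db <- data.frame(BELONG = c(0.1, NA, -0.3), CNTSCHID = 1:3, CNT = "AAA",
                 SENWT = 1, W_FSTUWT_SCH_SUM = 10, WTCNT = 1)

# Keep only the model variables and the complete cases; on the real data
# this shrinks the 561k-row frame before the expensive fit.
vars <- c("BELONG", "CNTSCHID", "CNT", "SENWT", "W_FSTUWT_SCH_SUM", "WTCNT")
db_small <- db[complete.cases(db[, vars]), vars]

# After removing the full questionnaire objects with rm(), gc() lets R
# return the freed memory.
gc()
```

This is only a sketch of the "limit inputs to the relevant data" approach mentioned above, not a guaranteed fix for an out-of-memory failure.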

pdbailey0 commented 7 months ago

As for the weights, that's beyond the scope of WeMix issues--I'm just raising it as a possible issue to consider.

joaomaroco commented 7 months ago

Thanks Paul. Yes, I realize that. I was also able to run fixed slopes for several level-1 predictors... however, with all the missing data, the data set was reduced to just 5 countries (I still need to look into that!). But WeMix was blazing fast. Thanks for a good package! Warm regards, João
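For tracking down why listwise deletion leaves only 5 countries, a quick diagnosis is to tabulate which countries still contribute complete cases and which variables carry the most missingness. A sketch, with a toy frame standing in for the merged data so it runs standalone:

```r
# Toy stand-in for the merged frame with a predictor added.
db <- data.frame(CNT    = c("AAA", "AAA", "BBB", "CCC"),
                 BELONG = c(0.2, NA, NA, 0.5),
                 ESCS   = c(0.1, 0.3, NA, NA))

ok <- complete.cases(db)
table(db$CNT[ok])              # countries actually contributing rows to the fit
colSums(is.na(db)) / nrow(db)  # share of rows missing, per variable
```

On the real data this shows immediately whether a single predictor (often one not administered in some countries) is responsible for dropping whole countries.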

