Closed joaomaroco closed 7 months ago
We're trying to fit your model with this call:
library(haven)
library(WeMix)
base <- 'your/path/here'
stu_questionnaire <- read_sav(file.path(base,'STU_QQQ_SPSS/CY08MSP_STU_QQQ.sav'))
sch_questionnaire <- read_sav(file.path(base,'SCH_QQQ_SPSS/CY08MSP_SCH_QQQ.sav'))
stu_dat <- stu_questionnaire[,c('CNT','CNTRYID','CNTSCHID','SENWT','BELONG')]
sch_dat <- sch_questionnaire[,c('W_FSTUWT_SCH_SUM','CNTSCHID')]
db <- merge(stu_dat, sch_dat, by='CNTSCHID')
db$WTCNT <- 1
hlmB0 <- mix(BELONG ~ 1 + (1|CNTSCHID) + (1|CNT), weights = c('SENWT','W_FSTUWT_SCH_SUM','WTCNT'), data = db)
Data files downloaded from here: https://www.oecd.org/pisa/data/2022database/
Will update once the run completes.
Thanks! My file has several student- and school-level variables for the next HLM steps... I hope it runs fast...
It converged on my Windows laptop with 32 GB of memory. This is a huge model with 561k students, so you may need to free up more memory; we limited our inputs to just the relevant columns. But do check my note below about the weights.
> hlmB0 <- mix(BELONG ~1 + (1|CNTSCHID) + (1|CNT), weights = c('SENWT','W_FSTUWT_SCH_SUM','WTCNT'), data = db, verbose=TRUE)
Using lmer to get an approximate (unweighted) estimate and model structure.
Fitting weighted model.
Estimating covariance.
Warning message:
In mix(BELONG ~ 1 + (1 | CNTSCHID) + (1 | CNT), weights = c("SENWT", :
There were 52405 rows with missing data. These have been removed.
> summary(hlmB0)
Call:
mix(formula = BELONG ~ 1 + (1 | CNTSCHID) + (1 | CNT), data = db,
weights = c("SENWT", "W_FSTUWT_SCH_SUM", "WTCNT"), verbose = TRUE)
Variance terms:
Level Group Name Variance Std. Error Std.Dev.
3 CNT (Intercept) 3.938e-02 5.789e-03 1.984e-01
2 CNTSCHID (Intercept) 4.711e-17 1.079e-17 6.864e-09
1 Residual 8.561e-01 2.426e-02 9.252e-01
Groups:
Level Group n size mean wgt sum wgt
3 CNT 78 1.0000 78
2 CNTSCHID 20990 1328.7818 27891130
1 Obs 561339 0.6476 363526
Fixed Effects:
Estimate Std. Error t value
(Intercept) -0.09305 0.02267 -4.105
lnl= -487783.37
Intraclass Correlation= 0.04397
One thing to notice is that these weights do not make a lot of sense as unconditional weights. The implied number of schools in participating countries is about 28M, while the implied number of students is 363k, which would mean fewer than one student per school. Typically the weights are adjusted for an HLM; we discuss the literature on adjusting weights in the vignette.
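As a quick sanity check (a sketch, assuming the `db` data frame built by the merge above), the implied counts can be reproduced by summing the weight columns directly:

```r
# Implied number of students: sum the unconditional student weights.
sum(db$SENWT, na.rm = TRUE)

# The school weight repeats once per student after the merge, so sum it
# over distinct schools only. With these columns it comes to roughly 28M,
# far too large to be a count of schools; hence the advice to adjust the
# weights before fitting the HLM.
sch <- db[!duplicated(db$CNTSCHID), ]
sum(sch$W_FSTUWT_SCH_SUM, na.rm = TRUE)
```

If the school-level sum is orders of magnitude above the plausible number of schools, the column is not an unconditional school weight and should be rescaled first.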
Thanks Paul et al.
Yes, your comments about the school weight make a lot of sense. I am using the weight in the school data file from OECD:
I would say that school weights should be the inverse of the school selection probabilities, not the sum of the student weights in each selected school. But the student weights also reflect the probability of selecting students within the selected school?... in that case the school weights should be conditional... I am just not sure which of the 3 school weights that OECD provides should be used as the school weight... in a 3-level HLM...
To be clear, I concluded you probably ran out of memory and that's why it failed. If that's not the case, please reopen this. You can check by running
nohup R CMD BATCH myFile.R &
and then myFile.Rout
will have the reason it stopped. It's possible that you can add virtual memory and that would work, but I'd guess that would be very slow. If you are out of memory you'll see #Error: cannot allocate vector of size [a size here]
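To make the check concrete (file names here are hypothetical), this sketch fakes a `.Rout` log containing the out-of-memory message and shows how to find it; in practice you would run `nohup R CMD BATCH myFile.R &` and grep the real `myFile.Rout` after the job stops:

```shell
# Simulate what R writes to the .Rout log when it runs out of memory.
cat > myFile.Rout <<'EOF'
Error: cannot allocate vector of size 8.4 Gb
Execution halted
EOF

# Count matching lines; a non-zero count means R ran out of memory.
grep -c "cannot allocate vector" myFile.Rout
```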
As for the weights, that's beyond the scope of WeMix issues; I'm just raising it as a possible issue to consider.
Thanks Paul. Yes, I realize that. I was also able to run fixed slopes for several level-1 predictors... however, with all the missing data, the data set was reduced to just 5 countries (I still need to look into that!). But WeMix was blazing fast. Thanks for a good package! Warm regards, João
I am trying to fit this 3-level baseline (null) model:
hlmB0 <- mix(BELONG ~ 1 + (1|CNTSCHID) + (1|CNT), weights = c('SENWT','W_FSTUWT_SCH_SUM','WTCNT'), data = db)
from the PISA 2022 student files. WTCNT is 1 for each country, so the country-level weight sum equals the number of countries in the data set. The data is in this zip file (too big to attach here):
https://ispaiu-my.sharepoint.com/:u:/g/personal/jpmaroco_ispa_pt/ERyhhGsJzdxEvkpnrSIVrt0BiIM5wq5c2o_49_bVkAI4HA?e=bnUvGb
My Linux cluster has 32 GB RAM and a Xeon processor... I can't get any results; the run just takes forever...
Thanks!