My name is Diego G. Campos. I am a doctoral research fellow at the University of Oslo. Dr. Isa Steinmann and I are teaching a course about International Large-Scale Assessments (ILSAs). As part of the course, we want students to use the EdSurvey package to replicate some of the results reported by the IEA. We noticed that the standard errors (SEs) are slightly off compared with those in the official reports. We wondered whether this is due to a bug or to a different estimation process used by the package.

We also noted problems with the SD function as it does not return exactly the values that the EdSurvey description says it would (e.g., "df" instead of "sd") and it does not replicate the standard errors from the PIRLS report.

We have tried the code with two different ILSAs: PIRLS and TIMSS.

Please find the code below. I am also enclosing the official tables reported by IEA.

Example 1

# insert a reprex here
##### Example 1 : TIMSS DATA 2019 Norway ------------------------------------------------
install.packages("EdSurvey")
#> 
#> The downloaded binary packages are in
#>  /var/folders/x_/z8l4xmf14t9b184k43_jjnz80000gp/T//Rtmp6i8wMj/downloaded_packages

library(EdSurvey)
#> Loading required package: car
#> Loading required package: carData
#> Loading required package: lfactors
#> lfactors v1.0.4
#> EdSurvey v2.7.0
#> 
#> Attaching package: 'EdSurvey'
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind

##### Example 1: TIMSS DATA 2019 Norway --------------------------------------------------
#TIMSS data path
timssdata <- "/Volumes/GoogleDrive/My Drive/PhD Measurement and Evaluation/Projects/Paper 3/TIMSS_BFLPE/TIMSS_2019/input/T19_G4_SPSS Data "

#read data
data <- readTIMSS(timssdata, countries = "NOR" , gradeLvl = 4,
                  forceReread = FALSE,verbose = TRUE)
#> Found cached data for country code "nor".

# Teacher formation ------------------------------------------------------------
# What is the percentage of students that were taught by teachers with a major in primary education and a major or specialization in mathematics?
# What is the percentage of students that were taught by teachers with a major in primary education but no major or specialization in mathematics?
# What is the percentage of students that were taught by teachers with a major in mathematics but not primary education?

summary2(data, "atdmmem", weightVar = "matwgt", omittedLevels = TRUE)
#> Warning in calcEdsurveyTable(formula, data, weightVar, jrrIMax,
#> pctAggregationLevel, : Removing 1031 rows with 0 weight from analysis.
#> Estimates are weighted using the weight variable 'matwgt'
#>                                  atdmmem    N Weighted N Weighted Percent
#> 1     MAJOR IN EDUCATION AND MATHEMATICS 2122 29073.1258        62.770423
#> 2 MAJOR IN EDUCATION BUT NOT MATHEMATICS  781 13848.8371        29.900375
#> 3 MAJOR IN MATHEMATICS BUT NOT EDUCATION  193  2817.0448         6.082149
#> 4                       ALL OTHER MAJORS  103   577.5921         1.247052
#>   Weighted Percent SE
#> 1            4.249353
#> 2            3.810623
#> 3            2.125134
#> 4            0.505488

# The output does not replicate the results presented in p. 385 
# SE are slightly off

# Job satisfaction --------------------------------------------------------------
# Do students taught by teachers with high job satisfaction perform better in mathematics than students taught by less satisfied teachers?
# Try to replicate the results on page 407
# ATBGTJS scale
# ATDGTJS category

summary2(data, "atdgtjs", weightVar = "sciwgt", omittedLevels = TRUE)
#> Warning in calcEdsurveyTable(formula, data, weightVar, jrrIMax,
#> pctAggregationLevel, : Removing 1233 rows with 0 weight from analysis.
#> Estimates are weighted using the weight variable 'sciwgt'
#>               atdgtjs    N Weighted N Weighted Percent Weighted Percent SE
#> 1      VERY SATISFIED 1438  20776.268        48.875935            4.291572
#> 2  SOMEWHAT SATISFIED 1417  20347.534        47.867344            4.436643
#> 3 LESS THAN SATISFIED   73   1384.373         3.256722            1.663211

# The output does not replicate the results presented in p. 407
# SE are slightly off

Example 2 PIRLS 2016

##### Example 2 : PIRLS DATA 2016 Norway ------------------------------------------------
devtools::install_github("tidyverse/reprex", force = TRUE)
#> Downloading GitHub repo tidyverse/reprex@HEAD
#> 
#>   ✓  checking for file ‘/private/var/folders/x_/z8l4xmf14t9b184k43_jjnz80000gp/T/RtmpqHpd4b/remotes5b4812ad89ef/tidyverse-reprex-945d63d/DESCRIPTION’ (348ms)
#>   ─  preparing ‘reprex’:
#>   ✓  checking DESCRIPTION meta-information
#>   ─  checking for LF line-endings in source and make files and shell scripts
#>   ─  checking for empty or unneeded directories
#>   ─  looking to see if a ‘data/datalist’ file should be added
#>   ─  building ‘reprex_2.0.1.9000.tar.gz’
#>      
#> 
install.packages("EdSurvey")
#> 
#> The downloaded binary packages are in
#>  /var/folders/x_/z8l4xmf14t9b184k43_jjnz80000gp/T//RtmpqHpd4b/downloaded_packages

library(reprex)
library(EdSurvey)
#> Loading required package: car
#> Loading required package: carData
#> Loading required package: lfactors
#> lfactors v1.0.4
#> EdSurvey v2.7.0
#> 
#> Attaching package: 'EdSurvey'
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind

## load PIRLS 2016 NOR data
norr4 <- readPIRLS("/Users/diego/Downloads/P16_SPSSData_pt2", countries = c("nor"))
#> Found cached data for country code "nor".

## We aimed to replicate Norway's basic results on the overall reading scale in PIRLS 2016 
## (see report here: http://timssandpirls.bc.edu/pirls2016/international-results/wp-content/uploads/structure/PIRLS/11.-appendices/F_2_standard-deviations-of-reading-achievement.pdf).

## Problems with function SD():
SD(norr4, variable = "rrea", varMethod = "taylor", weightVar = "totwgt") 
#> $mean
#> [1] 558.9496
#> 
#> $std
#> [1] 65.49801
#> 
#> $stdSE
#> [1] 1.590013
#> 
#> $stdVar
#> $stdVar$varImp
#> [1] 0.3950653
#> 
#> $stdVar$varSamp
#> [1] 2.133076
#> 
#> 
#> $df
#> [1] 93.8776
# This function does not return exactly the values that the EdSurvey description says it would (e.g., "df" instead of "sd") and it does not replicate the standard errors from the PIRLS report. 
SD(norr4, variable = "rrea", varMethod = "taylor", weightVar = NULL) 
#> $mean
#> [1] 558.9496
#> 
#> $std
#> [1] 65.49801
#> 
#> $stdSE
#> [1] 1.590013
#> 
#> $stdVar
#> $stdVar$varImp
#> [1] 0.3950653
#> 
#> $stdVar$varSamp
#> [1] 2.133076
#> 
#> 
#> $df
#> [1] 93.8776
# Also, it does not seem possible to obtain unweighted results.

## The other functions report either no standard errors or no standard deviations:
summary2(norr4, variable = "rrea", weightVar = "totwgt") # The standard deviation differs slightly from PIRLS report, standard error not reported.
#> Estimates are weighted using the weight variable 'totwgt'
#>   Variable    N Weighted N     Min.  1st Qu.   Median     Mean 3rd Qu.    Max.
#> 1     rrea 4232   56609.57 299.4266 518.1312 562.2533 558.9496 602.567 778.419
#>         SD NA's Zero-weights
#> 1 65.50776    0            0
edsurveyTable(rrea ~ 1, norr4, varMethod = "taylor", weightVar = "totwgt") # Replicates mean and standard error of the mean from PIRLS report, does not report standard deviation. 
#> 
#> Formula: rrea ~ 1 
#> 
#> Plausible values: 5
#> Weight variable: 'totwgt'
#> Variance method: Taylor series
#> full data n: 4232
#> n used: 4232
#> 
#> 
#> Summary Table:
#>     N    WTD_N PCT     MEAN SE(MEAN)
#>  4232 56609.57 100 558.9496  2.25825
summary(lm.sdf(rrea ~ 1, norr4, varMethod = "taylor", weightVar = "totwgt")) # Replicates mean and standard error of the mean from PIRLS report, does not report standard deviation.
#> 
#> Formula: rrea ~ 1
#> 
#> Weight variable: 'totwgt'
#> Variance method: Taylor series
#> Plausible values: 5
#> jrrIMax: 5
#> full data n: 4232
#> n used: 4232
#> 
#> Coefficients:
#>                 coef       se      t   dof  Pr(>|t|)    
#> (Intercept) 558.9496   2.2582 247.51 70.16 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Multiple R-squared: 0

# insert the `sessionInfo()` output
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.33        magrittr_2.0.1    rlang_0.4.11      fastmap_1.1.0    
#>  [5] fansi_0.5.0       stringr_1.4.0     styler_1.5.1      highr_0.9        
#>  [9] tools_4.1.1       xfun_0.25         utf8_1.2.2        withr_2.4.2      
#> [13] htmltools_0.5.2   ellipsis_0.3.2    yaml_2.2.1        digest_0.6.27    
#> [17] tibble_3.1.4      lifecycle_1.0.0   crayon_1.4.1      purrr_0.3.4      
#> [21] vctrs_0.3.8       fs_1.5.0          glue_1.4.2        evaluate_0.14    
#> [25] rmarkdown_2.10    reprex_2.0.1.9000 stringi_1.7.4     compiler_4.1.1   
#> [29] pillar_1.6.2      backports_1.2.1   pkgconfig_2.0.3
[9_5-8_teacher-major-MS48.pdf](https://github.com/American-Institutes-for-Research/EdSurvey/files/7066213/9_5-8_teacher-major-MS48.pdf)
[9_21-25_teacher-satisfaction-MS48.pdf](https://github.com/American-Institutes-for-Research/EdSurvey/files/7066215/9_21-25_teacher-satisfaction-MS48.pdf)

Thank you.

Hi Diego, thanks for using EdSurvey and for your detailed issue documentation. You raised a few helpful issues. I'll take a stab at responding to them below.

Issue 1: SE is slightly off compared to the ones reported in the official reports (using TIMSS data in example 1)

For the two summary2 calls you provided, you mentioned that the S.E.s are slightly off between the EdSurvey results and the international report. However, I find that the EdSurvey statistics (including the S.E.s) match the international report for both calls. For example, for the level "MAJOR IN EDUCATION AND MATHEMATICS", the EdSurvey results above show a percentage of 62.770423 with an S.E. of 4.249353. The international report you provided, "Exhibit 9.5", lists for Grade 4, math, Norway (5), in the first column, a percentage of students of 63 with an S.E. of 4.2. So everything matches from my perspective. Could you elaborate on what the issue is?

Please note that, to fully replicate Exhibit 9.5 (or Exhibit 9.21), you would need to use the edsurveyTable function instead of summary2, as edsurveyTable provides statistics for the averages as well as the percentages. You can use summary2 if you are only interested in the percentage statistics or other statistics such as N or weighted N. In the example code below, I show how to use edsurveyTable to get both the percentage and mean statistics.

> library(EdSurvey)
> #read data
> timssNOR <- readTIMSS(timssdata, countries = "NOR" , gradeLvl = 4,
+                   forceReread = FALSE,verbose = TRUE)
Found cached data for country code “nor”.
>
> #note that by default, jrrIMax is set to 1, which would give you a slightly different S.E. of the mean because it would only use 1 set of plausible values. Setting `jrrIMax = Inf` ensures that all 5 PVs are used to replicate the international report
> edsurveyTable(mmat ~ atdmmem, data = timssNOR, jrrIMax = Inf, weightVar = "matwgt", omittedLevels = TRUE)

Formula: mmat ~ atdmmem 

Plausible values: 5
jrrIMax: 5
Weight variable: ‘matwgt’
Variance method: jackknife
JK replicates: 150
full data n: 6461
n used: 3199

Summary Table:
                                 atdmmem    N      WTD_N       PCT  SE(PCT)
1     MAJOR IN EDUCATION AND MATHEMATICS 2122 29073.1258 62.770423 4.249353
2 MAJOR IN EDUCATION BUT NOT MATHEMATICS  781 13848.8371 29.900375 3.810623
3 MAJOR IN MATHEMATICS BUT NOT EDUCATION  193  2817.0448  6.082149 2.125134
4                       ALL OTHER MAJORS  103   577.5921  1.247052 0.505488
      MEAN  SE(MEAN)
1 545.3295  2.660343
2 547.4547  5.429478
3 529.6531 12.239326
4 547.9567 14.333846
Warning message:
In calcEdsurveyTable(formula, data, weightVar, jrrIMax, pctAggregationLevel,  :
  Removing 1031 rows with 0 weight from analysis.

Issue 2: The `SD` does not return exactly the values that the `?SD` description says

Thanks for flagging this issue. Currently, the ?SD documentation reads "sd the degrees of freedom of the std", but it should read "df the degrees of freedom of the std". We will fix this soon.

Issue 3: The `SD` results are off compared to the ones reported in the official reports (using PIRLS data in example 2)

Thanks for flagging this issue. It does look like the S.E. of the standard deviation differs between the SD function and the PIRLS report. Our team will investigate this issue. The following code illustrates that with varMethod = "jackknife" and jrrIMax = Inf, SD reports a standard deviation of 65.49801 with an S.E. of 1.702292. However, the PIRLS report shows that for Norway (5), the standard deviation of overall reading achievement is 65 with an S.E. of 1.3.

> #note that the code "no4" is for Norway (4), and the code "nor" is for Norway (5) in the PIRLS report. I'll use `nor` for this exercise.
> no4 <- readPIRLS("C:/EdSurveyData/PIRLS/2016", countries = c("no4"))
Found cached data for country code “no4”.
> nor <- readPIRLS("C:/EdSurveyData/PIRLS/2016", countries = c("nor"))
Found cached data for country code “nor”.
> 
> #The following code illustrates that changing the varMethod does not seem to change the results at all
> SD(nor, variable = "rrea", varMethod = "taylor", weightVar = "totwgt") 
$mean
[1] 558.9496

$std
[1] 65.49801

$stdSE
[1] 1.590013

$stdVar
$stdVar$varImp
[1] 0.3950653

$stdVar$varSamp
[1] 2.133076

$df
[1] 93.8776

> SD(nor, variable = "rrea", varMethod = "jackknife", weightVar = "totwgt") 
$mean
[1] 558.9496

$std
[1] 65.49801

$stdSE
[1] 1.590013

$stdVar
$stdVar$varImp
[1] 0.3950653

$stdVar$varSamp
[1] 2.133076

$df
[1] 93.8776

> 
> #Even if jrrIMax is set to Inf, the s.e. of StD is still off from the international report.
> SD(nor, variable = "rrea", varMethod = "jackknife", weightVar = "totwgt", jrrIMax = Inf) 
$mean
[1] 558.9496

$std
[1] 65.49801

$stdSE
[1] 1.702292

$stdVar
$stdVar$varImp
[1] 0.3950653

$stdVar$varSamp
[1] 2.502734

$df
[1] 68.45113
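
A quick arithmetic check on the output above suggests how the pieces of the `SD` result relate: in both runs, `stdSE` equals the square root of `varImp + varSamp`. The sketch below uses only base R and the numbers printed in this thread. Reading `varImp` as the imputation (between-plausible-value) variance and `varSamp` as the sampling variance is an assumption based on the element names, not something confirmed here.

```r
# Values copied from the SD() output in this thread (PIRLS 2016, Norway, "rrea").
var_imp      <- 0.3950653   # $stdVar$varImp, identical in both runs
var_samp_one <- 2.133076    # $stdVar$varSamp with the default jrrIMax
var_samp_all <- 2.502734    # $stdVar$varSamp with jrrIMax = Inf

# Assumed relationship: the total variance is the sum of the two components.
se_one <- sqrt(var_imp + var_samp_one)
se_all <- sqrt(var_imp + var_samp_all)

print(round(c(default = se_one, all_pvs = se_all), 6))
```

Both values agree with the `$stdSE` entries printed above (1.590013 and 1.702292), so the gap to the report's 1.3 would have to come from one of the two variance components rather than from how they are combined.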

Issue 4: The `SD` does not seem to be able to generate unweighted results

You can set weightVar = NULL to generate unweighted results, as shown below. However, setting weightVar = "totwgt" or weightVar = NULL currently does not seem to change the result at all; we will investigate and confirm whether this is expected.

> SD(nor, variable = "rrea", varMethod = "jackknife", weightVar = NULL, jrrIMax = Inf) 
$mean
[1] 558.9496

$std
[1] 65.49801

$stdSE
[1] 1.702292

$stdVar
$stdVar$varImp
[1] 0.3950653

$stdVar$varSamp
[1] 2.502734

$df
[1] 68.45113

Issue 5: The `summary2`'s results have the SD slightly off from the PIRLS report.

This is another issue we will look into. To summarize: in Issue 4, SD returns a standard deviation of 65.49801, whereas summary2 returns 65.50776. The PIRLS report shows 65.

> summary2(nor, variable = "rrea", weightVar = "totwgt")
Estimates are weighted using the weight variable ‘totwgt’
  Variable    N Weighted N     Min.  1st Qu.   Median     Mean 3rd Qu.    Max.       SD NA's Zero-weights
1     rrea 4232   56609.57 299.4266 518.1312 562.2533 558.9496 602.567 778.419 65.50776    0            0

Issue 6: `edsurveyTable` does not report the s.e. of standard deviations, and `lm.sdf` does not report standard deviations.

I believe it's a design decision not to include the S.E. of standard deviations in edsurveyTable (that's why there's a separate SD function), but thanks for the feedback. The lm.sdf function follows the reporting norms of the lm function, which does not report standard deviations.

I'd like to note that, in your edsurveyTable and lm.sdf calls, you used varMethod = "taylor". The suggested variance estimation method for ILSA studies is "jackknife". In addition, both calls use the default jrrIMax = 1, which means only 1 set of plausible values is used in the variance estimation. To replicate the international reports, please use jrrIMax = Inf, which will use all 5 sets of plausible values.
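
To make the role of `jrrIMax` concrete, here is a minimal base R sketch of the usual way plausible-value results are combined: the point estimate is the mean across plausible values, the imputation variance comes from the spread of the per-plausible-value estimates, and the sampling variance is averaged over the first `jrrIMax` plausible values. The function `combine_pv` and all input numbers are hypothetical and made up for illustration; this is not EdSurvey's internal code.

```r
# Hypothetical helper, not part of EdSurvey.
# pv_est: the statistic estimated separately on each plausible value
# jk_var: the jackknife sampling variance obtained with each plausible value
combine_pv <- function(pv_est, jk_var, jrrIMax = 1) {
  m <- length(pv_est)
  k <- min(jrrIMax, m)                      # Inf means "use every plausible value"
  var_samp <- mean(jk_var[seq_len(k)])      # sampling variance from the first k PVs
  var_imp  <- (1 + 1 / m) * var(pv_est)     # imputation variance across all PVs
  list(estimate = mean(pv_est),
       varSamp  = var_samp,
       varImp   = var_imp,
       se       = sqrt(var_samp + var_imp))
}

# Made-up inputs for five plausible values.
pv_est <- c(558.1, 559.4, 558.7, 559.9, 558.6)
jk_var <- c(4.6, 5.3, 4.9, 5.5, 5.0)

one_pv  <- combine_pv(pv_est, jk_var, jrrIMax = 1)
all_pvs <- combine_pv(pv_est, jk_var, jrrIMax = Inf)

print(c(se_one_pv = one_pv$se, se_all_pvs = all_pvs$se))
```

In this sketch only the sampling-variance term changes with `jrrIMax`, which mirrors the `SD` output in this thread, where `varImp` stays at 0.3950653 while `varSamp` moves from 2.133076 to 2.502734.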

Dear Yuqi Liao,

Thank you for your reply. The feedback is beneficial! I will implement the changes in the code you suggested to fully replicate the results from the PIRLS and TIMSS data sets. I hope you can also track the source of the differences for the PIRLS datasets.

The package is excellent. I am grateful for your time.

Regards,

Diego,

(Sent by email in reply to Yuqi Liao's comment of 31 Aug 2021, 18:26.)

Hi @diegc15,

To echo @yuqiliao, thanks for this excellent report!

I just pushed a new version to this GitHub repository that I think addresses all of the above issues. You can now install it with these instructions: https://github.com/American-Institutes-for-Research/EdSurvey#pre-release-installation

A few notes:

- The SD and summary2 functions used the population and sample standard deviations, respectively, because we tested them with surveys where the provider uses those formulas. We're deciding how to deconflict this, but for this version, they should agree. At a minimum, I expect the defaults to stay aligned so that the two functions agree and both use the sample standard deviation.
- SD actually documented why the standard deviation would be off (you would have had to say what the replicate weight pre-multiplier is, https://github.com/American-Institutes-for-Research/EdSurvey/blob/main/man/SD.Rd#L61-L68, which is 1/2 for PIRLS), but it is the only EdSurvey function that didn't just set it automatically, which is confusing! So I updated it so that it defaults to giving you the correct result. It's unclear why we have the jkSumMultiplier argument, so I may simply remove it in a future version.
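
As a rough check of the pre-multiplier explanation, the sketch below rescales the sampling-variance component from the earlier `SD(..., jrrIMax = Inf)` output by 1/2 and recomputes the standard error. It assumes that the multiplier applies only to the sampling variance (`varSamp`) and not to the imputation variance (`varImp`); that split is an assumption, and the numbers are copied from this thread.

```r
# Values copied from the SD() output earlier in this thread (jrrIMax = Inf).
var_imp  <- 0.3950653   # $stdVar$varImp
var_samp <- 2.502734    # $stdVar$varSamp, as reported before the fix

jk_multiplier <- 1 / 2  # replicate-weight pre-multiplier mentioned for PIRLS

# Assumption: only the jackknife (sampling) component is rescaled.
se_unscaled <- sqrt(var_imp + var_samp)
se_rescaled <- sqrt(var_imp + jk_multiplier * var_samp)

print(round(c(unscaled = se_unscaled, rescaled = se_rescaled), 3))
```

Under that assumption the rescaled value is about 1.28, which rounds to the 1.3 shown in the PIRLS report.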

Again, I really appreciate you documenting these issues so clearly!

Let us know how this works!

Best, Paul

Dear Paul,

Thank you for the updates and the great work! We are looking forward to using your package in our teaching.

Regards,

Diego,

(Sent by email in reply to Paul Bailey's comment of 7 Sep 2021, 18:50.)

American-Institutes-for-Research / EdSurvey

Replicating TIMSS and PIRLS results with EdSurvey #10
