Closed nirmalghimire closed 1 year ago
# initialize an empty list to store data frames
df_list <- list()
for(i in 1:length(eds_pisa)) {
  # convert each edsurvey data frame list to a data frame
  df <- EdSurvey::edsurvey.data.frame(eds_pisa[[i]], pvvars = c("read"))
  # store each data frame in the list
  df_list[[i]] <- df
}
# bind all data frames in the list into a single data frame
compiled_data <- do.call(rbind, df_list)
[edited by PV to put code in code block.]
Greetings @nirmalghimire!
Thanks for your inquiry. The main issue is that the code you provided tries to rebuild edsurvey.data.frame objects that already exist inside the edsurvey.data.frame.list returned by the readPISA function. A better understanding of the edsurvey.data.frame.list object should help with the issues you are having. Additional details can be found in the EdSurvey User Guide.
Because the readPISA call uses the countries = "*" argument, it returns an edsurvey.data.frame.list object that already contains every country as an edsurvey.data.frame object:
eds_pisa <- EdSurvey::readPISA(path = "path/PISA/2018", database = "INT", countries = "*", cognitive = "score", verbose = FALSE)
edsurvey.data.frame.list objects are a list of two components:
1) datalist, which is a list that holds all of the edsurvey.data.frame objects (80 in the case of PISA 2018).
2) covs, which is a data.frame containing the covariates of the list items in the datalist for this edsurvey.data.frame.list object.
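As a rough illustration of that two-part layout, the sketch below builds a mock list with the same two element names out of plain data.frames. The object `mock_list`, its `score` column, and all values are invented for illustration. A real edsurvey.data.frame.list holds edsurvey.data.frame objects and is created by readPISA, not by hand.

```r
# Mock stand-in for an edsurvey.data.frame.list (hypothetical data, base R only).
mock_list <- list(
  datalist = list(
    data.frame(score = c(480, 510, 495)),
    data.frame(score = c(520, 530))
  ),
  covs = data.frame(country = c("COUNTRY A", "COUNTRY B"),
                    stringsAsFactors = FALSE)
)

# One row of covs describes each element of datalist, in the same order.
stopifnot(length(mock_list$datalist) == nrow(mock_list$covs))

# Walk both components in parallel and label the per-element row counts.
n_rows <- vapply(mock_list$datalist, nrow, integer(1))
names(n_rows) <- mock_list$covs$country
print(n_rows)
```

With the real object the same pairing applies: eds_pisa$datalist[[i]] goes with eds_pisa$covs$country[[i]].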
All EdSurvey analysis functions work with edsurvey.data.frame.list objects and return a list of result objects. For example, you can pass the edsurvey.data.frame.list object directly to the summary2 function (the easiest method):
summaryRes <- summary2(data = eds_pisa, variable = "read")
names(summaryRes) <- eds_pisa$covs$country #name the result items by country
#print the result to console
summaryRes
#remove results that had an error/no data (Vietnam in this instance)
summaryRes$VIETNAM <- NULL
#extract just the summary data.frame from result list
summaryListDF <- lapply(summaryRes, function(x){
  x$summary
})
summaryStacked <- do.call(rbind, summaryListDF)
summaryStacked$Country <- names(summaryRes)
View(summaryStacked)
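The extract-and-stack pattern above can be tried without any PISA data. The sketch below uses mock result objects, each a list with a `summary` data.frame; the names, columns, and values are invented. It also assumes a failed country shows up as a NULL entry, which may not match what summary2 actually returns for a country with an error, so check the real result list before relying on this filter.

```r
# Mock results shaped like list(summary = <data.frame>); all values are invented.
mock_res <- list(
  "COUNTRY A" = list(summary = data.frame(N = 3, Mean = 495)),
  "COUNTRY B" = NULL,  # assumed stand-in for a country with no usable result
  "COUNTRY C" = list(summary = data.frame(N = 2, Mean = 525))
)

# Drop empty entries without having to know their names in advance.
mock_res <- Filter(Negate(is.null), mock_res)

# Pull out each summary data.frame, stack them, and label rows by country.
summary_list <- lapply(mock_res, function(x) x$summary)
stacked <- do.call(rbind, summary_list)
stacked$Country <- names(mock_res)
rownames(stacked) <- NULL
print(stacked)
```

Filtering by content rather than by name avoids hard-coding which country failed.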
If you want finer-grained control, you can loop through the edsurvey.data.frame.list item by item, as demonstrated below. This is more complex, but it gives you the most control.
resList <- vector("list", length = length(eds_pisa$datalist))
summaryStacked <- NULL
for(i in seq_along(eds_pisa$datalist)){
  esdf <- eds_pisa$datalist[[i]] #grab one edsurvey.data.frame at a time
  cntry <- eds_pisa$covs$country[[i]] #grab the country name from the covariates
  #assign the value of tryCatch so a failure yields NULL instead of a stale result from the previous country
  summaryRes <- tryCatch(summary2(data = esdf, variable = "read"),
                         error = function(e){
                           message(cntry, " skipped. Error: ", conditionMessage(e))
                           NULL
                         })
  if(is.null(summaryRes)){
    next
  }
  summaryDF <- summaryRes$summary
  summaryDF$Country <- cntry
  summaryStacked <- rbind(summaryStacked, summaryDF)
}
View(summaryStacked)
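One R detail matters in loops like the one above: the error handler passed to tryCatch is an ordinary function, so a variable assigned inside it with `<-` is local to the handler and does not change a variable of the same name in the loop. A minimal base R sketch of the safer pattern, which assigns the value of tryCatch itself, follows. The `risky` function and its inputs are hypothetical stand-ins for a per-country analysis call.

```r
# Hypothetical worker: fails for one input to simulate a country with no data.
risky <- function(x) {
  if (x == "B") stop("no data for ", x)
  paste("result for", x)
}

collected <- character(0)
for (key in c("A", "B", "C")) {
  # The handler's return value becomes the value of tryCatch on error,
  # so res is reset to NULL on every failure.
  res <- tryCatch(
    risky(key),
    error = function(e) {
      message(key, " skipped. Error: ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(res)) next
  collected <- c(collected, res)
}
print(collected)
```

Without this reset, a failed iteration could silently reuse the previous iteration's result under the new country's name.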
Also, related to your inquiry: we have experienced very bad lag, slowness, and crashing when working with the full PISA 2018 dataset in RStudio. We are still investigating, but a workaround is to use an IDE other than RStudio (e.g., RGui or VSCode); we also had success with the 'Electron' preview of RStudio.
This looks resolved to me. @nirmalghimire, let us know if you have any other questions.
Thanks for using EdSurvey! Please follow the instructions below when requesting a new feature in EdSurvey.
Is your feature request related to a problem? Please describe.
Currently, when using the countries = "*" argument in EdSurvey to analyze PISA 2018 data, readPISA returns a separate data set for each country. However, I need a single compiled data set instead.
Describe the solution you'd like
I would appreciate it if new functionality could be added to the EdSurvey package that allows users to easily compile data from multiple countries. This would enable us to combine the data sets and perform analysis on the combined data. Specifically, I would like to be able to use the fascinating functions available in the package after combining the data sets.
Describe alternatives you've considered
I have attempted to use the do.call(rbind, ()) function to combine the data sets obtained for different countries. However, this approach did not work on the edsurvey.data.frame objects. Therefore, I am seeking an alternative solution or feature within the EdSurvey package itself to achieve the desired data compilation.
Additional context
Here's my code snippet:
eds_pisa <- EdSurvey::readPISA(path = "path/PISA/2018", database = "INT", countries = "*", cognitive = "score", verbose = FALSE)