American-Institutes-for-Research / EdSurvey

https://american-institutes-for-research.github.io/EdSurvey/
GNU General Public License v2.0

Recode/Regroup achievement levels for Regression Analysis #46

Closed lostandfound8 closed 2 years ago

lostandfound8 commented 2 years ago

1) Context

I am a student from Germany working on my master's thesis in educational science, using EdSurvey in RStudio to analyze PIAAC datasets from various countries. (I am also a beginner in R and on GitHub, and I hope this is the right place for my inquiry.)

2) Aims

A) I would like to run a multivariate regression (mvrlm.sdf) using the six achievement levels (e.g. of Literacy, but in principle for each competency individually) as my dependent variable, after recoding/grouping them into only three achievement levels (i.e. levels 0 and 1 as "-" or 1, levels 2 and 3 as "average" or 2, and levels 4 and 5 as "+" or 3). The competencies are Lit (Literacy), Num (Numeracy), and PSL / ICT (Problem Solving in Technology-rich environments using Information and Communication Technology).

B) Secondly, I would like to run a linear regression (I am unsure whether glm.sdf or lm.sdf is appropriate) within only the average achievement level, to examine the variance of the predictors within this level.

3) Problem and considered alternatives

A) Since each competency is not a single variable but consists of ten plausible values (PVs) plus weights, I am unsure how best to recode them without accidentally ignoring this structure (e.g. with the recode function of EdSurvey or the car package). What I have read so far indicates that the recode function of EdSurvey is only meant for discrete variables and for recoding label names, not the labels themselves, cut-off points, or the like.

B) Whether the achievement levels can be recoded determines the formula for the linear regression and how to "extract" specific achievement levels. Trying to do so via the subset function did not work, because the competencies are not single variables (e.g. "sdf_allM_Lit <- subset(sdf_all, (lit > 226 & lit < 325))").

4) Desired solutions

A) Ideally, I would like code for recoding the achievement levels that takes the PVs and weights into account, so that I can use the recoded levels as dependent variables for the multivariate regression, and B) later on as a subset (?) for a linear regression.

5) Additional info

If required, I can upload my current R script or answer further questions regarding the thesis.

pdbailey0 commented 2 years ago

I’m not entirely sure I understand the question, or maybe questions, but have you looked at the achievementLevels function? It lets you set the cut points yourself through its cutpoints argument.

I hope you do not need to recode variables with that.

Linear regression in R is done with lm, and in EdSurvey with lm.sdf. I hope this helps.

Best, Paul
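To illustrate the cut-point idea from the reply above, here is a minimal base-R sketch. It is not EdSurvey's implementation of achievementLevels: the data are simulated, the column names (pv1 to pv3, wt) are hypothetical, and treating each group as a half-open interval (lower bound included, upper bound excluded) is an assumption about boundary handling. It uses the cut points 226 and 326 that come up in this thread to collapse the score scale into three groups, computes the weighted share of each group separately for every plausible value, and then averages across plausible values.

```r
# Simulated data: three plausible values and a weight per respondent.
# All names and values are hypothetical, not PIAAC data.
set.seed(1)
n <- 200
dat <- data.frame(
  pv1 = rnorm(n, mean = 270, sd = 45),
  pv2 = rnorm(n, mean = 270, sd = 45),
  pv3 = rnorm(n, mean = 270, sd = 45),
  wt  = runif(n, min = 0.5, max = 2)
)

# Two cut points give three groups: below 226, 226 up to 326, 326 and above.
cutpoints <- c(226, 326)
group_labels <- c("low", "average", "high")

to_level <- function(score) {
  cut(score, breaks = c(-Inf, cutpoints, Inf),
      labels = group_labels, right = FALSE)
}

# Weighted share of each group, computed separately for every plausible value.
pv_names <- c("pv1", "pv2", "pv3")
shares <- sapply(pv_names, function(pv) {
  lvl <- to_level(dat[[pv]])
  sapply(split(dat$wt, lvl), sum) / sum(dat$wt)
})

# Average the per-PV shares to get one estimate per group.
level_share <- rowMeans(shares)
print(round(100 * level_share, 1))
```

The point of the sketch is only that the grouping is applied to each plausible value on its own and the results are combined afterwards; no single "lit" column is ever recoded.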



lostandfound8 commented 2 years ago

UPDATE (29.08.2022): I have found a solution using the getData function.


previous comment:

Hello Paul, thank you for your fast reply!

So my main question is how I can "save" the customized cut points of the achievement levels in order to use the resulting levels as separate dependent variables in a multivariate regression analysis. I was able to change the cut points using the achievementLevels function (to 0, 226, 326), but when I checked them using the showCutPoints function on the data, the output showed the data's default cut points (176, 226, 276, 326, 376). I also don't seem to be able to fit achievement levels as dependent variables into the formula without receiving warning messages or generating an output with only intercept 0.
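The thread does not spell out the getData-based solution, so the following is only a guess at the general shape of such a workflow, written in base R on simulated data. Once the plausible values sit in a plain data frame (which is what getData returns), each one can be recoded into three groups and analyzed separately, and the results pooled afterwards. The column names (pv1 to pv3, wt, age), the predictor, and the choice to treat the 1/2/3 group code as a numeric outcome are all assumptions made for illustration. The variance shown covers only the between-PV (imputation) component; a real analysis would also need the sampling variance from the replicate weights.

```r
# Simulated stand-in for a data frame of plausible values; names are hypothetical.
set.seed(42)
n <- 500
age <- sample(16:65, n, replace = TRUE)
ability <- 330 - 1.5 * age + rnorm(n, sd = 40)
dat <- data.frame(
  age = age,
  wt  = runif(n, min = 0.5, max = 2),
  pv1 = ability + rnorm(n, sd = 15),
  pv2 = ability + rnorm(n, sd = 15),
  pv3 = ability + rnorm(n, sd = 15)
)
pv_names <- c("pv1", "pv2", "pv3")

# Recode one plausible value into three groups coded 1, 2, 3.
recode3 <- function(score) {
  as.integer(cut(score, breaks = c(-Inf, 226, 326, Inf), right = FALSE))
}

# Fit the same weighted model once per plausible value.
fits <- lapply(pv_names, function(pv) {
  d <- dat
  d$level3 <- recode3(d[[pv]])
  lm(level3 ~ age, data = d, weights = wt)
})

# Pool: average the coefficients across plausible values.
coefs <- sapply(fits, coef)   # rows: model terms, columns: plausible values
pooled <- rowMeans(coefs)

# Between-PV variance only (imputation component); sampling variance is omitted.
m <- length(pv_names)
imputation_var <- (1 + 1 / m) * apply(coefs, 1, var)

print(pooled)
print(imputation_var)
```

The same loop could restrict each fit to the rows where the recoded group equals 2 to look only at the average group, with the caveat that group membership then differs from one plausible value to the next.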