American-Institutes-for-Research / EdSurvey

https://american-institutes-for-research.github.io/EdSurvey/
GNU General Public License v2.0

achievementLevels by a continuous grouping variable #49

Closed ainhoavega closed 1 year ago

ainhoavega commented 1 year ago

If I understand it correctly, achievementLevels calculates achievement levels by group (e.g. by sex: male/female). However, the “grouping” variable that I’d like to use is not categorical but continuous (it’s the ESCS index in PISA).

Therefore, first I have to “transform” it from continuous to categorical, using the percentile function (I’d create 5 groups based on quintiles) and then I’d be able to pass it on and use it as a grouping variable in achievementLevels.

This would allow me to calculate, basically, the % of students who achieve a specific level and fall in the lower quintile of the ESCS, vs. the % of students who achieve that same specific PISA level but fall in the upper quintile of the ESCS.

The workaround I’m using so far is subsetting data based on the quintiles calculated using the percentile function, but it’s not very elegant/efficient. So I was wondering whether what I was thinking is possible or not using the current achievementLevels and percentile functions. Thanks!

pdbailey0 commented 1 year ago

I would approach this by making a new factor variable that holds the ESCS bin. The rest of this comment shows the code for that.

You must already have done this, but for completeness, here is how to download the data:

downloadPISA("~/", years=2015)

Then I read in the data for one country and calculate the percentiles. Here I use three levels:

 ita <- readPISA("~/PISA/2015", countries="ita")
 percentile(data=ita, "escs", c(33, 67))

This gives the 33rd percentile at -0.5058 and the 67th percentile at 0.3907. Then I create the bin as a character variable:

# make a low bin
ita$escsBin <- ifelse(ita$escs < -0.5058, "low", "other")
# make the mid bin, keeping the low bin
ita$escsBin <- ifelse(ita$escs >= -0.5058 & ita$escs < 0.3907, "mid", ita$escsBin)
# make the high bin, keeping the other two bins
# (>= so that a value exactly at the cut point is not left as "other")
ita$escsBin <- ifelse(ita$escs >= 0.3907, "high", ita$escsBin)

I then made some summaries to confirm this is correct (not shown).

Then I made escsbinf, which is a factor. This both lets me use it in achievementLevels and puts the bins in order.

# now make a factor variable, with the levels in the order they should be reported
ita$escsbinf <- factor(ita$escsBin, levels=c("low", "mid", "high"))
# use that for the achievementLevel
achievementLevels("read", "escsbinf", data=ita)
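
A possible shortcut, not part of the steps above: base R's cut() can build the ordered factor in a single call from the same two cut points. The sketch below uses a made-up numeric vector in place of ita$escs so that it runs without the PISA files; the vector values are assumptions for illustration only.

```r
# Hypothetical stand-in for ita$escs; these values are invented.
escs <- c(-1.20, -0.5058, 0.00, 0.3907, 1.10, NA)

# Cut points as reported by percentile() earlier in this comment.
cuts <- c(-0.5058, 0.3907)

# right = FALSE closes each interval on the left, so a value equal to a
# cut point falls into the higher bin. Missing values stay NA.
escsbinf <- cut(escs,
                breaks = c(-Inf, cuts, Inf),
                labels = c("low", "mid", "high"),
                right = FALSE)

# Quick check of how many cases landed in each bin.
print(table(escsbinf, useNA = "ifany"))
```

With the real data, the same cut() call on ita$escs could be assigned to ita$escsbinf in place of the ifelse() and factor() lines.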

I hope this answers your question.

ainhoavega commented 1 year ago

Hi, first of all, thank you! It does answer my question, for a single country. However, when I try to apply this to different countries (each of them with their own cutoff points) it seems to not work, I believe, because the object class is edsurvey.data.frame.list and not edsurvey.data.frame. Forgive me if this is obvious, but I'm new to this package and I haven't been able to find a lot of information on how to do descriptives for more than one country at a time. Thanks!

pdbailey0 commented 1 year ago

@ainhoavega we don't seem to cover this in our book, but generally, we manipulate individual country-year files and then put them together into an edsurvey.data.frame.list right before analysis.

We talk about this in more detail on pages 27-29 of this document (https://www.air.org/sites/default/files/edsurvey-TIMSS-pdf.pdf), which is about TIMSS but applies just as well to your situation.
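
As a rough sketch of that workflow for the ESCS question (an illustration under assumptions, not code taken from the book or the TIMSS document): bin each country with its own cut points, then combine the results just before the analysis. The helper names binByCuts and binCountry, the choice of countries, and the quintile labels are all hypothetical. The sketch also assumes that percentile() returns its estimates in a column named estimate and that PISA 2015 was downloaded to ~/PISA/2015.

```r
# Pure helper (hypothetical name): bin a numeric vector given interior cut points.
binByCuts <- function(x, cuts, labels) {
  cut(x, breaks = c(-Inf, cuts, Inf), labels = labels, right = FALSE)
}

# Per-country step (hypothetical name): read one country, compute its own
# ESCS quintile cut points, and attach the bin as a factor.
# Assumes the result of percentile() has an `estimate` column.
binCountry <- function(code, path = "~/PISA/2015") {
  sdf <- EdSurvey::readPISA(path, countries = code)
  pct <- EdSurvey::percentile("escs", c(20, 40, 60, 80), data = sdf)
  sdf$escsbinf <- binByCuts(sdf$escs, pct$estimate, paste0("Q", 1:5))
  sdf
}

# Only runs when the package and the downloaded data are actually available.
if (requireNamespace("EdSurvey", quietly = TRUE) &&
    dir.exists(path.expand("~/PISA/2015"))) {
  codes <- c("ita", "esp")                # hypothetical choice of countries
  sdfs <- lapply(codes, binCountry)       # manipulate each country separately
  pisa <- EdSurvey::edsurvey.data.frame.list(sdfs, labels = codes)
  print(EdSurvey::achievementLevels("read", "escsbinf", data = pisa))
}
```

Because the cut points are computed inside binCountry, each country gets quintiles relative to its own ESCS distribution rather than a pooled one.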

ainhoavega commented 1 year ago

Thanks, just one last question – when using the aggregateBy argument of achievementLevels, is there a way to obtain total achievement levels from the same function call or do I have to call it twice, once without aggregateBy? E.g. if you’re aggregating by sex, get achievement levels for male, female, and total.


pdbailey0 commented 1 year ago

@ainhoavega you'll need to call it twice.
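
One way to package the two calls, as a minimal sketch: the wrapper name levelsWithTotal and its fun argument are hypothetical and not part of EdSurvey. The by-group call mirrors the achievementLevels call shown earlier in this thread, and the total is the same call without a grouping variable.

```r
# Hypothetical convenience wrapper: run the by-group call and the overall
# call, and return both results in one list.
# `fun` defaults to EdSurvey::achievementLevels and is only looked up when
# the wrapper is actually called.
levelsWithTotal <- function(score, group, data,
                            fun = EdSurvey::achievementLevels) {
  list(
    byGroup = fun(score, group, data = data),  # levels within each group
    total   = fun(score, data = data)          # levels for everyone
  )
}
```

For example, levelsWithTotal("read", "escsbinf", data = ita) would return the per-bin results in byGroup and the overall results in total.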