BornInBradford / aow_datarelease

Tools and scripts for production of Age of Wonder data releases
MIT License

RCADS responses for year 10 incorrectly coded #24

Open · jpickavance opened this issue 2 months ago

jpickavance commented 2 months ago

Original email: I’m just pulling some descriptives together quickly and I noticed there’s an error in how the RCADS items (awb2_1_illhealth_1 to awb2_1_illhealth_25) have been coded for 2023-24. Currently the code -4 has been assigned to everyone in years 9 and 10, with the label [only shown to those in year 8]. In fact, RCADS was also administered in year 10, so it should only be coded as -4 for those in year 9, perhaps with the note [not shown to those in year 9].

Dan's response: The reason for this is that originally only some variables in Y10 were affected, so the following code worked:

year_group = case_when(grepl("year_group", branching) ~ str_extract(branching, "\\d+"))

It just looks for the string “year_group” in the branching field. When it finds it, it takes the first number in that field and sets it aside, then later builds a category label from it saying “only shown in year group x”. In the case of the RCADS variables, branching contains:

            [year_group] = '8' or [year_group] = '10'

So this code no longer works: str_extract() returns only the first match, the 8, so the 10 is never picked up.
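To make the failure concrete, here is a minimal sketch. It uses base R (`regmatches()` with `regexpr()`) instead of stringr so it runs without extra packages; like `str_extract()`, `regexpr()` returns only the first match. The variable names are illustrative, not taken from the pipeline.

```r
# Branching text for the RCADS items, as quoted above
branching <- "[year_group] = '8' or [year_group] = '10'"

# regexpr() finds only the FIRST run of digits, like stringr::str_extract()
first_num <- regmatches(branching, regexpr("\\d+", branching, perl = TRUE))
print(first_num)
# [1] "8"

# The label is then built from that single number, so year 10 is lost
label <- paste0("only shown in year group ", first_num)
```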

A couple of options:

• Give up on automating this and make the changes manually (we might miss some?).
• Change the label so it just says “not shown in all year groups”. This probably still gives the user the missingness info they need?
• Update the regex in the str_extract function above (or rewrite it entirely) so it returns all separate integers as a string, comma separated or something along those lines; then the label should just work as expected. I’m not sure I have the stomach to attempt this one, but if you’d like a go… 😊
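The third option could look something like the sketch below. It is a sketch under assumptions, not pipeline code: `extract_year_groups()` is a hypothetical helper, it assumes clauses take the form `[year_group] = 'N'` as in the RCADS example, and it uses base R `gregexpr()` (stringr's `str_extract_all()` would work similarly). The `[consent] = '1'` string is a made-up example of branching that does not involve year group.

```r
# Hypothetical helper (not from the pipeline): return all year groups named
# in a branching string as "8, 10", or NA if there is no year_group clause
extract_year_groups <- function(branching) {
  # Find every "[year_group] = 'N'" clause, allowing optional spaces around "="
  clauses <- regmatches(
    branching,
    gregexpr("\\[year_group\\]\\s*=\\s*'\\d+'", branching, perl = TRUE)
  )
  vapply(clauses, function(cl) {
    if (length(cl) == 0) return(NA_character_)   # no year-group branching
    paste(gsub("\\D", "", cl), collapse = ", ")  # keep digits, join with commas
  }, character(1))
}

branching <- c(
  "[year_group] = '8' or [year_group] = '10'",
  "[year_group] = '10'",
  "[consent] = '1'"  # hypothetical non-year-group condition
)
year_group <- extract_year_groups(branching)
# expected values: "8, 10", "10", NA

label <- ifelse(is.na(year_group), NA_character_,
                paste0("only shown in year group(s) ", year_group))
```

Matching whole `[year_group] = 'N'` clauses, rather than any run of digits, avoids picking up numbers from unrelated conditions that share the same branching string.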

(Option 2 seems best)

datadm commented 2 months ago

On second thoughts, these ideas don’t work anyway. Later in the pipeline, the missing value code is added based only on rows where year_group == x. So even if we change the label in one of the ways suggested, the -4 coding is still determined by year_group == 8 alone.
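One possible way around this, assuming the full comma-separated list of year groups (e.g. "8, 10") can be recovered from the branching field, would be to test membership with `%in%` rather than equality against a single x. `code_not_shown()` below is hypothetical, and it assumes respondents who were not shown an item have no real answer, since it overwrites their values with -4.

```r
# Hypothetical helper (not from the pipeline): set an item to -4 for every
# respondent whose year group is NOT in the list the item was shown to.
# Assumes those respondents have no real answer, so overwriting is safe.
code_not_shown <- function(values, respondent_year, shown_years, code = -4) {
  shown <- as.integer(strsplit(shown_years, ",\\s*")[[1]])  # "8, 10" -> c(8, 10)
  ifelse(respondent_year %in% shown, values, code)           # membership, not ==
}

# Toy example: one RCADS item for pupils in years 8, 9, 10 and 9
resp_year <- c(8, 9, 10, 9)
item      <- c(2, NA, 1, NA)
coded <- code_not_shown(item, resp_year, "8, 10")
print(coded)
# [1]  2 -4  1 -4
```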

Consider giving up on adding missingness indicators based on year-group branching logic, and instead look into a way to embed the branching logic in the variable description in the data dictionary.
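If the data dictionary route is taken, appending the raw branching text to each description could be as simple as the sketch below. The dictionary layout (`variable`, `description`, `branching` columns), the row `example_var` and the helper `add_branching_note()` are all assumptions for illustration, not the project's actual dictionary format.

```r
# Hypothetical data dictionary layout: one row per variable
dict <- data.frame(
  variable    = c("awb2_1_illhealth_1", "example_var"),
  description = c("RCADS item 1", "Example item"),
  branching   = c("[year_group] = '8' or [year_group] = '10'", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append any branching logic to the description so
# users can see who was (and was not) shown each item
add_branching_note <- function(dict) {
  has_logic <- !is.na(dict$branching) & nzchar(trimws(dict$branching))
  dict$description[has_logic] <- paste0(
    dict$description[has_logic],
    " [shown only if: ", trimws(dict$branching[has_logic]), "]"
  )
  dict
}

dict <- add_branching_note(dict)
print(dict$description)
```

Passing the branching text through verbatim sidesteps parsing it at all, which is the point of this option: it works for any branching condition, not just year group.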