Public-Health-Scotland / AAA-KPIs


Derived Variables #3

Closed Angus-Morton closed 1 year ago

Angus-Morton commented 2 years ago

Ideally all derived variables needed by any of the subsequent scripts should be created in the initial AAA_processing script.

The rationale is that everything in the initial download from ATOS can then be checked by the checking scripts, and issues can be flagged and sent back to them quickly.

Add comments below with code snippets of derived variables from other scripts, along with the name of the script they appear in.

Angus-Morton commented 2 years ago

In AAA-KPIs 1_processing_for_KPI_11_13.R

```r
# Export very large measurements as they might be errors

# Derive measurements for the PHS screen result categories
# A measurement category is derived for definitive screen results i.e. positive,
# negative, external positive or external negative results unless the follow-up
# recommendation is immediate recall ('05').
# This means a measurement category is not derived for technical fails, non
# visualisations and immediate recalls.
last_results_initial_screens <- last_results_initial_screens %>%
  mutate(isd_aaa_size = case_when(screen_result %in% c("01", "02", "05", "06") &
                                    (followup_recom != "05" |
                                       is.na(followup_recom)) ~ largest_measure)) %>%
  mutate(isd_aaa_size_group = case_when(isd_aaa_size >= 0 &
                                          isd_aaa_size <= 2.9 ~ "negative",
                                        isd_aaa_size >= 3 &
                                          isd_aaa_size <= 4.4 ~ "small",
                                        isd_aaa_size >= 4.5 &
                                          isd_aaa_size <= 5.4 ~ "medium",
                                        isd_aaa_size >= 5.5 &
                                          isd_aaa_size <= 10.5 ~ "large",
                                        isd_aaa_size >= 10.6 ~
                                          "very large error"))

# Assume these have been investigated by the checking script

last_results_initial_screens <- last_results_initial_screens %>%
  mutate(isd_aaa_size_group = recode(isd_aaa_size_group,
                                     "very large error" = "large"))
```

karen-hotopp commented 2 years ago

From script 4.1_vascular_outcomes.R

```r
# categorize largest measurement into two bins
mutate(result_size = if_else(largest_measure >= 5.5, 1, 2)) %>%

# remove first mutate (as.character) and add 0s to single digits below after fixed in script 1
mutate(result_outcome = as.character(result_outcome),
       outcome_type = case_when(result_outcome %in% c('1','2','3','4','5','6','7','8',
                                                      '11','12','13','15','16','20', '21') ~ 1,
                                result_outcome %in% c('9','10','14','17', '18','19') ~ 2,
                                is.na(result_outcome) ~ 3,
                                TRUE ~ 4)) %>%
```

But also not 100% convinced that this should be moved... are there any other scripts that use result_size or outcome_type?
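
On the note about adding 0s to single digits: a minimal sketch of what that could look like, assuming `result_outcome` is held as a character code and that `stringr` is available. The toy values and the shortened code lists are illustrative only, not the full mapping from the script.

```r
library(dplyr)
library(stringr)

# Toy values only; real codes come from the extract
outcomes <- tibble(result_outcome = c("1", "9", "20", NA))

outcomes <- outcomes %>%
  mutate(
    # Left-pad to width 2 so "1" becomes "01"; NA stays NA
    result_outcome = str_pad(result_outcome, width = 2, pad = "0"),
    outcome_type = case_when(
      result_outcome %in% c("01", "20") ~ 1,  # illustrative subset of the codes
      result_outcome %in% c("09") ~ 2,        # illustrative subset of the codes
      is.na(result_outcome) ~ 3,
      TRUE ~ 4
    )
  )
```

Once the codes are padded at source, the `as.character()` step and the unpadded code lists in the downstream script would need updating together.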

karen-hotopp commented 2 years ago

I've added this into the extract script, so they will be produced at the beginning. Not sure about the check though... should this get sent to HB for review if "very large"? It looks from the little bit of code that it is recoded as "large"...?
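
One possible way to handle the check, sketched under assumptions: pull out records at or above the 10.6 cut-off used in 1_processing_for_KPI_11_13.R into a separate data frame for review, before anything is recoded to "large". The `upi` column and the values below are hypothetical; whether these records should go to the HB is still the open question.

```r
library(dplyr)

# Toy data; largest_measure follows the snippet in this issue, upi is hypothetical
screens <- tibble(
  upi = c("A", "B", "C"),
  largest_measure = c(2.4, 5.6, 12.3)
)

# Records to review: measurements at or above the 10.6 cut-off
very_large_checks <- screens %>%
  filter(largest_measure >= 10.6)
```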

calumpurdie commented 1 year ago

I derive the age at screening using phsmethods age_calculate() function

age_at_screening = age_calculate(dob, date_screen)
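
For anyone checking the result by hand, here is a sketch of the same idea (age in completed years) using only `lubridate`. It is not the `phsmethods` implementation, and the dates are made up.

```r
library(lubridate)

# Made-up dates: one screen the day before the 65th birthday, one on it
dob <- dmy(c("15-06-1950", "15-06-1950"))
date_screen <- dmy(c("14-06-2015", "15-06-2015"))

# Whole years elapsed between date of birth and screening date
age_at_screening <- interval(dob, date_screen) %/% years(1)
```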

Angus-Morton commented 1 year ago

A few variables in here:

```r
cohort1 <- cohort1 %>%
  mutate(
    eligibility_period = case_when(
      between(dob, dmy("01-04-1947"), dmy("31-03-1948")) ~ "Turned 66 in year 201314",
      between(dob, dmy("01-04-1948"), dmy("31-03-1949")) ~ "Turned 66 in year 201415",
      between(dob, dmy("01-04-1949"), dmy("31-03-1950")) ~ "Turned 66 in year 201516",
      between(dob, dmy("01-04-1950"), dmy("31-03-1951")) ~ "Turned 66 in year 201617",
      between(dob, dmy("01-04-1951"), dmy("31-03-1952")) ~ "Turned 66 in year 201718",
      between(dob, dmy("01-04-1952"), dmy("31-03-1953")) ~ "Turned 66 in year 201819",
      between(dob, dmy("01-04-1953"), dmy("31-03-1954")) ~ "Turned 66 in year 201920",
      between(dob, dmy("01-04-1954"), dmy("31-03-1955")) ~ "Turned 66 in year 202021",
      between(dob, dmy("01-04-1955"), dmy("31-03-1956")) ~ "Turned 66 in year 202122",
      between(dob, dmy("01-04-1956"), dmy("31-03-1957")) ~ "Turned 66 in year 202223",
      between(dob, dmy("01-04-1957"), dmy("31-03-1958")) ~ "Turned 66 in year 202324"
    ),
    age65_onstartdate = case_when(
      hbres == "Ayrshire & Arran" & between(dob, dmy("01-06-1947"), dmy("31-05-1948")) ~ 1,
      hbres == "Borders" & between(dob, dmy("09-08-1946"), dmy("08-08-1947")) ~ 1,
      hbres == "Dumfries & Galloway" & between(dob, dmy("24-07-1947"), dmy("23-07-1948")) ~ 1,
      hbres == "Fife" & between(dob, dmy("09-01-1947"), dmy("08-01-1948")) ~ 1,
      hbres == "Forth Valley" & between(dob, dmy("18-09-1947"), dmy("17-09-1948")) ~ 1,
      hbres == "Grampian" & between(dob, dmy("03-10-1946"), dmy("02-10-1947")) ~ 1,
      hbres == "Greater Glasgow & Clyde" & between(dob, dmy("06-02-1947"), dmy("05-02-1948")) ~ 1,
      hbres == "Highland" & between(dob, dmy("29-06-1946"), dmy("28-06-1947")) ~ 1,
      hbres == "Lanarkshire" & between(dob, dmy("01-04-1947"), dmy("31-03-1948")) ~ 1,
      hbres == "Lothian" & between(dob, dmy("09-08-1946"), dmy("08-08-1947")) ~ 1,
      hbres == "Orkney" & between(dob, dmy("03-10-1946"), dmy("02-10-1947")) ~ 1,
      hbres == "Shetland" & between(dob, dmy("03-10-1946"), dmy("02-10-1947")) ~ 1,
      hbres == "Tayside" & between(dob, dmy("09-01-1947"), dmy("08-01-1948")) ~ 1,
      hbres == "Western Isles" & between(dob, dmy("29-06-1946"), dmy("28-06-1947")) ~ 1,
      TRUE ~ 0
    ),
    over65_onstartdate = case_when(
      hbres == "Ayrshire & Arran" & dob < dmy("01-06-1947") ~ 1,
      hbres == "Borders" & dob < dmy("09-08-1946") ~ 1,
      hbres == "Dumfries & Galloway" & dob < dmy("24-07-1947") ~ 1,
      hbres == "Fife" & dob < dmy("09-01-1947") ~ 1,
      hbres == "Forth Valley" & dob < dmy("18-09-1947") ~ 1,
      hbres == "Grampian" & dob < dmy("03-10-1946") ~ 1,
      hbres == "Greater Glasgow & Clyde" & dob < dmy("06-02-1947") ~ 1,
      hbres == "Highland" & dob < dmy("29-06-1946") ~ 1,
      hbres == "Lanarkshire" & dob < dmy("01-04-1947") ~ 1,
      hbres == "Lothian" & dob < dmy("09-08-1946") ~ 1,
      hbres == "Orkney" & dob < dmy("03-10-1946") ~ 1,
      hbres == "Shetland" & dob < dmy("03-10-1946") ~ 1,
      hbres == "Tayside" & dob < dmy("09-01-1947") ~ 1,
      hbres == "Western Isles" & dob < dmy("29-06-1946") ~ 1,
      TRUE ~ 0
    ),
    dob_eligibility = case_when(
      over65_onstartdate == 1 ~ "Over eligible age cohort - age 66plus on start date",
      age65_onstartdate == 1 ~ "Older cohort - age 65 on start date",
      !is.na(eligibility_period) & age65_onstartdate == 0 ~ eligibility_period
    )
  )
```
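
A possible alternative, offered only as a sketch: every health board window in the `case_when()` above runs for exactly one year from its lower date-of-birth bound, so the two flags could be derived from a small lookup table joined on `hbres`. The table name `hb_start_dob`, the columns `dob_from` and `dob_to`, and the toy cohort are all hypothetical; only three boards are included to keep it short.

```r
library(dplyr)
library(lubridate)

# Lower date-of-birth bound per board, copied from the case_when() above
hb_start_dob <- tibble(
  hbres = c("Ayrshire & Arran", "Borders", "Fife"),
  dob_from = dmy(c("01-06-1947", "09-08-1946", "09-01-1947"))
)

# Toy cohort with dates of birth on either side of the boundaries
cohort_toy <- tibble(
  hbres = c("Borders", "Borders", "Fife", "Fife"),
  dob = dmy(c("09-08-1946", "08-08-1946", "08-01-1948", "09-01-1948"))
)

cohort_toy <- cohort_toy %>%
  left_join(hb_start_dob, by = "hbres") %>%
  mutate(
    # Upper bound is one year after the lower bound, minus one day
    dob_to = dob_from + years(1) - days(1),
    # 1 if dob falls inside the board's window, otherwise 0
    age65_onstartdate = if_else(
      !is.na(dob_from) & dob >= dob_from & dob <= dob_to, 1, 0, missing = 0
    ),
    # 1 if dob is before the board's window, otherwise 0
    over65_onstartdate = if_else(
      !is.na(dob_from) & dob < dob_from, 1, 0, missing = 0
    )
  )
```

Boards missing from the lookup, and records with a missing `dob`, get 0 for both flags, which mirrors the `TRUE ~ 0` fallback in the original.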

calumpurdie commented 1 year ago

I've come across two scripts (rows 15 and 27 in the spreadsheet) that require the financial year based on date_surgery, so it probably makes sense to calculate this as a derived variable if it isn't already.
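
A minimal sketch of such a derived variable, assuming the financial year runs from 1 April to 31 March and that a "YYYY/YY" label is wanted. The helper name `fin_year_from_date()` is hypothetical; `phsmethods::extract_fin_year()` may be the better choice in practice.

```r
library(lubridate)

# Hypothetical helper: financial year (1 April to 31 March) as "YYYY/YY"
fin_year_from_date <- function(x) {
  # January to March belong to the financial year that started the previous April
  start_year <- year(x) - (month(x) < 4)
  label <- paste0(start_year, "/", sprintf("%02d", (start_year + 1) %% 100))
  # Keep missing dates as NA rather than the string "NA/NA"
  ifelse(is.na(x), NA_character_, label)
}

# Made-up surgery dates either side of the year boundary
date_surgery <- dmy(c("31-03-2020", "01-04-2020"))
financial_year <- fin_year_from_date(date_surgery)
```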

karen-hotopp commented 1 year ago

Variables moved to quarterly extracts: