bcjaeger / melodem-apoe4-het

Workshop and collaborative manuscript
https://bcjaeger.github.io/melodem-apoe4-het/
Creative Commons Attribution Share Alike 4.0 International

Fix for required factor (character) variable checks #13

Closed alvinthomas closed 3 months ago

alvinthomas commented 3 months ago

There are checks for the values/format of the variables sex and apoe4 that were tripping me up. I found it easiest to make these into labelled factor variables.

This assumes the values in your data are currently "female" and "male" for sex and "carrier" and "non_carrier" for apoe4. The code relabels sex as "men" and "women" and allows for missingness in both sex and apoe4. However, since missing apoe4 values aren't allowed, the last step filters out rows with missing apoe4 status.

Replace zzzz with the name of your data object. This process can be part of data_prepare (since the levels are first checked there).

Example code:

library(dplyr)

# Recode sex and apoe4 as labelled factors; missing inputs stay NA
zzzz <- zzzz %>%
  mutate(sex = factor(
    ifelse(sex == "female", 1, ifelse(is.na(sex), NA, 0)),
    levels = c(0, 1),
    labels = c("men", "women"))) %>%
  mutate(apoe4 = factor(
    ifelse(apoe4 == "carrier", 1, ifelse(is.na(apoe4), NA, 0)),
    levels = c(0, 1),
    labels = c("non_carrier", "carrier"))) %>%
  # apoe4 may not be missing, so drop rows without a status
  filter(!is.na(apoe4))

alvinthomas commented 3 months ago

Closing the issue since a general solution was identified. Please re-open if this doesn't work for future data structure checks. A potential enhancement is a more flexible check of the data structure.
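
As a rough illustration of what a more flexible check might look like, here is a minimal sketch in base R. It accepts either a character or a factor column and only complains about values outside an allowed set. The function name `check_allowed_values` and the toy data are hypothetical and not part of the repository; the actual checks in data_prepare may work differently.

```r
# Hypothetical helper (not from the repository): accept character or
# factor input and report any non-missing values outside `allowed`.
check_allowed_values <- function(data, column, allowed) {
  if (!column %in% names(data)) {
    stop("Column '", column, "' is missing from the data.", call. = FALSE)
  }
  # as.character() turns factor labels into strings, so both types work
  observed <- unique(as.character(data[[column]]))
  observed <- observed[!is.na(observed)]
  unexpected <- setdiff(observed, allowed)
  if (length(unexpected) > 0) {
    stop("Column '", column, "' has unexpected values: ",
         paste(unexpected, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Toy data for illustration only
toy_checked <- data.frame(
  sex = factor(c("women", "men", NA), levels = c("men", "women")),
  apoe4 = c("carrier", "non_carrier", "carrier"),
  stringsAsFactors = FALSE
)

check_allowed_values(toy_checked, "sex", c("men", "women"))
check_allowed_values(toy_checked, "apoe4", c("non_carrier", "carrier"))
```

Missing values pass this check on purpose, so whether they are allowed can be decided in a separate step.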

bcjaeger commented 3 months ago

This is great! I think we can set up a more explicit set of global exclusions going forward that will drop missing values in apoe4 and time + status, then restrict the analysis to adults 55 and older (or whatever cut-point works globally).
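
The exclusions described here could be sketched roughly as follows, in base R. This is only an illustration under assumptions: the function name `apply_global_exclusions`, the column name `age`, and the toy data are hypothetical, and treating a missing age as an exclusion is a choice made for this sketch rather than something decided in the thread.

```r
# Hypothetical sketch (not from the repository) of global exclusions:
# drop rows with missing apoe4, time, or status, then keep ages >= age_min.
apply_global_exclusions <- function(data, age_min = 55) {
  keep <- !is.na(data$apoe4) &
    !is.na(data$time) &
    !is.na(data$status) &
    # Assumption: rows with a missing age are excluded as well
    !is.na(data$age) & data$age >= age_min
  data[keep, , drop = FALSE]
}

# Toy data for illustration only; `age` is an assumed column name
toy_cohort <- data.frame(
  apoe4  = c("carrier", NA, "non_carrier", "carrier"),
  time   = c(2.5, 1.0, NA, 4.0),
  status = c(1, 0, 1, 0),
  age    = c(60, 70, 65, 50),
  stringsAsFactors = FALSE
)

kept <- apply_global_exclusions(toy_cohort)
```

Keeping the cut-point as an argument (`age_min`) makes it straightforward to change once a global value is agreed on.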