GEMINI-Medicine / Rgemini

A custom R package that provides a variety of functions to perform data analyses with GEMINI data
https://gemini-medicine.github.io/Rgemini/

Make gender as a binary variable #90

Closed shijiaSMH closed 5 months ago

shijiaSMH commented 6 months ago

The CIHI gender variable has 4 levels.

However, U and O are usually too rare to produce meaningful estimates, and sex is what is most commonly reported in the literature. It would therefore be good to have a flexible function that collapses gender into a binary sex variable.

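di <- function(sex, data) {
  # Collapse the CIHI gender column into a binary sex column:
  # "female" -> Female vs. Not female; "male" -> Male vs. Not male.
  # Assumes `data` is a data.table with a `gender` column;
  # the `sex` column is added by reference.
  if (sex == 'female') {
    data[, sex := 'Not female']
    data[gender == 'F', sex := 'Female']
  }

  if (sex == 'male') {
    data[, sex := 'Not male']
    data[gender == 'M', sex := 'Male']
  }

  data
}

loffleraSMH commented 5 months ago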

Thanks @shijiaSMH for bringing this up! My personal view is that anything that can be expressed in a simple, single line of code does not require functionalization (especially if it's only used once per analysis code - which is the case here). E.g., this would be the only line of code users need so I'm not sure any further functionalization is needed:

data[, sex := ifelse(gender == "M", "Male", "Not male")]

If this is more about documentation on how to treat the gender variable (and that it should be interpreted as sex), I think it might be better to add this to the variable discussion board instead.

@vaakesan-SMH: Any thoughts on this?

shijiaSMH commented 5 months ago

Yes @loffleraSMH, this makes sense, but please allow me to add some nuance here.

Mainly, it's best to have something that keeps this flexible: Male/not male or Female/not female. We might want Female/not female if the paper's interest is in a vulnerable group. Having this flexibility makes changes easier at the manuscript review stage.

I do agree this can have a much simpler implementation though.
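For illustration, a simpler flexible version might look like the sketch below. The name `binarize_gender` and its `ref` argument are hypothetical (not part of Rgemini), and the toy table is made up. It assumes a data.table with a character `gender` column. Unlike the function in the original post, rows with missing gender stay `NA` rather than falling into the "Not ..." group.

```r
library(data.table)

# Hypothetical helper (an assumption for illustration, not an Rgemini function):
# collapse `gender` into "<Level>" vs. "Not <level>" for a chosen reference level.
binarize_gender <- function(data, ref = c("F", "M")) {
  ref <- match.arg(ref)
  yes_label <- c(F = "Female", M = "Male")[[ref]]
  no_label <- paste("Not", tolower(yes_label))
  out <- copy(data)  # copy so the caller's table is not modified by reference
  out[, sex := fifelse(gender == ref, yes_label, no_label)]
  out[]
}

# Toy data for illustration only (not GEMINI data)
toy <- data.table(gender = c("F", "M", "U", "O", "F"))

female_view <- binarize_gender(toy, "F")  # Female vs. Not female
male_view <- binarize_gender(toy, "M")    # Male vs. Not male
```

Switching between the two codings is then a one-argument change, which is the flexibility argued for above.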

vaakesan-SMH commented 5 months ago

Agree with @loffleraSMH.

Might be worth adding to the variable discussion board or raising with the group though @shijiaSMH, because even if sex is more commonly used in the literature, I believe there is a difference between CIHI's "reported sex or gender" data element and biological sex. From what I understand, this value is taken from the health card.

In other words I don't believe this proposed function modifies CIHI's "gender" field to represent binary sex? Unless I am misunderstanding the intention.

loffleraSMH commented 5 months ago

Sure, but Female/not female (or any other binarization/labelling users may want to apply) can similarly be implemented with a single line of code. I'm just not convinced it's worth adding a whole separate function for this (keep in mind that users will first need to understand how the function works / review documentation). It seems to me like most users would just implement their own binarization if needed instead of calling a separate function for this. I also share @vaakesan-SMH's concern about the nuances of CIHI's gender field and agree that the wording/intention is a bit unclear right now. I think the main reason we currently binarize is for convenience/to avoid low cell counts for U/O.
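As a minimal sketch of this point, the Female/not female coding is the same one-liner with the comparison flipped, and tabulating the levels before and after shows how the sparse U/O cells get absorbed. The table below is toy data made up for illustration, not GEMINI data.

```r
library(data.table)

# Toy data for illustration only (not GEMINI data)
data <- data.table(gender = c("F", "M", "F", "U", "M", "O", "F"))

# Cell counts per original level; U and O are the sparse ones here
counts_before <- data[, .N, by = gender]

# Female vs. not female in a single line
data[, sex := ifelse(gender == "F", "Female", "Not female")]

# Cell counts after binarization
counts_after <- data[, .N, by = sex]
```

As with the Male/not male line, rows with a missing `gender` would get `NA` for `sex` rather than being assigned to either group.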

shijiaSMH commented 5 months ago

Thanks both, this makes sense!

Just to add a bit more on the point that the CIHI code doesn't correspond to biological sex: my understanding is that gender is the best proxy we have for biological sex.

And yes, I agree the major reason is to avoid reporting U/O in manuscripts.

loffleraSMH commented 5 months ago

Ok great, so I'll go ahead and close this issue here and @shijiaSMH can write up a brief summary about this on the variable discussion board.