CHOP-CGTInformatics / REDCapTidieR

Makes it easy to read REDCap Projects into R
https://chop-cgtinformatics.github.io/REDCapTidieR/

[FEATURE] Multiple Choice to Single Column Function #194

Closed rsh52 closed 1 month ago

rsh52 commented 3 months ago

Feature Request Description

A common use case is consolidating multiple-choice fields (i.e., checkbox fields) into a single column, for example to power Table 1s for manuscript reporting.

Proposed Solution

This function, with the placeholder name reduce_multi_to_single_column(), will be the first REDCapTidieR analytic tool that users can apply to columns in their extracted tibbles.

It should:

Additional Context

This was prompted by the request in #192 and should be a more generalizable solution.

Checklist

ezraporter commented 2 months ago

The stuff in https://github.com/CHOP-CGTInformatics/REDCapTidieR/commit/96c3309959caa5d574ae192604ce86ceef04bea1 looks like a good start. I didn't look closely at the code and mainly focused on the API.

My biggest reaction is that the current API takes a supertibble as input and returns a tibble, and I'm not sure that's right. I think we want data manipulation functions like this to be pipe-friendly, which requires that inputs and outputs are the same type of thing.

I think the best way to do this is to have the function modify a tibble within the supertibble and return the whole thing. To get the current behavior, you'd need to do something like:

supertbl |>
  reduce_multi_to_single_column() |>
  extract_tibble()

but this buys us composability:

supertbl |>
  reduce_multi_to_single_column() |>
  reduce_multi_to_single_column() |>
  some_other_transformation() |>
  ...
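A minimal base R sketch of the supertibble-in, supertibble-out pattern may help make the composability point concrete. It assumes a supertibble-like data frame with one row per instrument and each instrument's data stored in a list column; the column names redcap_form_name and redcap_data are intended to mirror REDCapTidieR's supertibble, but modify_form() and get_form() are hypothetical helpers, not package functions. It needs R 4.1 or later for the native pipe.

```r
# Hypothetical supertibble-like object: one row per instrument, with each
# instrument's data stored in the list column `redcap_data`.
supertbl <- data.frame(redcap_form_name = c("nonrepeat", "repeat_form"))
supertbl$redcap_data <- list(
  data.frame(study_id = 1:3, multi___1 = c(TRUE, TRUE, FALSE)),
  data.frame(study_id = c(1, 1), score = c(10, 20))
)

# Hypothetical helper: apply `fn` to one instrument's data and return the
# whole supertibble, so transformations can be chained with |>.
modify_form <- function(supertbl, form, fn) {
  idx <- match(form, supertbl$redcap_form_name)
  if (is.na(idx)) stop("Unknown form: ", form)
  supertbl$redcap_data[[idx]] <- fn(supertbl$redcap_data[[idx]])
  supertbl
}

# Hypothetical stand-in for extract_tibble(): pull one instrument's data out.
get_form <- function(supertbl, form) {
  supertbl$redcap_data[[match(form, supertbl$redcap_form_name)]]
}

# Every step returns a supertibble, so steps compose freely and extraction
# becomes the final, optional step.
result <- supertbl |>
  modify_form("nonrepeat", function(d) transform(d, first_choice = multi___1)) |>
  modify_form("repeat_form", function(d) transform(d, score_x2 = score * 2))

get_form(result, "repeat_form")
```

The trade-off raised later in the thread shows up here too: the changes live inside redcap_data, so they are only visible after an extraction step.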

Naming thoughts

Maybe we should call this unite_checkbox() in reference to tidyr::unite()?

For parameter names, what about:

cols_to -> values_to
no_val -> values_fill
multi_val -> multi_value_label

The first two are inspired by the pivot_*() naming conventions in tidyr.
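To make the proposed parameter names concrete, here is a minimal base R sketch of how they might behave. The name unite_checkbox() is the one floated above, but the signature, the cols and labels arguments, and the defaults are assumptions for illustration, not REDCapTidieR's API. It takes the option labels directly as an argument, so it sidesteps the question of where those labels come from.

```r
# Hypothetical sketch of unite_checkbox(); the signature and defaults are
# assumptions, not the package's actual API.
unite_checkbox <- function(data, cols, labels, values_to,
                           values_fill = NA_character_,
                           multi_value_label = "Multiple") {
  # Logical matrix: one row per record, one column per checkbox option.
  # Comparing to 1 handles both TRUE/FALSE and 1/0 encodings.
  selected <- as.matrix(data[cols]) == 1
  selected[is.na(selected)] <- FALSE  # treat missing as unchecked
  n_selected <- rowSums(selected)

  # Start every row at the fill value (no option selected)
  out <- rep(values_fill, nrow(data))
  # Exactly one option selected: use that option's label
  single <- n_selected == 1
  idx <- max.col(selected * 1L, ties.method = "first")
  out[single] <- labels[idx[single]]
  # More than one option selected: use the multi-value label
  out[n_selected > 1] <- multi_value_label

  # Replace the checkbox columns with the single consolidated column
  data <- data[setdiff(names(data), cols)]
  data[[values_to]] <- out
  data
}

df <- data.frame(
  study_id  = 1:3,
  multi___1 = c(TRUE, TRUE, FALSE),
  multi___2 = c(FALSE, TRUE, FALSE),
  multi___3 = c(FALSE, FALSE, FALSE)
)
res <- unite_checkbox(df, cols = c("multi___1", "multi___2", "multi___3"),
                      labels = c("Red", "Yellow", "Blue"),
                      values_to = "multi", values_fill = "None")
```

In this sketch values_to names the new column, values_fill covers rows with nothing checked, and multi_value_label covers rows with more than one box checked, which parallels how values_to and values_fill read in tidyr's pivot functions.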

skadauke commented 2 months ago

Hi,

There isn't a general rule that the pipe should always have inputs and outputs of the same type. That's more of a dplyr-specific convention that isn't shared by other tidy packages such as tidymodels.

That said, I agree with your point and actually think this function should take a tibble and return a modified tibble. So the workflow would be

supertbl |> extract_tibble(x) |> reduce_multi_to_single_column(...)

I think "unite" is much less specific than "reduce_multi_to_single_column".

My thoughts! S



rsh52 commented 2 months ago

My biggest reaction is that the current API takes a supertibble as input and returns a tibble, and I'm not sure that's right. I think we want data manipulation functions like this to be pipe-friendly, which requires that inputs and outputs are the same type of thing.

For the API change, I think this is easily reworkable if we agree on what it should intake and output. I definitely see what you're saying. My concern with returning a supertibble is that you don't actually see the changes from the output of the function, it's sort of "masked" inside of the data tibbles. But maybe that's not a big issue here?

Either way, the function needs access to the metadata raw/label values associated with the checkboxes being united, so I don't see a way around having users supply the supertibble. Otherwise, they'd have to supply the data tibble and the metadata tibble separately.

Naming thoughts

I like the naming much better, these all make sense to me.

rsh52 commented 2 months ago

That said, I agree with your point and actually think this function should take a tibble and return a modified tibble. So the workflow would be supertbl |> extract_tibble(x) |> reduce_multi_to_single_column(...)

@skadauke This also makes some sense, but then users would still need to supply the metadata separately. That's what led me to wrapping the extract_() internally.

skadauke commented 2 months ago

What metadata is needed for the transformation?



rsh52 commented 2 months ago

What metadata is needed for the transformation?

The general rule is to consolidate the checkbox columns into one column that shows the raw or label value associated with the checkbox if only one option is selected, OR a custom value (e.g. "multiple" / "many") if more than one is selected. There's no way to get these values from the data tibble alone, because the checkbox columns contain only 1s and 0s or TRUEs and FALSEs. For example:

> nonrepeat_data
# A tibble: 3 × 4
  study_id multi___1 multi___2 multi___3
     <dbl> <lgl>     <lgl>     <lgl>    
1        1 TRUE      FALSE     FALSE    
2        2 TRUE      TRUE      FALSE    
3        3 FALSE     FALSE     FALSE    
> nonrepeat_metadata
# A tibble: 4 × 2
  field_name select_choices_or_calculations
  <chr>      <chr>                         
1 study_id   NA                            
2 multi___1  1, Red | 2, Yellow | 3, Blue  
3 multi___2  1, Red | 2, Yellow | 3, Blue  
4 multi___3  1, Red | 2, Yellow | 3, Blue
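The lookup that the metadata makes possible can be sketched in base R. The helper label_checkbox_cols() below is hypothetical, not part of REDCapTidieR: it parses the "code, label | code, label" choice string and matches each checkbox column's suffix (the part after ___) to its raw code or label. It assumes the metadata layout shown above and that raw codes contain no commas; labels that contain commas are kept whole because only the first comma in each pair is split on.

```r
# Hypothetical helper (not part of REDCapTidieR): map checkbox column names
# to their raw codes or labels using a metadata table shaped like
# nonrepeat_metadata above.
label_checkbox_cols <- function(metadata, cols, use_label = TRUE) {
  # Choice string for each checkbox column, e.g. "1, Red | 2, Yellow | 3, Blue"
  choices <- metadata$select_choices_or_calculations[
    match(cols, metadata$field_name)
  ]
  out <- vapply(seq_along(cols), function(i) {
    # Split into "code, label" pairs and trim surrounding whitespace
    pairs <- trimws(strsplit(choices[[i]], "|", fixed = TRUE)[[1]])
    raw <- trimws(sub(",.*$", "", pairs))       # text before the first comma
    label <- trimws(sub("^[^,]*,", "", pairs))  # text after the first comma
    # The raw code is encoded in the column name after "___"
    code <- sub("^.*___", "", cols[[i]])
    hit <- match(code, raw)
    if (use_label) label[hit] else raw[hit]
  }, character(1))
  names(out) <- cols
  out
}

nonrepeat_metadata <- data.frame(
  field_name = c("study_id", "multi___1", "multi___2", "multi___3"),
  select_choices_or_calculations = c(NA, rep("1, Red | 2, Yellow | 3, Blue", 3)),
  stringsAsFactors = FALSE
)
checkbox_cols <- c("multi___1", "multi___2", "multi___3")

labs <- label_checkbox_cols(nonrepeat_metadata, checkbox_cols)
raws <- label_checkbox_cols(nonrepeat_metadata, checkbox_cols, use_label = FALSE)
```

None of these labels can be recovered from nonrepeat_data on its own, which is the crux of the API question: whatever the function accepts, it needs a path to both the data and the metadata.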