CHOP-CGTInformatics / REDCapTidieR

Makes it easy to read REDCap Projects into R
https://chop-cgtinformatics.github.io/REDCapTidieR/

Add `haven` option to `raw_or_label` #180

Closed ezraporter closed 6 months ago

ezraporter commented 6 months ago

Description

This PR modifies read_redcap() so that users can specify raw_or_label = "haven" and have categorical fields converted to haven_labelled vectors instead of factors. I kept labelled in Suggests and added a check that the package is installed whenever the user specifies raw_or_label = "haven".
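
As a rough illustration of that installation check, here is a minimal sketch. The helper name `check_raw_or_label()` and the error wording are hypothetical and are not taken from the PR's diff; the `"label"` and `"raw"` choices are assumed to be the pre-existing options.

```r
# Hypothetical helper illustrating the check; not the package's actual code.
check_raw_or_label <- function(raw_or_label = c("label", "raw", "haven")) {
  # Reject anything other than the supported options
  raw_or_label <- match.arg(raw_or_label)

  # labelled lives in Suggests, so only require it when it is actually needed
  if (raw_or_label == "haven" && !requireNamespace("labelled", quietly = TRUE)) {
    stop(
      "The labelled package is required when raw_or_label = \"haven\". ",
      "Install it with install.packages(\"labelled\").",
      call. = FALSE
    )
  }

  raw_or_label
}
```

Keeping the check behind the `"haven"` branch means users of the existing options never need labelled installed.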

Implementation considerations

Unlike factors, haven_labelled vectors preserve the underlying data values and types. The non-trivial consequence is that, to apply the labels, the type of the data column must match the type of the vector of values we read from the metadata. This is slightly tricky because there's no foolproof way to say "cast vector x to the type of vector y" for arbitrary y. My approach was to do our best at casting with readr's parsing functions and to fall back to converting the underlying values to chr if all else fails.
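
The "best effort cast, else fall back" idea can be sketched roughly as follows. `cast_like()` is a hypothetical name and the set of handled types is an assumption; the PR's real implementation may choose parsers differently.

```r
library(readr)

# Hypothetical helper, not the PR's actual code: try to parse the character
# vector `values` into the type of `x`. Returns NULL when the type is not
# handled or when any value fails to parse, which signals the caller to
# fall back to chr.
cast_like <- function(values, x) {
  parser <- switch(
    typeof(x),
    integer = readr::parse_integer,
    double = readr::parse_double,
    logical = readr::parse_logical,
    character = readr::parse_character,
    NULL
  )
  if (is.null(parser)) {
    return(NULL)
  }

  parsed <- suppressWarnings(parser(values))

  # readr returns NA for values it cannot parse; treat that as a failed cast
  if (anyNA(parsed[!is.na(values)])) {
    return(NULL)
  }

  parsed
}
```

A caller would apply the labels directly when `cast_like()` returns a vector, and otherwise convert the data column with `as.character()` and use the original chr values as labels.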

See below for a concrete example if it's helpful.

Proposed Changes

Worked Example

Suppose we have a REDCap field coded like this:

 value label
     3 apple
     5 orange
     9 banana

The db_data we get from REDCap will have a field like this:

 my_field
    <int>
        5
        3
        5
        9

where the int data type was determined by readr, since we don't allow users to pass data type specifications to redcap_read_oneshot().

The db_metadata will contain:

 field_name   select_choices_or_calculations         
 <chr>        <chr>                                  
 my_field     3, apple | 5, orange | 9, banana

where the values are stored as chr rather than int.

My implementation:

  1. Checks the data type of db_data$my_field and finds it's int
  2. Casts the labels read from db_metadata, c(apple = "3", orange = "5", banana = "9"), to int with readr::parse_integer()
  3. Applies the labels with labelled::set_value_labels()

If step 2 fails for some reason, the implementation instead casts db_data$my_field to chr and applies the chr labels to that.
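
The steps above can be reproduced on toy data roughly like this. The `db_data` and `choices` objects are hand-built stand-ins rather than real REDCap output, and the string parsing is a simplified assumption about how the choices are split; only `readr::parse_integer()` and `labelled::set_value_labels()` come from the description.

```r
library(readr)
library(labelled)

# Toy stand-ins for the objects described above (not real REDCap output)
db_data <- data.frame(my_field = c(5L, 3L, 5L, 9L))
choices <- "3, apple | 5, orange | 9, banana"

# Split "value, label" pairs; the value is everything before the first comma
items <- trimws(strsplit(choices, "|", fixed = TRUE)[[1]])
labels_chr <- trimws(sub(",.*$", "", items))
names(labels_chr) <- trimws(sub("^[^,]*,", "", items))

# Step 1: inspect the type readr chose for the data column
stopifnot(is.integer(db_data$my_field))

# Step 2: cast the chr label values to int
labels_cast <- suppressWarnings(readr::parse_integer(unname(labels_chr)))

# Fallback: if any value failed to parse, work in chr instead
if (anyNA(labels_cast)) {
  db_data$my_field <- as.character(db_data$my_field)
  labels_cast <- unname(labels_chr)
}
names(labels_cast) <- names(labels_chr)

# Step 3: apply the labels
db_data <- labelled::set_value_labels(db_data, my_field = labels_cast)
```

Afterwards `labelled::val_labels(db_data$my_field)` shows the value-to-label mapping, while the column itself still holds the original integer codes.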

Issue Addressed

Related to #178
