Description

This PR modifies `read_redcap()` to let users specify `raw_or_label = "haven"` and have categorical fields converted to `haven_labelled` vectors instead of factors. I kept `labelled` in `Suggests` and added a check that the user has it installed if they specify `raw_or_label = "haven"`.
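For illustration only, a check of this kind could look like the sketch below. `check_labelled_installed()` is a hypothetical name and the error wording is an assumption; the check in the PR itself may be written differently. `requireNamespace()` is base R.

```r
# Hypothetical helper (not the PR's code): fail early when the "haven" option
# is requested but the suggested package labelled is not installed.
check_labelled_installed <- function(raw_or_label) {
  if (identical(raw_or_label, "haven") &&
      !requireNamespace("labelled", quietly = TRUE)) {
    stop(
      "raw_or_label = \"haven\" requires the labelled package. ",
      "Install it with install.packages(\"labelled\").",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Options other than "haven" pass regardless of what is installed
check_labelled_installed("label")
```

Keeping `labelled` in `Suggests` means the check only matters for users who opt in to `"haven"`.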
Implementation considerations
`haven_labelled` vectors, unlike factors, preserve the underlying data values and types. The non-trivial part is that, to apply the labels, the type of the data column must match the type of the vector of values we read from the metadata. This is slightly tricky because there's no foolproof way to say "cast vector x to the type of vector y" for arbitrary y. My approach was to do our best at casting using `readr`'s parsing functions and to fall back to converting the underlying values to `chr` if all else fails.
See the worked example below for a concrete case.
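A minimal sketch of the cast-then-fall-back idea, using the values from the worked example. This is not the PR's code: `cast_like()` and `align_types()` are hypothetical names, and the set of supported types is an assumption. Only the `readr::parse_*()` functions are real readr API.

```r
# Hypothetical helper: parse character `values` into the type of `target`.
# Returns NULL when the type is unsupported or any value fails to parse.
cast_like <- function(values, target) {
  parser <- switch(
    typeof(target),
    integer = readr::parse_integer,
    double = readr::parse_double,
    logical = readr::parse_logical,
    character = readr::parse_character
  )
  if (is.null(parser)) {
    return(NULL)
  }
  parsed <- suppressWarnings(parser(unname(values)))
  if (anyNA(parsed)) {
    return(NULL)
  }
  # Re-attach the labels as names on the parsed values
  stats::setNames(as.vector(parsed), names(values))
}

# Hypothetical helper: make the data column and the metadata values share a
# type, falling back to character for both when the cast fails.
align_types <- function(x, values) {
  cast <- cast_like(values, x)
  if (is.null(cast)) {
    x <- as.character(x)
    cast <- stats::setNames(as.character(values), names(values))
  }
  list(x = x, labels = cast)
}

aligned <- align_types(
  c(5L, 3L, 5L, 9L),
  c(apple = "3", orange = "5", banana = "9")
)
str(aligned)
```

Returning `NULL` from the cast keeps the fallback decision in one place, so the caller always ends up with a data column and a label vector of the same type.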
Proposed Changes
- Add `"haven"` option to `read_redcap()` `raw_or_label` and update appropriate arg checks
- Refactor `multi_choice_to_labels()`
  - Now takes a `raw_or_label` argument from `read_redcap()`
  - Passes off logic for how categorical fields are handled to a `label_handler` function, which will be one of `apply_labs_factor()` or `apply_labs_haven()`
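The handler dispatch described above might be sketched roughly as follows. The function names `apply_labs_factor()` and `apply_labs_haven()` come from this PR, but the signatures and bodies shown here are assumptions, and `pick_label_handler()` is a hypothetical stand-in for the selection step inside `multi_choice_to_labels()`.

```r
# Assumed signatures: `x` is the data column; `labels` is a named vector whose
# values are the raw codes and whose names are the labels.
apply_labs_factor <- function(x, labels) {
  factor(x, levels = unname(labels), labels = names(labels))
}

apply_labs_haven <- function(x, labels) {
  # labelled re-exports haven::labelled(); `x` and `labels` must share a type
  labelled::labelled(x, labels = labels)
}

# Hypothetical selection step: choose the handler from raw_or_label
pick_label_handler <- function(raw_or_label) {
  switch(
    raw_or_label,
    label = apply_labs_factor,
    haven = apply_labs_haven,
    stop("Unsupported raw_or_label: ", raw_or_label)
  )
}

labels <- c(apple = 3L, orange = 5L, banana = 9L)
label_handler <- pick_label_handler("label")
labelled_field <- label_handler(c(5L, 3L, 5L, 9L), labels)
```

Selecting the handler once and calling it through a single `label_handler` variable keeps the per-field code identical for both output types.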
Worked Example
Suppose we have a REDCap field coded like this:

| value | label |
| --- | --- |
| 3 | apple |
| 5 | orange |
| 9 | banana |

The `db_data` we get from REDCap will have a field like this:

| `my_field <int>` |
| --- |
| 5 |
| 3 |
| 5 |
| 9 |

where the `int` data type was actually determined by `readr`, since we don't allow users to pass data type specifications to `redcap_read_oneshot()`.
The `db_metadata` will contain the value/label mapping for this field, where the values are stored as `chr` rather than `int`.

My implementation:

1. Checks the type of `db_data$my_field` and finds it's `int`
2. Casts the values from `db_metadata`, `c(apple = "3", orange = "5", banana = "9")`, to `int` with `readr::parse_integer()`
3. Applies the labels with `labelled::set_value_labels()`

If step 2 had failed for some reason, it would cast `db_data$my_field` to `chr` and apply the labels to that.

Issue Addressed
Related to #178