brad-cannell / codebookr

Create Codebooks From Data Frames
https://brad-cannell.github.io/codebookr/

Manually creating value labels #16

Closed mbcann01 closed 2 years ago

mbcann01 commented 2 years ago

While working on #13, I realized that I didn't have an example in the README for manually adding value labels. While working on that example, I decided to make some changes to the code in cb_get_col_attributes().

mbcann01 commented 2 years ago

Have cb_get_col_attributes check to make sure the values in value labels match the values in the column

I considered adding a check that the values in value_labels match the values in the data frame column. However, a user may want value_labels to include values that are theoretically possible but don't actually occur in the column.

So, then I considered just checking that every unique column value appears somewhere in value_labels. This should help catch typos (e.g., typed 77 instead of 7) and accidentally forgetting to include a value label for a category. However, some scales are only partially labeled. Think of the DETECT observational scale, where only the endpoints have labels: 1 = "Really bad", 2, 3, 4 = "Really good".

I suppose in that situation, the value labels could just be: 1 = "Really bad", 2 = "2", 3 = "3", 4 = "Really good".

Again, this still helps check for typos (e.g., typed 77 instead of 7) and accidentally forgetting to include a value label for a category.
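
For illustration, here is a minimal sketch of the kind of check described above, in base R. The helper name `check_value_labels()` is hypothetical and is not the actual code in cb_get_col_attributes(); it assumes value labels are stored as a named vector whose names are the labels and whose values are the column values.

```r
# Hypothetical helper (not part of codebookr): return the unique non-missing
# column values that have no entry in value_labels.
check_value_labels <- function(x, value_labels) {
  observed <- unique(x[!is.na(x)])
  setdiff(observed, unname(value_labels))
}

# A DETECT-style scale where only the endpoints are labeled
detect_scale   <- c(1, 2, 3, 4, 4, NA)
partial_labels <- c("Really bad" = 1, "Really good" = 4)
full_labels    <- c("Really bad" = 1, "2" = 2, "3" = 3, "Really good" = 4)

check_value_labels(detect_scale, partial_labels) # 2 and 3 have no label
check_value_labels(detect_scale, full_labels)    # nothing is unlabeled
check_value_labels(c(1, 77, 4), full_labels)     # the typo 77 is flagged
```

With the partial labels the check flags the unlabeled midpoints, which is why the fully spelled-out label set is needed for the check to pass.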

mbcann01 commented 2 years ago

Drop check to make sure the values in value labels match the values in the column

While working on creating a codebook for L2C, I ran into an annoying issue with several variables (e.g., sq_7b, height, weight, waist_c). For example:

sq_7b asks, "Ask individual if they have a study flyer from the Dallas County Jail? If they have a flyer insert number below, otherwise click not applicable." Values 1 - 2000 are ticket numbers, but 9999 = Not Applicable. Therefore, Visit 1 is imported with the labels attribute "Not Applicable 9999". However, there are no actual 9999 values in sq_7b. The only values (as of 2022-07-03) are 52 and 74. This causes codebook to throw an error, because the check requires every unique nonmissing value in a column to be included in the value_labels attribute whenever that attribute exists.
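
The mismatch can be reproduced with a small base R sketch. The data below are made up to mirror the description (only 52 and 74 observed, a single label for 9999), and the attribute name `labels` is an assumption about how the imported column stores its labels; this is not the package's actual check.

```r
# Toy stand-in for sq_7b: two ticket numbers and a missing value
sq_7b <- c(52, 74, NA)
# Assumed storage: a named vector where the name is the label
attr(sq_7b, "labels") <- c("Not Applicable" = 9999)

observed <- unique(sq_7b[!is.na(sq_7b)])
labeled  <- unname(attr(sq_7b, "labels"))

# Observed values with no label: these are what trip the check
values_without_label <- setdiff(observed, labeled)
# Labeled values that never occur in the column
labels_without_value <- setdiff(labeled, observed)

values_without_label
labels_without_value
```

Here the ticket numbers are legitimately unlabeled and the one labeled value never occurs, so a strict "every value must be labeled" rule fails even though nothing is wrong with the data.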

At first, I was going to drop the value labels attribute to fix the error. But this is useful information to keep, just in case. It was also annoying to have to run codebook, see which column caused the error, drop the label attribute for that column, and repeat.

I decided to just drop this check.