Closed m-gough closed 1 year ago
Thank you, @m-gough! I will take a look ASAP!
Hi @m-gough!
First of all, I am so sorry for taking so long to review this pull request. I have been absolutely swamped lately. Second, I wrote a really long, detailed comment here, impulsively clicked on the Files changed tab at the top of the screen, and then realized the entire comment was deleted when I navigated away from this page! 🤦♂️😡 But I will try my best to recreate it.
Overall, you are doing amazing work and I'm really grateful for your help! I do have a few comments that are meant to be constructive and not at all critical!
observational_measures_recode_factor_6.19.Rmd might become something like data_0x_observational_recode_factors.Rmd.
output: htm_notebook from the YAML header.
dplyr with library(tidyverse). There is nothing "wrong" with this method! However, it does load a bunch of other stuff that we don't always need. Therefore, my preference is generally to load each tidyverse package we need individually. And since we use dplyr so often, I tend to always write library(dplyr, warn.conflicts = FALSE) right off the bat in my files.
observational_measures_recode_factor_6.19.Rmd. Let's delete those, please.
#### Write to CSV on line 543 of observational_measures_recode_factor_6.19.Rmd and the code chunk that follows it.
observational_measures_recode_factor_6.19.Rmd, for example, Data cleaning should probably remain a level 1 header, but all of the individual variables you recode to factors below should probably be level 2 headers. They are specific instances of data cleaning. As another example, #### Write to CSV should probably not be a level 4 header. It should probably be a level 1 header (# Write to CSV).
_f naming conventions that we discussed in the R class. However, in the case of a character variable like unusual_odor (see observational_measures_recode_factor_6.19.Rmd, line 60), having both a character version and a factor version doesn't do a lot of good for us. I think it's safe to go ahead and just convert those to factors without keeping the character version. And since there is only a factor version of the variable, there is really no need to mark it with _f.
clothes (see observational_measures_recode_factor_6.19.Rmd, line 87), I like that you created a numeric version and a factor version with the _f naming convention. Great job!
There is a lot of code like this:
self_report <- self_report %>%
  mutate(
    # Numeric version: recode the response text to category codes
    across(
      .cols = starts_with("neglect") & !ends_with("help") &
        !ends_with("reliable") & !ends_with("person"),
      .fns = ~ case_when(
        .x == "Yes" ~ 1,
        .x == "No" ~ 2,
        .x == "Don't know" ~ 7,
        .x == "Refused" ~ 9
      ),
      .names = "{col}_4cat"
    ),
    # Factor version: same columns, excluding the new _4cat columns
    across(
      .cols = starts_with("neglect") & !ends_with("help") &
        !ends_with("reliable") & !ends_with("person") & !ends_with("4cat"),
      .fns = ~ factor(.x, levels = levels_yes_no),
      .names = "{col}_4cat_f"
    )
  )
Which is great code! But the column selection starts_with("neglect") & !ends_with("help") & !ends_with("reliable") & !ends_with("person") is pretty complex.
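One possible way to make that selection easier to follow, sketched here on made-up data, is to resolve the column names once, store them in a commented vector, and reuse that vector with all_of(). The toy self_report data frame, its column names, the neglect_cols name, and the definition of levels_yes_no below are all assumptions for illustration and are not taken from the pull request.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy stand-in for self_report (hypothetical columns and values)
self_report <- data.frame(
  neglect_food   = c("Yes", "No", "Don't know", "Refused"),
  neglect_house  = c("No", "No", "Yes", NA),
  neglect_help   = c("a", "b", "c", "d"),
  neglect_person = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# Assumed definition; the real levels_yes_no is defined elsewhere in the project
levels_yes_no <- c("Yes", "No", "Don't know", "Refused")

# Resolve the complex selection once: all "neglect" columns except the
# follow-up columns ending in "help", "reliable", or "person"
neglect_cols <- self_report %>%
  select(
    starts_with("neglect") & !ends_with("help") &
      !ends_with("reliable") & !ends_with("person")
  ) %>%
  names()

self_report <- self_report %>%
  mutate(
    # Numeric version of each selected column
    across(
      .cols = all_of(neglect_cols),
      .fns = ~ case_when(
        .x == "Yes" ~ 1,
        .x == "No" ~ 2,
        .x == "Don't know" ~ 7,
        .x == "Refused" ~ 9
      ),
      .names = "{col}_4cat"
    ),
    # Factor version of the same original columns
    across(
      .cols = all_of(neglect_cols),
      .fns = ~ factor(.x, levels = levels_yes_no),
      .names = "{col}_4cat_f"
    )
  )
```

Because neglect_cols is fixed before mutate() runs, the second across() no longer needs the extra !ends_with("4cat") condition, and the vector can be printed as a quick check of exactly which columns are being recoded.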
I think that's all I've got for now. Please let me know if you have any questions.
Thank you so much for the feedback! I made the changes to the observational_measures data. I agree that the code in self_report was complicated, but I did data checks and it did work as expected. I will work on adding comments to that section of my code.
Thank you, @m-gough!