m-gough commented 1 year ago

Restyle code

mbcann01 commented 1 year ago

Thank you, @m-gough! I will take a look ASAP!

mbcann01 commented 1 year ago

Hi @m-gough!

First of all, I am so sorry for taking so long to review this pull request. I have been absolutely swamped lately. Second, I wrote a really long, detailed comment here, impulsively clicked on the Files changed tab at the top of the screen, and then realized that the entire comment was deleted when I navigated away from this page! 🤦‍♂️😡 But, I will try my best to recreate it.

Overall

Overall, you are doing amazing work and I'm really grateful for your help! I do have a few comments that are meant to be constructive and not at all critical!

File names

[ ] Let's try to follow the conventions here for file names. So, observational_measures_recode_factor_6.19.Rmd might become something like data_0x_observational_recode_factors.Rmd.
[ ] Let's not use dates in the file names. Avoiding practices like that is a big part of the reason we are using versioning/git/GitHub. See more on this here.

YAML header

[ ] I appreciate that you are trying to use best practices for variable naming as the title in the YAML header. However, the title is one of the few places where we can write normally. Its only purpose is to tell us what the file does. Therefore, you can be a little more descriptive if you would like.
[ ] Let's use Rmd files, but not Notebooks. We don't really need the HTML files right now. They are easy enough to generate later if we need them. To stop R from generating the HTML documents, please just remove output: html_notebook from the YAML header.

Loading packages

[ ] I see that you sometimes load dplyr with library(tidyverse). There is nothing "wrong" with this method! However, it does load a bunch of other stuff that we don't always need. Therefore, my preference is generally to load each tidyverse package we need individually. And since we use dplyr so often, I tend to always write library(dplyr, warn.conflicts = FALSE) right off the bat in my files.

Spacing

[ ] I'm noticing some blank lines between the last line of code within your code blocks and the end of the code block. For example, lines 12 and 24 of observational_measures_recode_factor_6.19.Rmd. Let's delete those, please.
[ ] Let's try to always include a blank line between headers and the text or code chunk that follows them. For example, let's add a blank line between #### Write to CSV on line 543 of observational_measures_recode_factor_6.19.Rmd and the code chunk that follows it.

Headers

[ ] I love the use of headers! I think they visually break up the code and make the RStudio outline pane really functional. Let's try to be more intentional about which headers are first level, second level, etc. It's kind of difficult to write cut-and-dried rules for this, but my general advice is to think about writing an instruction manual or something. Where do you expect things to be nested? In observational_measures_recode_factor_6.19.Rmd, for example, Data cleaning should probably remain a level 1 header, but all of the individual variables you recode to factors below should probably be level 2 headers. They are specific instances of data cleaning. As another example, #### Write to CSV should probably not be a level 4 header. It should probably be a level 1 header (# Write to CSV).
[ ] Let's not use periods at the end of headers.

Factors

[ ] I love that you are following the _f naming conventions that we discussed in the R class. However, in the case of character variables like unusual_odor (see observational_measures_recode_factor_6.19.Rmd, line 60), having both a character version and a factor version doesn't do a lot of good for us. I think it's safe to go ahead and just convert those to factors without keeping the character version. And since there is only a factor version of the variable, there is really no need to mark it with _f.
[ ] For numeric/scale variables like clothes (see observational_measures_recode_factor_6.19.Rmd, line 87), I like that you created a numeric version and a factor version with the _f naming convention. Great job!
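
To make the two cases above concrete, here is a minimal sketch. It assumes a toy data frame named obs; the values, the factor levels, and the "Level 1" style labels are placeholders made up for illustration and are not taken from the real data.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy data: the values below are hypothetical, not from the study data
obs <- tibble(
  unusual_odor = c("Yes", "No", "No", "Yes"),
  clothes      = c(1, 2, 3, 1)
)

obs <- obs %>%
  mutate(
    # Character variable: convert in place, so no separate _f column is needed
    unusual_odor = factor(unusual_odor, levels = c("Yes", "No")),
    # Numeric variable: keep the numeric version and add a labeled _f version
    clothes_f = factor(
      clothes,
      levels = c(1, 2, 3),
      labels = c("Level 1", "Level 2", "Level 3") # placeholder labels
    )
  )

str(obs)
```

The point of the sketch is only the naming pattern: one column for unusual_odor, and a numeric/factor pair for clothes.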

self_report_recode_factor.Rmd

There is a lot of code like this:

self_report <- self_report %>%
  mutate(
    across(
      .cols = starts_with("neglect") & !ends_with("help") &
        !ends_with("reliable") & !ends_with("person"),
      .fns  = ~ case_when(
        .x == "Yes"        ~ 1,
        .x == "No"         ~ 2,
        .x == "Don't know" ~ 7,
        .x == "Refused"    ~ 9 
      ),
      .names = "{col}_4cat"
    ),
    across(
      .cols = starts_with("neglect") & !ends_with("help") &
        !ends_with("reliable") & !ends_with("person") & !ends_with("4cat"),
      .fns   = ~ factor(.x, levels = levels_yes_no),
      .names = "{col}_4cat_f"
    )
  )

Which is great code! But the column selection starts_with("neglect") & !ends_with("help") & !ends_with("reliable") & !ends_with("person") is pretty complex.

[ ] Did you do data checks to make sure it is working as expected?
[ ] This would also be a great place to insert some comments about what you're doing and why.
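
As one possible way to do the data checks mentioned above, here is a rough sketch. It assumes a toy version of self_report; the column names neglect_go, neglect_food, and finance_help and all of the values are hypothetical stand-ins, so please adapt it to the real data rather than treating it as the definitive check.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy data: column names and values are hypothetical stand-ins
self_report <- tibble(
  neglect_go     = c("Yes", "No", "Don't know", "Refused"),
  neglect_food   = c("No", "No", "Yes", NA),
  neglect_help   = c("a", "b", "c", "d"),
  neglect_person = c("a", "b", "c", "d"),
  finance_help   = c("Yes", "No", "No", "Yes")
)

# Check 1: which columns does the selection actually match?
selected_cols <- self_report %>%
  select(
    starts_with("neglect") & !ends_with("help") &
      !ends_with("reliable") & !ends_with("person")
  ) %>%
  names()
print(selected_cols)

# Check 2: recode, then cross-tabulate an original column against its recode
recoded <- self_report %>%
  mutate(
    across(
      .cols  = all_of(selected_cols),
      .fns   = ~ case_when(
        .x == "Yes"        ~ 1,
        .x == "No"         ~ 2,
        .x == "Don't know" ~ 7,
        .x == "Refused"    ~ 9
      ),
      .names = "{col}_4cat"
    )
  )

check <- recoded %>% count(neglect_go, neglect_go_4cat)
print(check)
```

Printing the selected names first makes it easy to confirm that the helper and person columns are excluded, and the count() table shows whether each original value maps to the intended code.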

I think that's all I've got for now. Please let me know if you have any questions.

m-gough commented 1 year ago

Thank you so much for the feedback! I made the changes to the observational_measures data. I agree that the code in self_report was complicated, but I did data checks and it did work as expected. I will work on adding comments to that section of my code.

mbcann01 commented 1 year ago

Thank you, @m-gough!

brad-cannell / detect_fu_interviews_public

Update code for observational measures #4
