NEONScience / NEON-Data-Skills

Self-paced tutorials that review key data literacy concepts and data analysis skills. Published materials can be found at:
https://www.neonscience.org/resources/learning-hub/tutorials
GNU Affero General Public License v3.0

Use `left_join()` `by` and `suffix` args instead of renaming columns manually. #632

Open Aariq opened 1 year ago

Aariq commented 1 year ago

https://github.com/NEONScience/NEON-Data-Skills/blob/e59c9c2ba55dbcc67ae3f37d80bbbf39c9e914e5/tutorials/R/biodiversity/neon-phenology-temp/01-explore-phenology-data/01-explore-phenology-data.R#L113-L121

Instead of renaming columns manually before calling `left_join()`, you can do something like this:

left_join(
  status_noD,
  ind_noD,
  by = c(
    "namedLocation",
    "domainID",
    "siteID",
    "plotID",
    "individualID",
    "release"
  ),
  suffix = c("Stat", "")
)

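This adds the suffix "Stat" to any column in `status_noD` whose name also appears in `ind_noD` and is not listed in the vector supplied to `by`. The matching `ind_noD` columns keep their original names, because the second suffix is an empty string.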
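
A minimal sketch of that suffix behavior, using two small made-up tables rather than the real NEON data. The names `status_toy`, `ind_toy`, and all values in them are hypothetical stand-ins for `status_noD` and `ind_noD`; only `individualID` is used as the join key here to keep the example short.

```r
library(dplyr)

# Hypothetical stand-in for status_noD: repeated observations per individual.
status_toy <- tibble(
  individualID = c("A", "A", "B"),
  date = c("2016-04-01", "2016-04-08", "2016-04-01"),
  phenophaseStatus = c("no", "yes", "no")
)

# Hypothetical stand-in for ind_noD: one row per individual.
ind_toy <- tibble(
  individualID = c("A", "B"),
  date = c("2015-06-01", "2015-06-02"),
  scientificName = c("Acer rubrum", "Quercus rubra")
)

# "date" exists in both tables but is not a join key, so it is duplicated.
# The first suffix applies to the left table, the second to the right table.
joined <- left_join(
  status_toy,
  ind_toy,
  by = "individualID",
  suffix = c("Stat", "")
)

# The left table's "date" becomes "dateStat"; the right table's stays "date".
names(joined)
```

Naming the keys in `by` also means the join no longer depends on which column names happen to match, so a column added to either table later cannot silently become a join key.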

cklunch commented 1 year ago

@Aariq Thanks for the suggestion! For tutorials, it often makes sense to do tasks separately even if it's possible to combine them, so the process and the steps are clearer to the student. I'll leave it up to @kjones13 whether to make this change.

Aariq commented 1 year ago

Yeah, I totally get that. It just felt to me like it would make more sense to explain that you want to join by certain columns rather than that you want to avoid certain columns when joining.

kjones13 commented 1 year ago

Hi @Aariq, this is very clean code, and I really appreciate the suggestion. In this instance, the intent of this step was to be explicit about acknowledging the field names the two data frames have in common and intentionally renaming the fields before joining, as part of the data exploration step.
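
For readers following along, here is a rough sketch of what "acknowledging the field names in common" can look like in base R. It is not the tutorial's actual code: the tables `status_toy` and `ind_toy`, their values, and the choice of `individualID` as the only join key are assumptions made for illustration.

```r
# Hypothetical stand-ins for the tutorial's status and individual tables.
status_toy <- data.frame(
  individualID = c("A", "A", "B"),
  date = c("2016-04-01", "2016-04-08", "2016-04-01"),
  phenophaseStatus = c("no", "yes", "no")
)
ind_toy <- data.frame(
  individualID = c("A", "B"),
  date = c("2015-06-01", "2015-06-02"),
  scientificName = c("Acer rubrum", "Quercus rubra")
)

# Step 1: list the field names the two data frames have in common.
shared <- intersect(names(status_toy), names(ind_toy))

# Step 2: decide which shared fields are join keys (assumed here).
join_keys <- "individualID"

# Step 3: the remaining shared fields would collide, so rename them
# in the status table by appending "Stat".
to_rename <- setdiff(shared, join_keys)
names(status_toy)[match(to_rename, names(status_toy))] <- paste0(to_rename, "Stat")

names(status_toy)
```

Keeping the inspection and the rename as separate steps shows the student which fields overlap before any join happens, which is the exploration goal described above.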