datacarpentry / R-ecology-lesson

Data Analysis and Visualization in R for Ecologists
https://datacarpentry.org/R-ecology-lesson/

Suggestions for the R and SQL episode #835

Closed Aron-github closed 4 months ago

Aron-github commented 1 year ago

In the R Ecology Lesson, under "SQL databases and R", I would suggest the following edits, which I believe would help learners transition from the previous episode to the current one.

Often, public or private databases are structured using SQL (Structured Query Language), a standardized programming language that is used to manage relational databases and perform various operations on the data in them. These operations are in principle similar to what we have explored so far using tidyverse in the Manipulating data episode (select, filter, perform operations, etc.) but expressed through the SQL grammar.

Interfacing with databases using dplyr focuses on retrieving and analyzing datasets by generating SELECT SQL statements, but it doesn’t modify the database itself. dplyr does not offer functions to UPDATE or DELETE entries.

with

Interfacing with databases using dplyr focuses on retrieving and analyzing datasets by converting R code into the corresponding SQL statements, but it doesn’t include any functions to directly modify the content of the database itself (e.g. by updating or deleting data).

  1. Querying the database with the dplyr syntax
  2. SQL translation (using the same example as in 1.)
  3. Querying the database with the SQL syntax

In this way, learners could understand the SQL code used in 3. because it is the SQL translation of the R query used in 1.
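
To illustrate the three-step order, here is a minimal sketch. It assumes the DBI, RSQLite, dplyr and dbplyr packages are installed, and it uses a made-up in-memory table (`toy_db`, with invented `species_id` and `weight` values) rather than the lesson's actual mammals database.

```r
library(DBI)
library(dplyr)

# Hypothetical toy database standing in for the lesson's database (assumption)
toy_db <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(toy_db, "surveys", data.frame(
  species_id = c("DM", "DM", "NL", "PF"),
  weight = c(40, 48, 150, 7)
))

# 1. Query the database with the dplyr syntax
surveys <- tbl(toy_db, "surveys")
heavy <- surveys %>%
  filter(weight > 30) %>%
  select(species_id, weight)

# 2. Show the SQL that dbplyr generates for the same query
show_query(heavy)

# Pull the dplyr result into R before closing the connection
heavy_dplyr <- collect(heavy)

# 3. Run an equivalent query written by hand in SQL
heavy_sql <- dbGetQuery(
  toy_db,
  "SELECT species_id, weight FROM surveys WHERE weight > 30"
)

dbDisconnect(toy_db)
```

The exact text printed by `show_query()` depends on the dbplyr version, but it should be recognisably the same statement as the hand-written one in step 3, which is the connection I would like learners to make.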

species <- tbl(mammals, "species")
genus_counts <- left_join(surveys, plots) %>%
  left_join(species) %>%
  filter(taxa == "Rodent") %>%
  group_by(plot_type, genus) %>%
  tally() 

with

species <- tbl(mammals, "species")
genus_counts <- left_join(surveys, plots) %>%
  left_join(species) %>%
  filter(taxa == "Rodent") %>%
  count(plot_type, genus) 
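
As a rough check that the two versions should return the same counts, here is a small sketch on a made-up local data frame (the `rodents` object and its values are invented for illustration and are not the lesson's data). One difference I am aware of: on a local data frame, `group_by() %>% tally()` keeps the result grouped by the first variable, while `count()` returns an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data, not taken from the lesson (assumption)
rodents <- data.frame(
  plot_type = c("Control", "Control", "Control", "Rodent Exclosure"),
  genus = c("Dipodomys", "Dipodomys", "Neotoma", "Dipodomys")
)

# Current version: explicit grouping followed by tally()
with_tally <- rodents %>%
  group_by(plot_type, genus) %>%
  tally()

# Suggested version: count() does the grouping and counting in one step
with_count <- rodents %>%
  count(plot_type, genus)

# Compare the values once the leftover grouping is dropped
all.equal(as.data.frame(ungroup(with_tally)), as.data.frame(with_count))
```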

tobyhodges commented 4 months ago

Thanks @Aron-github for opening this issue, and hi 👋 The lesson underwent a major update and reorganisation when https://github.com/datacarpentry/R-ecology-lesson/pull/887 was merged. One significant change is that the content on interacting with databases has been removed. As this issue relates to that content, I will close it. Please open a new issue if you have suggestions for how the new content could be improved.