datacarpentry / R-ecology-lesson

Data Analysis and Visualization in R for Ecologists
https://datacarpentry.org/R-ecology-lesson/

Suggest introducing tidyverse at Manipulating data section #796

Closed: QuinnAsena closed this issue 2 years ago

QuinnAsena commented 2 years ago

A suggestion from using this lesson in a teaching demo. Feel free to ignore or close:

https://github.com/datacarpentry/R-ecology-lesson/blob/d455aa5e69682a3c0413e6c2a9a29f7816e07d20/02-starting-with-data.Rmd#L98

I found it overwhelming to introduce packages, the tidyverse, and tibbles so early on. Using the base function read.csv() instead does not alter the flow of the section, and it is important for novices to understand basics like regular data frames before their extensions (tibbles and data.tables).

I suggest introducing packages with lubridate in the 'Formatting dates' section and introducing the tidyverse cluster in the following section. This order seems easier both to teach and to learn.

Teebusch commented 2 years ago

Hi @QuinnAsena, thank you for your suggestion!

I prefer the current structure. My response to the two issues you raise:

Introducing packages too early

This lesson used to use read.csv(). I believe we switched to read_csv() because it reads the data with more sensible defaults and causes less trouble in the long run. This makes it necessary to introduce the idea of a package very early. However, I think this is done nicely and simply in just two or three sentences:

"Packages in R are basically sets of additional functions that let you do more stuff. The functions we’ve been using so far, like round(), sqrt(), or c() come built into R. Packages give you access to additional functions beyond base R. [...] Before you use a package for the first time you need to install it on your machine, and then you should import it in every subsequent R session when you need it."

Packages are so fundamental to working with R (much of its power comes from packages) that I think it's fine to introduce them early.

This is subjective, but when I taught this lesson, I didn't feel that this interrupted the flow. If learners understand what packages are (and what functionality they unlock), this can be a very empowering moment.

Introducing tibbles too early

I'd argue that we are not really introducing tibbles at all. We are essentially saying that they are a form of data frame, and that's all one needs to know right now:

"When we loaded the data into R, it got stored as an object of class tibble, which is a special kind of data frame (the difference is not important for our purposes, but you can learn more about tibbles here). Data frames are the de facto data structure for most tabular data, and what we use for statistics and plotting."

QuinnAsena commented 2 years ago

Thanks for the response, @Teebusch. True, it is subjective, and you make good points, so I'm happy to close the issue. In general, I think we need to be a little careful about teaching R as the tidyverse, as opposed to teaching the tidyverse as a (very useful) extension to R. That is also a subjective point, and the tidyverse is particularly appropriate for the ecology lesson. I'm not sure whether it is mentioned anywhere that the same results can be achieved in, for example, base R or data.table?

Teebusch commented 2 years ago


It's certainly a pedagogical choice to focus on the tidyverse for this introduction to R. I think the tidyverse, with its verb-based, recipe-like syntax, is particularly well suited to beginners who may have no coding experience.

It's implied "between the lines" that base R can do the same things. When we introduce the tidyverse, we essentially say that the dplyr functions are a substitute for the base R bracket syntax:

"Bracket subsetting is handy, but it can be cumbersome and difficult to read, especially for complicated operations. Enter dplyr. dplyr is a package for helping with tabular data manipulation."

I know how tempting it is to add more information to this lesson. There are so many interesting things to learn (and teach) about R, and so many useful packages. However, we have to be careful not to overload the lesson with information and opinions that matter to intermediate or advanced R users but are probably meaningless or even confusing to a novice.