datacarpentry / R-ecology-lesson

Data Analysis and Visualization in R for Ecologists
https://datacarpentry.org/R-ecology-lesson/

Creating folder/ visualising data/ reshaping data #453

Closed alevigi closed 3 months ago

alevigi commented 6 years ago

I've found three small issues in this lesson:

Issue 1 - creation of data folder

In the lesson "Starting with data", the download.file() function is used to download the CSV file that contains the survey data.

According to the code, the file is saved to the "data" folder, which has never been created at that point, so the student will most likely get an error. I would suggest creating the folder first, using the dir.create() function.

dir.create("data", showWarnings = FALSE)

download.file("https://ndownloader.figshare.com/files/2292169", "data/portal_data_joined.csv")
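
A slightly more defensive variant of the two lines above, as a sketch only: it creates the folder when it is missing and skips the download when the file is already there. The helper name fetch_if_missing is hypothetical and is not part of the lesson.

```r
# Hypothetical helper (not from the lesson): download a file only if it is
# not already present, creating the destination folder first.
fetch_if_missing <- function(url, destfile) {
  dest_dir <- dirname(destfile)
  # Create the destination folder (and any parents) if it does not exist yet.
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  # Skip the download when the file has already been fetched.
  if (!file.exists(destfile)) {
    download.file(url, destfile)
  }
  invisible(destfile)
}
```

It could be called as fetch_if_missing("https://ndownloader.figshare.com/files/2292169", "data/portal_data_joined.csv"), which also lets learners re-run the script without downloading the file again.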

Issue 2 - density plot

When we describe how to visualise data with ggplot2, we explain that "we need to define a mapping (using the aesthetic (aes) function), by selecting the variables to be plotted and specifying how to present them in the graph, e.g. as x/y positions".

I think it would be nice to show them a plot where only one aesthetic is needed (i.e. geom_density()). We can also show them how to add a line for the mean using the function geom_vline() as another example of adding geoms.

We first calculate the mean weight and store it in an object called surveys_mean:

surveys_mean <- surveys_complete %>% summarize(mean_weight = mean(weight, na.rm = TRUE))

ggplot(data = surveys_complete, mapping = aes(x = weight, color = sex)) + geom_density() + geom_vline(data = surveys_mean, aes(xintercept = mean_weight), linetype = "dotted", size = 1)

Issue 3 - reshaping data

I think that the section "Reshaping with gather and spread" in the Manipulating data frames lesson is a bit difficult to understand at first. Students don't really understand why we want to reshape the data. They don't see the point. It is much easier to understand the concept in the context of ggplot, when they see that you need a column for each axis.

I would perhaps move this section to the "Visualising data" lesson. We could first spread the data and show them that in the wide format it is not possible to use the genera and the mean weight in a plot.
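
A minimal sketch of that idea, assuming the tidyr and ggplot2 packages are installed. The data frame, its values, and the object names are invented for illustration and are not the real survey data.

```r
library(tidyr)
library(ggplot2)

# Toy data invented for illustration (not the real survey values):
# one row per plot and genus, i.e. "long" format.
toy_long <- data.frame(
  plot_id     = c(1, 1, 2, 2),
  genus       = c("Dipodomys", "Neotoma", "Dipodomys", "Neotoma"),
  mean_weight = c(50, 160, 48, 155)
)

# Wide format: one column per genus. There is no longer a single column
# holding the genus names or a single column holding the weights, so
# neither can be mapped to an axis with aes().
toy_wide <- spread(toy_long, key = genus, value = mean_weight)

# Back to long format: genus and mean_weight are single columns again.
toy_long_again <- gather(toy_wide, key = "genus", value = "mean_weight", -plot_id)

# With the long table, each axis maps to exactly one column.
p <- ggplot(toy_long_again, aes(x = genus, y = mean_weight)) +
  geom_point()
```

Printing toy_wide and toy_long_again side by side before building the plot may help learners see why the long table is the one ggplot2 can use directly.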

Alessandra

andrew66882011 commented 6 years ago

Good catches!

Comments on the three issues you reported:

  1. It's important to create a folder ("data" here, but point out to learners that it could be named something else) before using download.file() to download a data file, because this is not obvious to most R novices. It's also a good habit to save raw (original) data in a different folder from the temporary data generated during an R session, as mentioned later in the lesson.

  2. It's important to tell learners that there are two ways to specify "aesthetics" (color, size, etc): a variable in the data and a specific value. It might be a good idea to mention (not necessarily go through) the site: https://cran.r-project.org/web/packages/ggplot2/vignettes/ggplot2-specs.html so that learners can get a good resource (reference) when needed.

  3. It's important to learn how to transform data between "long" and "wide" forms (reshaping), especially for people doing longitudinal data analyses (time series analyses, panel data analyses, repeated measures, and the like). Combining it with visualization might reinforce learners' understanding of the effects and differences, but its importance makes it worth an individual section.
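
The distinction in point 2, between mapping an aesthetic to a variable and setting it to a fixed value, can be sketched as follows. This assumes ggplot2 is installed; the toy data frame and its values are invented for illustration.

```r
library(ggplot2)

# Toy data invented for illustration.
toy <- data.frame(
  weight          = c(40, 42, 45, 120, 130, 125),
  hindfoot_length = c(30, 31, 33, 20, 21, 22),
  sex             = c("F", "M", "F", "M", "F", "M")
)

# Mapping: colour is tied to a variable, so it goes inside aes()
# and ggplot2 builds a legend for it.
p_mapped <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point(aes(color = sex))

# Setting: colour is a fixed value, so it goes outside aes()
# and every point is drawn in the same colour.
p_fixed <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point(color = "blue")
```

A common learner mistake is writing aes(color = "blue"), which treats "blue" as a one-level variable rather than as a colour, so showing both forms together may be useful.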

Jianjun

anacost commented 6 years ago

Hi @alevigi and @andrew66882011, thank you for your contribution!

Issue 1 - creation of data folder: the folder should be created in the section "Before we start" - "Organizing your working directory": https://datacarpentry.org/R-ecology-lesson/00-before-we-start.html#organizing_your_working_directory

Issue 2 - would you add an example or challenge and start a pull request?

Issue 3 - How would you explain it better? It is one of the objectives of this lesson "Manipulating data frames" to "Describe the concept of a wide and a long table format and for which purpose those formats are useful."

tobyhodges commented 3 months ago

Thanks everyone for contributing to this discussion. The lesson underwent a major update and reorganisation when https://github.com/datacarpentry/R-ecology-lesson/pull/887 was merged. As this issue relates to content in a version of the lesson before that update took place, I will close it. Please open a new issue if you believe that some or all of the changes being discussed here remain relevant to the redesigned lesson, linking to this thread where relevant.