datacarpentry / R-ecology-lesson

Data Analysis and Visualization in R for Ecologists - the version at https://github.com/datacarpentry/R-ecology-lesson-alternative will be merged on 8th July 2024
https://datacarpentry.org/R-ecology-lesson/

Improve the narrative in the ggplot2 lesson #93

Open fmichonneau opened 8 years ago

fmichonneau commented 8 years ago

We present 3 types of plots in the ggplot lesson:

However, it feels like this lesson could be made more interactive, and the three plot types presented don't seem well justified or tied into a narrative.

joshsteele commented 6 years ago

Hi. At the risk of asking a naive question as an instructor in training, has there been much progress on this since? I'd be happy to pitch in and take a stab at justifying the plots in a more example-based way. If there are folks who have already begun this process, I'd be happy to coordinate and contribute where it would be most helpful.

For example, I was thinking that the simple statement of why you would want to make the scatterplot in the Challenge section could be moved to an earlier part of the lesson, perhaps together with why you'd be interested in looking at your data with each plot type. Something like: the first thing you want to do with a dataset is take a look at it and see what immediately jumps out at you, often using a scatterplot; then you want to summarize the data distributions with boxplots, and so on. Applying this data-exploration framework to the section would then set up the iterative plot construction (adding colors, splitting into facets, etc.). If this isn't useful, feel free to point me in a different direction. If it's helpful, I can take a stab at reworking the text.

fmichonneau commented 6 years ago

Hi @joshsteele, thanks for your comment. Yes, there is still room for improvement, and not much progress has been made on this (see also #271). What you outline sounds good and would be a great start. Ideally, what I'd like to see in this chapter is something that more closely reflects how a researcher would use visualization to:

  1. explore the dataset (which I think is what you mention)
  2. create a visualization to highlight a pattern that can be explained by a scientific hypothesis.

In other words, you are on the right track and we would welcome your contribution!

joshsteele commented 6 years ago

Hi @fmichonneau Glad to hear that I'm thinking in a similar direction to you folks. I will begin working on this this week and I will reach out to you with questions as they arise.

thiagosfsilva commented 6 years ago

What is the current status of this issue? I am looking for options for making my first contribution as an instructor in training, and this feels like something I could contribute to. Are there any specific directions on what the priorities for improvement are? Thank you.

fmichonneau commented 6 years ago

Hi @thiagosfsilva, I have started a new ggplot2 lesson in the "tidyverse-first" branch. It's very bare-bones at the moment, so any contributions to it would be welcome.

Note:

  1. you'll need this version of the ratdat package
  2. this will be the first episode of the lesson

cnoecker commented 6 years ago

Hi, I'm another instructor-in-training looking to make a contribution.

In the customization section, I wonder whether it would be better to show how to use scale_x_continuous(name = "", limits = c()) rather than xlab()/xlim(), since the same family of functions can then be used to change legends (using scale_color_*() etc.) and to do a lot of other customization.
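A minimal sketch of the two approaches side by side, assuming ggplot2 is installed; the built-in iris data set stands in for the lesson's own data, and the plot names are illustrative only. Both plots get the same axis title and limits, but the scale_*() version uses one function per aesthetic, and the same pattern carries over to the legend via scale_colour_discrete().

```r
library(ggplot2)

# Stand-in data: the built-in iris data set (not the lesson's data)

# Helper-function approach: xlab() sets the title, xlim() sets the limits
p_helpers <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  xlab("Sepal length (cm)") +
  xlim(4, 8)

# Scale approach: one scale_*() call sets both the axis title and the limits,
# and the same family of functions (scale_colour_*()) controls the legend
p_scales <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  scale_x_continuous(name = "Sepal length (cm)", limits = c(4, 8)) +
  scale_colour_discrete(name = "Iris species")
```

Under the hood, xlim() is a shortcut that creates a continuous x scale with the given limits, which is one argument for teaching the scale functions directly.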

I could draft some text for this section in the new lesson, if that would be helpful.

tavareshugo commented 6 years ago

@fmichonneau I'm quite keen on contributing some materials to the "tidyverse-first" branch, as I find that the current narrative of the course could be improved (by addressing this issue alongside #194 and #378), and it seems like this new branch is going that way. Is there an overall plan for it?

I don't know if this helps, but one possible narrative:

Sorry, this was a bit long in the end. Does this make sense? Is it OK to start contributing along these lines?

fmichonneau commented 6 years ago

@tavareshugo we started doing some work on this in the tidyverse-first branch which is rendered at https://dc-r-ecology-dev.netlify.com/

Very early stages, but we welcome feedback, ideas, and contributions there!

maglet commented 5 years ago

I think that instead of talking about "how to make a ____ plot", we should talk more about plotting types of data against each other (i.e., numerical vs. numerical, categorical vs. numerical) and which geoms suit each situation.
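A minimal sketch of organizing plots by the types of the two variables, assuming ggplot2 is installed; the built-in iris and mtcars data sets stand in for the lesson's data, and the pairing of geoms to variable types shown here is one common choice rather than a fixed rule.

```r
library(ggplot2)

# Numerical vs. numerical: one point per observation
num_num <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# Categorical vs. numerical: summarize the distribution within each category
cat_num <- ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot()

# Categorical vs. categorical: count observations per combination of levels
cat_cat <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar()
```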

I'd be willing to submit a pull request on the current lesson. Or is the switch to tidyverse-first being made?

ac812 commented 5 years ago

Regarding the plots, I taught the Data Analysis and Visualisation section a few weeks ago, and one thing I found that helped students was to first describe what boxplots are, which is missing from the current material. Here is an image I used to explain boxplots: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
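One way to explain a boxplot is to compute its parts by hand and compare them with what ggplot2 draws. A minimal sketch, assuming ggplot2 is installed and using the built-in iris data as a stand-in: the box spans the first to third quartile with a line at the median, and the whiskers follow the 1.5 × IQR rule, which is stat_boxplot()'s default (coef = 1.5).

```r
library(ggplot2)

# One group of the built-in iris data
x <- iris$Petal.Length[iris$Species == "setosa"]

q <- quantile(x, c(0.25, 0.5, 0.75))  # box edges (Q1, Q3) and median line
iqr <- q[[3]] - q[[1]]                # interquartile range = height of the box
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Whiskers reach the most extreme points still inside the fences;
# anything beyond the fences is drawn as an individual outlier point
whisker_low <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])
outliers <- x[x < lower_fence | x > upper_fence]

# The boxplot ggplot2 draws for the same data
p <- ggplot(iris, aes(x = Species, y = Petal.Length)) + geom_boxplot()
```

Inspecting layer_data(p) shows the lower, middle, upper, ymin and ymax values that ggplot2 computed, which should match the hand-computed ones for the first group.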

bolimsydneyson commented 4 years ago

Hi, I have a few thoughts on improving visualization with ggplot2 lesson.

1) Histograms: Histograms are often used, and ggplot2 has several histogram-like (bar-shaped) geoms. Some need only an x aesthetic, while others need both x and y. It would be helpful to cover which are which and to have a section on these bar-shaped visualizations.
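A minimal sketch of the three most common bar-shaped geoms, assuming ggplot2 is installed; the built-in iris and mtcars data sets stand in for the lesson's data. geom_histogram() and geom_bar() take only x and count rows themselves, while geom_col() takes x and y and uses the y values as bar heights.

```r
library(ggplot2)

# geom_histogram(): x only; bins a continuous variable and counts rows per bin
h <- ggplot(iris, aes(x = Petal.Length)) +
  geom_histogram(binwidth = 0.5)

# geom_bar(): x only; counts rows per category (its default stat is "count")
b <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# geom_col(): x and y; bar heights come straight from y (stat "identity"),
# so the counts have to be computed beforehand
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))
cl <- ggplot(cyl_counts, aes(x = cyl, y = Freq)) +
  geom_col()
```

Here b and cl draw the same bars: geom_bar() counts the rows itself, while geom_col() is given precomputed counts.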

2) Scatter plots: After plotting, I often use geom_smooth() on top of the scatterplot to add fitted lines. People can draw linear regression lines directly on the scatterplot, which seems useful.
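A minimal sketch of a scatterplot with a fitted regression line, assuming ggplot2 is installed; the built-in iris data stands in for the lesson's data. Without a method argument, geom_smooth() fits a LOESS curve for fewer than 1,000 observations, so method = "lm" is set explicitly to get a straight regression line; the separate lm() call fits the same model in case learners want its coefficients.

```r
library(ggplot2)

# Scatterplot with a linear regression line (and its confidence band) on top
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# The same linear model fitted directly, e.g. to report intercept and slope
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
fit_coefs <- coef(fit)
```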

3) Piping into ggplot2 after dplyr: If some simple data manipulation is needed before visualization (e.g. summing by group), it would be helpful to know that one can plot directly at the end of the dplyr pipeline, without specifying the ggplot(data = data_name) part.
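A minimal sketch of this pattern, assuming dplyr and ggplot2 are installed; the built-in iris data stands in for the lesson's data. The pipe passes the summarized data frame as ggplot()'s first argument (data), and from that point on the layers are added with + rather than %>%.

```r
library(dplyr)
library(ggplot2)

# Summarize with dplyr, then pipe the result straight into ggplot():
# the piped data frame fills ggplot()'s data argument
mean_petal <- iris %>%
  group_by(Species) %>%
  summarize(mean_petal_length = mean(Petal.Length)) %>%
  ggplot(aes(x = Species, y = mean_petal_length)) +
  geom_col()  # layers are added with +, not %>%
```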

DrMaggie commented 3 years ago

Reading through all the comments above was very interesting and enlightening, not least for a novice instructor like myself! I note that in addition to streamlining the narrative (and the reasoning behind which type of plot is useful for which purpose), it is also important to have clear examples that do not introduce new concepts and extra complexity at the same time.

One example of the latter is the final example, under "Exporting plots": here, the code suddenly introduces the grid.arrange() function without mentioning that it requires an additional package, gridExtra (which also provides arrangeGrob()), to be installed.

## This also works for grid.arrange() plots
combo_plot <- grid.arrange(spp_weight_boxplot, spp_count_plot, ncol = 2, 
                           widths = c(4, 6))
ggsave("combo_plot_abun_weight.png", combo_plot, width = 10, dpi = 300)

(At least on my system, with only the tidyverse loaded, the code above didn't run, but after some googling and installing gridExtra, I got reasonable output ;-)
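A self-contained version of this step might look like the sketch below, assuming ggplot2 and gridExtra are installed. The two plots are hypothetical stand-ins for the lesson's spp_weight_boxplot and spp_count_plot, built from the iris data, and the file is written to a temporary directory. It uses arrangeGrob(), which builds the combined layout without drawing it; grid.arrange() builds the same layout but also draws it on the current device.

```r
library(ggplot2)
library(gridExtra)  # provides grid.arrange() and arrangeGrob(); install.packages("gridExtra") if needed

# Hypothetical stand-ins for the lesson's spp_weight_boxplot and spp_count_plot
weight_box <- ggplot(iris, aes(x = Species, y = Petal.Length)) + geom_boxplot()
count_bar <- ggplot(iris, aes(x = Species)) + geom_bar()

# Combine the two plots side by side, the second one wider than the first
combo_plot <- arrangeGrob(weight_box, count_bar, ncol = 2, widths = c(4, 6))

# ggsave() can save the combined layout like a single plot
out_file <- file.path(tempdir(), "combo_plot_abun_weight.png")
ggsave(out_file, combo_plot, width = 10, height = 5, dpi = 300)
```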

YaraRAA commented 3 years ago

Please bear with me, as I'm not very familiar with GitHub. I've tried to make the comment below legible. Formatting feedback appreciated!

@fmichonneau Looking at the link you shared earlier: https://github.com/datacarpentry/R-ecology-lesson/blob/tidyverse-first/01-visualizing-ggplot.Rmd

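It looks like installation instructions for the ratdat package need to be added, since a simple install doesn't work in R version 4.1.0, so I suggest this edit to line 38:

# install.packages("devtools")  # run once if devtools is not installed yet
library(devtools)
install_github("weecology/ratdat")  # install ratdat from its GitHub repository
library(ratdat)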

Note: after loading the ratdat package, I couldn't find the portal_dipo data frame.

I've read through the above comments and I think the justification of different plot types hasn't been addressed yet.

Line 27: add the following introductory text

Plots are a powerful way to:

  • explore your data frame, e.g. look at the distribution of continuous data
  • explore relationships between two or more columns
  • present your data and findings to others

Line 54

a column for every dimension

is a bit confusing to me, I would change this to:

a column for every variable

Line 92

A scatter plot is a great way to visually explore the relationship between two columns containing continuous data. It allows you to check for possible patterns.

Challenge

What does this scatter plot tell you about the relationship between weight and hindfoot_length?

Line 156

  1. Examine the plot with a different color for each species. Does it look like there is a relationship between these two variables within each species?
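For this challenge, a minimal sketch of the two usual ways to look for within-species patterns, assuming ggplot2 is installed. The built-in iris measurements stand in for weight and hindfoot_length, which come from the lesson's own data: mapping species to colour keeps everything in one panel, while facet_wrap() splits the same plot into one panel per species.

```r
library(ggplot2)

# One colour per species, all in a single panel
by_colour <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point()

# The same plot split into one panel per species
by_facet <- by_colour + facet_wrap(~ Species)
```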

GitHubDoug commented 2 years ago

I suggest removing (or moving) the final section, 'Arranging plots', which covers installing and using 'patchwork' to arrange multiple ggplot objects.
It seems beyond the scope of an introduction to data visualization and would be better placed in a module on report generation.
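For reference, a minimal sketch of what arranging plots with patchwork involves, assuming ggplot2 and patchwork are installed; the two plots are hypothetical examples built from the iris data. With patchwork, ggplot objects are combined using operators: | places plots side by side, / stacks them, and plot_layout() adjusts their relative sizes.

```r
library(ggplot2)
library(patchwork)  # install.packages("patchwork") if needed

# Two hypothetical example plots
weight_box <- ggplot(iris, aes(x = Species, y = Petal.Length)) + geom_boxplot()
count_bar <- ggplot(iris, aes(x = Species)) + geom_bar()

# Side by side
side_by_side <- weight_box | count_bar

# Stacked, with the top plot taller than the bottom one
stacked <- (weight_box / count_bar) + plot_layout(heights = c(3, 2))
```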