geco-bern / agds

Applied Geodata Science book. Developed for the lecture(s) with the same name at the Institute of Geography, University of Bern.
https://geco-bern.github.io/agds/

Note on mixing data prep with plotting routines #58

Closed (khufkens closed this 1 year ago)

khufkens commented 1 year ago

https://github.com/geco-bern/agds/blob/0098a4ab6930e9421027405cfa1c4970a2c45114/04-data_vis.Rmd#L96

# Bar plot
gg1 <- ddf |> 
  dplyr::mutate(month = month(date, label = TRUE)) |> 
  dplyr::group_by(month) |> 
  dplyr::summarise(GPP_NT_VUT_REF = mean(GPP_NT_VUT_REF)) |> 
  ggplot(aes(x = month, y = GPP_NT_VUT_REF)) +
  geom_bar(stat = "identity") +
  theme_classic() +
  labs(title = "Bar plot",
       x = "Month", 
       y = expression(paste("Mean GPP (gC m"^-2, "s"^-1, ")")))

Should be:

# prepare data
plot_data <- ddf |> 
  dplyr::mutate(month = month(date, label = TRUE)) |> 
  dplyr::group_by(month) |> 
  dplyr::summarise(GPP_NT_VUT_REF = mean(GPP_NT_VUT_REF))

# plot the figure
gg1 <- ggplot(
  data = plot_data,
  aes(x = month, y = GPP_NT_VUT_REF)) +
  geom_bar(stat = "identity") +
  theme_classic() +
  labs(title = "Bar plot",
       x = "Month", 
       y = expression(paste("Mean GPP (gC m"^-2, "s"^-1, ")")))

Preprocessing can be costly. Teaching students to pipe data preparation straight into the plotting call will get them into trouble once the preparation starts taking a long time, because it is rerun every time the figure is rebuilt. The same data prep routine is repeated in the subsequent gg2 plot, which it shouldn't be; the prepared data should be reused. This pattern recurs. It might be convenient in the context of the book, but students will pick it up as "best practice", which it isn't.
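To illustrate the reuse point, here is a minimal sketch of preparing the data once and feeding it to two figures. The synthetic `ddf` is only a stand-in (an assumption: one year of daily values with `date` and `GPP_NT_VUT_REF` columns), and the second figure is a hypothetical point plot, not necessarily what gg2 looks like in the book.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Synthetic stand-in for ddf (assumption: daily data with these two columns)
set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
ddf <- data.frame(
  date = dates,
  GPP_NT_VUT_REF = runif(length(dates), min = 0, max = 10)
)

# prepare data once: monthly means of GPP
plot_data <- ddf |>
  dplyr::mutate(month = lubridate::month(date, label = TRUE)) |>
  dplyr::group_by(month) |>
  dplyr::summarise(GPP_NT_VUT_REF = mean(GPP_NT_VUT_REF))

# first figure: bar plot of the prepared data
gg1 <- ggplot(
  data = plot_data,
  aes(x = month, y = GPP_NT_VUT_REF)) +
  geom_bar(stat = "identity") +
  theme_classic()

# second figure (hypothetical): reuses plot_data, no second aggregation
gg2 <- ggplot(
  data = plot_data,
  aes(x = month, y = GPP_NT_VUT_REF)) +
  geom_point() +
  theme_classic()
```

With this layout the aggregation runs a single time, however many figures are built from `plot_data`, and the intermediate table can be inspected before plotting.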

khufkens commented 1 year ago

https://github.com/geco-bern/agds/pull/59

khufkens commented 1 year ago

@stineb I've cleaned this up, you should pull the latest changes.

stineb commented 1 year ago

ok, thanks. Just pulled.