datacarpentry / R-ecology-lesson

Data Analysis and Visualization in R for Ecologists
https://datacarpentry.org/R-ecology-lesson/
Other
314 stars 508 forks source link

Small typo in the first challenge of the Complex database queries #731

Closed Eniatnof closed 3 years ago

Eniatnof commented 3 years ago

Hello,

First of all, thank you for the really interesting lesson.

I noticed there was a small mistake in the first challenge of the part "Complex database queries" (Write a query that returns the number of rodents observed in each plot in each year.) where the SQL syntax is as it doesn't match the dplyr syntax's result.

## with dplyr syntax
species <- tbl(mammals, "species")

left_join(surveys, species) %>%
  filter(taxa == "Rodent") %>%
  group_by(taxa, year, plot_id) %>%
  tally() %>%
  collect()

    ## with SQL syntax
    query <- paste("
    SELECT a.year, b.taxa,count(*) as count
    FROM surveys a
    JOIN species b
    ON a.species_id = b.species_id
    AND b.taxa = 'Rodent'
    GROUP BY a.year, b.taxa",
    sep = "" )

    tbl(mammals, sql(query))

The statement where GROUP BY is should also have _a.plotid in order to get the same result as for the dplyr syntax.

   ## with SQL syntax
    query <- paste("
    SELECT a.year, b.taxa,count(*) as count
    FROM surveys a
    JOIN species b
    ON a.species_id = b.species_id
    AND b.taxa = 'Rodent'
    GROUP BY a.year, a.plot_id, b.taxa",
    sep = "" )

    tbl(mammals, sql(query))

Thank you again for the lesson :-)

Cheers

Teebusch commented 3 years ago

Thank you , @Eniatnof, well spotted! I have made a PR with your suggestions. I've also changed the orde of grouping columns so it is consistent between the dplyr and SQL solution, that is GROUP BY b.taxa, a.year, a.plot_id. Would you kindly check if #735 looks ok to you?