datacarpentry / R-ecology-lesson

Data Analysis and Visualization in R for Ecologists
https://datacarpentry.org/R-ecology-lesson/

Subsetting tibbles with `df[, col]` returns a data frame, not a vector #680

Closed: mikemahoney218 closed this issue 3 years ago

mikemahoney218 commented 3 years ago

There appears to be an error in the indexing and subsetting data frames section of Starting with Data. Subsetting a base data frame with `df[, col]` returns a vector, but subsetting a tibble this way returns a data frame, exactly as subsetting with `df[col]` does:

class(iris[, "Species"])
#> [1] "factor"
class(tibble::as_tibble(iris)[, "Species"])
#> [1] "tbl_df"     "tbl"        "data.frame"

Created on 2021-01-26 by the reprex package (v0.3.0)
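For anyone who wants a vector back from a tibble, here is a minimal sketch of the usual options. It assumes a recent tibble release (3.x) is installed. Passing `drop = TRUE` to `[` is documented for tibbles, and `[[` and `$` extract a single column as a vector for base data frames and tibbles alike. The name `iris_tbl` is just for illustration.

```r
library(tibble)

iris_tbl <- as_tibble(iris)

# drop = TRUE asks `[` to coerce a single selected column to a vector
class(iris_tbl[, "Species", drop = TRUE])   # "factor"

# [[ and $ always extract one column as a vector
class(iris_tbl[["Species"]])                # "factor"
class(iris_tbl$Species)                     # "factor"
```

Of these, `[[` and `$` behave the same way on both kinds of table, which may make them the simpler thing to teach.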

I imagine this has changed since the lesson was first written, as tibble strives to improve consistency. Because the lesson reads in the surveys dataset with readr::read_csv, the table is a tibble, so the comment in the text (that subsetting with the comma returns a vector) is incorrect.
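To check that claim without the real surveys file, the following sketch writes a tiny stand-in CSV to a temporary file and reads it back with `readr::read_csv`. The column names and values are hypothetical placeholders, and the sketch assumes the readr package is installed.

```r
library(readr)

# Hypothetical stand-in for the surveys data, written to a temporary CSV
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(record_id = 1:3, species_id = c("NL", "DM", "PF")),
  tmp,
  row.names = FALSE
)

surveys_demo <- read_csv(tmp)

# read_csv() returns a tibble: its class includes "tbl_df"
inherits(surveys_demo, "tbl_df")

# So comma subsetting of a single column still yields a tibble...
inherits(surveys_demo[, "species_id"], "tbl_df")

# ...while [[ yields a plain character vector
is.character(surveys_demo[["species_id"]])
```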

That said, I'm not sure whether it makes sense to highlight this as one of the differences between tibbles and data frames that the lesson earlier says aren't worth getting into, or to avoid mentioning that base data frames behave differently at all. It might make sense to mention, when introducing tibbles, that their goal is to provide more consistent behavior than base data frames, which would give a basis for flagging those differences as they come up?
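One way to illustrate the "more consistent behavior" point is to compare selecting one column versus two with `[`. In this sketch (assuming the tibble package is installed), the base data frame changes result type depending on how many columns are selected, while the tibble returns a tibble either way.

```r
library(tibble)

iris_tbl <- as_tibble(iris)

# Base data frame: the result type depends on how many columns are selected
class(iris[, "Species"])                       # one column: "factor"
class(iris[, c("Species", "Sepal.Length")])    # two columns: "data.frame"

# Tibble: `[` returns a tibble regardless of the number of columns
class(iris_tbl[, "Species"])
class(iris_tbl[, c("Species", "Sepal.Length")])
```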

fmichonneau commented 3 years ago

Good catch! With the transition to using tibbles, I think this section could be greatly simplified. I don't think it's necessary to get into the details of the differences between tibbles and data.frames given the intended audience for these lessons.

Would you be willing to start a pull request that addresses this issue?

Thank you!

mikemahoney218 commented 3 years ago

Of course! Opened #683.

Teebusch commented 3 years ago

closed by #683