HenrikBengtsson / Wishlist-for-R

Features and tweaks to R that I and others would love to see - feel free to add yours!
https://github.com/HenrikBengtsson/Wishlist-for-R/issues
GNU Lesser General Public License v3.0

WISH: `drop` = FALSE by default for `[.data.frame` #144

Open TimTaylor opened 1 year ago

TimTaylor commented 1 year ago

I know this is a common "want" but didn't find it recorded in the wish list so thought I'd write it down.

Since the stringsAsFactors default was changed in R 4.0.0, I wonder whether R 5.0.0 offers an opportunity to propose setting the default drop argument in [.data.frame to FALSE. Clearly this would be a major breaking change, so it would (potentially) require a lot of time and effort to fix packages on CRAN, but aiming for 5.0.0 would, hopefully, make this achievable.

I wanted to gauge feedback here before enquiring on the r-devel mailing list. Perhaps it's not a great idea anyway?

karoliskoncevicius commented 1 year ago

I noticed you only mentioned [.data.frame. Since data.frames are a hybrid between lists and matrices, we can just use list-style subsetting [i] instead of matrix-style subsetting [,i]. So:

iris[1]
iris["Species"]
iris[c(1,2,3)]

All preserve data.frame without the need for drop=FALSE.
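
The difference between the two subsetting styles can be checked directly. This is a minimal sketch using the built-in iris data set; the variable names are illustrative only:

```r
# List-style subsetting keeps the data.frame class, even for a single column.
single_list <- iris["Species"]

# Matrix-style subsetting of a single column drops to a vector (here a factor).
single_matrix <- iris[, "Species"]

# Matrix-style subsetting with drop = FALSE keeps the data.frame.
single_kept <- iris[, "Species", drop = FALSE]

class(single_list)    # "data.frame"
class(single_matrix)  # "factor"
class(single_kept)    # "data.frame"
```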

TimTaylor commented 1 year ago

To clarify: the "wish" concerns calls with an explicit i. I know there are alternatives, but I'd still like the default behaviour to be consistent and always return an object of the same type (a data frame) irrespective of the length of j, rather than, for example:

rows <- 1:3
class(mtcars[rows, 1L])
#> [1] "numeric"
class(mtcars[rows, 1:2])
#> [1] "data.frame"

ltuijnder commented 1 year ago

As a package developer, I often have dynamic input in the j-slot that is user controlled. Forgetting the drop=FALSE can lead to nasty bugs as often the code afterward assumes the resulting object is still a data.frame.
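
A minimal sketch of how such a bug can arise, assuming a hypothetical helper that counts user-selected columns (col_count and col_count_safe are made-up names, not from any package):

```r
# Hypothetical helper: relies on the default drop behaviour.
col_count <- function(df, cols) {
  selected <- df[, cols]   # collapses to a vector when `cols` has length 1
  ncol(selected)           # ncol() of a plain vector is NULL
}

# Same helper with drop = FALSE: the result is always a data.frame.
col_count_safe <- function(df, cols) {
  selected <- df[, cols, drop = FALSE]
  ncol(selected)
}

col_count(mtcars, c("mpg", "cyl"))  # 2
col_count(mtcars, "mpg")            # NULL
col_count_safe(mtcars, "mpg")       # 1
```

The failure only shows up when the caller happens to pass a single column, which is why it is easy to miss in testing.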

karoliskoncevicius commented 1 year ago

@ltuijnder as mentioned above: using df[j] instead of df[,j] should solve this problem for you.

InductiveStep commented 1 year ago

This recently confused me as I was writing code iterating over the rows of data frames and sometimes the data frame only had one column.

Here's an example of what went wrong:

> data.frame(x = 42)[1,]
[1] 42

I didn't want it to lose its data.frame-ness. tibbles have sensible defaults:

> tibble(x = 42)[1,]
# A tibble: 1 × 1
      x
  <dbl>
1    42

JosiahParry commented 1 year ago

This became a fairly divisive topic on Mastodon. This behavior is not intuitive or user friendly. https://fosstodon.org/@josi/111096750116664099

karoliskoncevicius commented 1 year ago

This issue should be about matrices and arrays preserving dimensionality (and hence remaining matrix/array objects) after [ selection, since the dropping behaviour affects these objects first. Matrix-style selection on data.frames would then follow. Changing the behaviour only for data.frames would introduce an inconsistency between data.frame and matrix objects.
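
For reference, the corresponding behaviour on a plain matrix looks like this (a small sketch; m is just an example object):

```r
m <- matrix(1:6, nrow = 2)

# Selecting a single column or row drops the dim attribute by default.
col_default <- m[, 1]
row_default <- m[1, ]

# With drop = FALSE the result stays a matrix.
col_kept <- m[, 1, drop = FALSE]
row_kept <- m[1, , drop = FALSE]

dim(col_default)  # NULL
dim(row_default)  # NULL
dim(col_kept)     # 2 1
dim(row_kept)     # 1 3
```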

chainsawriot commented 1 year ago

A possible interim solution is to change the default of drop in the current x[i, j, ... , drop = TRUE] to getOption("default.drop", TRUE), i.e. x[i, j, ... , drop = getOption("default.drop", TRUE)].

It can then be set per session (or per user, if one puts that in the initialization script): options(default.drop = FALSE).
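
The idea can be approximated today with a wrapper function. This is only a sketch: the option name default.drop comes from the proposal above and is not consulted by base R, and extract() is a hypothetical helper:

```r
# Hypothetical wrapper; base R's `[` does not look at any such option.
extract <- function(x, i, j) {
  x[i, j, drop = getOption("default.drop", TRUE)]
}

old <- options(default.drop = FALSE)  # opt in for this session
kept <- extract(mtcars, 1:3, 1L)

options(old)                          # restore the previous (unset) state
dropped <- extract(mtcars, 1:3, 1L)

class(kept)     # "data.frame"
class(dropped)  # "numeric"
```

Note that the same call returns objects of different classes depending on the session's option state.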

gaborcsardi commented 1 year ago

@chainsawriot Then people's packages start behaving differently for each user and/or script. The option already didn't work for stringsAsFactors, it is not going to work for this, either.