Open TimTaylor opened 1 year ago
I noticed you only mentioned `[.data.frame`. Since data.frames are a hybrid between lists and matrices, we can just use list-style subsetting, `[i]`, instead of the matrix way, `[,i]`. So:
iris[1]
iris["Species"]
iris[c(1,2,3)]
All preserve the data.frame without the need for `drop=FALSE`.
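As a small sketch of the difference between the two subsetting styles (using the built-in `iris` data; the variable names are only illustrative):

```r
# List-style subsetting: always returns a data.frame, even for one column.
by_list <- iris["Species"]

# Matrix-style subsetting: a single column is dropped to a plain vector
# (here a factor) unless drop = FALSE is given.
by_matrix <- iris[, "Species"]
by_matrix_kept <- iris[, "Species", drop = FALSE]

class(by_list)         # "data.frame"
class(by_matrix)       # "factor"
class(by_matrix_kept)  # "data.frame"
```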
To clarify - the "wish" is for when there is an explicit `i`. I know there are alternatives but I'd still like the default behaviour to be consistent and always return an object of the same type (data frame) irrespective of the length of `j`. Not, for example:
rows <- 1:3
class(mtcars[rows, 1L])
#> [1] "numeric"
class(mtcars[rows, 1:2])
#> [1] "data.frame"
As a package developer, I often have dynamic input in the `j`-slot that is user controlled. Forgetting the `drop=FALSE` can lead to nasty bugs, as the code afterward often assumes the resulting object is still a data.frame.
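A minimal sketch of the defensive pattern, assuming a hypothetical helper `take_cols()` whose `cols` argument is user supplied and may have length one:

```r
# Hypothetical helper: `cols` comes from the user and may name a single column.
take_cols <- function(df, rows, cols) {
  # drop = FALSE keeps the result a data.frame regardless of length(cols).
  df[rows, cols, drop = FALSE]
}

one <- take_cols(mtcars, 1:3, "mpg")
two <- take_cols(mtcars, 1:3, c("mpg", "cyl"))

class(one)  # "data.frame"
dim(one)    # 3 1
dim(two)    # 3 2
```

Downstream code that calls `nrow()`, `names()` or `[[` on the result then behaves the same for one column as for several.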
@ltuijnder as mentioned above: using `df[j]` instead of `df[,j]` should solve this problem for you.
This recently confused me as I was writing code iterating over the rows of data frames and sometimes the data frame only had one column.
Here's an example of what went wrong:
> data.frame(x = 42)[1,]
[1] 42
I didn't want it to lose its data.frame-ness. tibbles have sensible defaults:
> tibble(x = 42)[1,]
# A tibble: 1 × 1
x
<dbl>
1 42
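For comparison, a sketch of how the same row-wise loop can be written in base R today so that a one-column data frame is not simplified to a vector (the data here are made up for illustration):

```r
# A single-column data frame, as in the example above but with two rows.
df <- data.frame(x = c(42, 43))

rows <- vector("list", nrow(df))
for (i in seq_len(nrow(df))) {
  # Without drop = FALSE, df[i, ] would return the bare number.
  rows[[i]] <- df[i, , drop = FALSE]
}

class(rows[[1]])  # "data.frame"
rows[[1]]$x       # 42
```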
This became a fairly divisive topic on Mastodon. This behavior is not intuitive or user friendly. https://fosstodon.org/@josi/111096750116664099
This issue should be about matrices and arrays preserving dimensionality (and hence remaining matrix/array objects) after `[` selection, as it affects these objects first. `data.frame` matrix-based selection would then follow too. Changing the behaviour only for data.frames would introduce an inconsistency between `data.frame` and `matrix` objects.
A possible interim solution is to change the default of `drop` in the current `x[i, j, ... , drop = TRUE]` to `getOption("default.drop", TRUE)`, i.e. `x[i, j, ... , drop = getOption("default.drop", TRUE)]`.
It can then be set per session (or per user, if one puts that in the initialization script): `options(default.drop = FALSE)`.
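To illustrate how such an option-driven default would behave, here is a rough sketch using a hypothetical wrapper function, `extract2d()`. Note that `default.drop` is only the option name proposed above; base R's `[` does not consult it.

```r
# Hypothetical wrapper: the default for `drop` is looked up from the
# proposed (non-existent in base R) option "default.drop" at call time.
extract2d <- function(x, i, j, drop = getOption("default.drop", TRUE)) {
  x[i, j, drop = drop]
}

m <- matrix(1:6, nrow = 2)

# Option unset: behaves like today's default and drops to a vector.
res_default <- extract2d(m, 1, 1:3)

# Option set for the session: dimensions are preserved.
old <- options(default.drop = FALSE)
res_kept <- extract2d(m, 1, 1:3)
options(old)  # restore the previous state

dim(res_default)  # NULL
dim(res_kept)     # 1 3
```

Because the result depends on session state, the same call can return different shapes for different users.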
@chainsawriot Then people's packages start behaving differently for each user and/or script. An option already didn't work for `stringsAsFactors`, and it is not going to work for this either.
I know this is a common "want" but didn't find it recorded in the wish list so thought I'd write it down.
As we had the `stringsAsFactors` default changed in R 4.0.0, I wonder if, for R 5.0.0, there is an opportunity to propose setting the default `drop` argument in `[.data.frame` to `FALSE`. Clearly this would be a major breaking change, so it would (potentially) require a lot of time and effort to fix packages on CRAN, but aiming for 5.0.0 would, hopefully, make this achievable. I wanted to gauge feedback here before enquiring on the r-devel mailing list. Perhaps it's not a great idea anyway?