PcAux-Package / PcAux

This is the repository for the PcAux project.
GNU General Public License v3.0
4 stars 5 forks source link

Remove instances of potentially type-inconsistent bracket subsetting #34

Closed ppanko closed 1 year ago

ppanko commented 5 years ago

Instances where a larger data.frame object is being subset to a smaller data.frame via bracket index need to be changed to exclude the comma due to potential inconsistency in the output object.

In general, we should move to using list subsetting for data.frame objects in most (if not all) cases.

kylelang commented 5 years ago

Why should we use list subsetting for data.frames? Doing so makes the code much less readable. Although data.frames are lists, as far as R is concerned, from the user's perspective, data.frames are two-dimensional tables. The matrix style bracket subsetting make the operations we're doing transparent.

ppanko commented 5 years ago

The reason for this is that matrix subsetting will return a vector if the subset ends up being a single column - this can cause problems if you are always expecting a data.frame. Here's an example:

data(iris)

myNames <- grep("Spec", names(iris))

myDf1 <- iris[,myNames]
class(myDf1)
## factor

myDf2 <- iris[myNames]
class(myDf2)
## data.frame 

Here i am emulating a function that does some sort of data.frame subsetting. In the typical use-case for this function, we assume that myNames is of length > 1; in the event that it is not, matrix subsetting will return a vector which is bad if a data.frame is the expected output type.

In short, this is an issue I see happening in PcAux mostly with small data sets. I think list subsetting should definitely be used to alleviate this issue when a data.frame is being subset to a smaller data.frame. I do not have a great preference for when data.frames are subset to single vectors - we may want to stick with matrix subsetting in that case.

kylelang commented 5 years ago

Ah, I see your point, and I definitely agree. I was not aware that you could get an N X 1 data.frame by subsetting with the single list-style brackets. Currently, we have several instances of awkward hacks to get around the automatic conversion of subsetted columns to vectors (usually amounting to implementing two versions of a method to deal with 1D and 2D objects).

kylelang commented 1 year ago

This project is no longer actively developed here, so I'm archiving this repository. In preparation thereof, I'm closing all open issues.