cenuno opened 4 years ago
This requires us to be aware of the `get()` function, which retrieves an R object by its name (supplied as a string) rather than by value. Normally, when `dplyr::filter()` evaluates your code, you never need to put column names in quotes: you write them as bare names. However, now that you want the user to specify the column of interest as a string, we need `get()` to perform this type of dynamic filtering.
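A minimal sketch of the difference, assuming dplyr is installed: it filters `iris` three ways, using a bare column name, the column name stored in a string, and that same string wrapped in `get()`. The variable names `col`, `bare`, `literal` and `dynamic` are only for illustration.

```r
library(dplyr)

# The column name arrives as a string, as it would from a user
col <- "Sepal.Width"

# Bare column name: data masking finds the Sepal.Width column in iris
bare <- filter(iris, Sepal.Width == 2)

# The string itself is compared: "Sepal.Width" == 2 is a single FALSE,
# so no rows are kept
literal <- filter(iris, col == 2)

# get() looks up the object named "Sepal.Width", i.e. the column itself
dynamic <- filter(iris, get(col) == 2)

nrow(bare)    # 1
nrow(literal) # 0
nrow(dynamic) # 1
```

Without `get()`, the comparison silently matches nothing instead of raising an error, which is why the symptom is easy to miss.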
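Let's assume the `filter_df()` function below lives in the `analysis/functions/utlities.R` file. Notice that the function begins with high-level documentation that tells the reader three things:

1. what the function does, in one sentence;
2. each argument and the type of object it expects;
3. the type of object the function returns.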
The more descriptive you are about object types in the argument section (e.g. `df`: a data frame) and in the return section, the easier it is for others to understand what should be happening in your code. This is similar to pseudocode: it forces you to state what the logic should do, and if the logic isn't working, you can share your code and let others help.
Here, `filter_df()` returns a data frame whose `column_name` values are exactly equal to `value`:
```r
# create function ----
filter_df <- function(df, value, column_name) {
  # Return a df whose `column_name` values are exactly equal to `value`
  #
  # Arguments
  # - df: a data frame
  # - value: a number
  # - column_name: a character string (length-one character vector)
  #   naming a column in df
  #
  # Return
  # - a data frame

  # keep only the records whose value in the given column_name
  # is exactly equal to `value`
  tmp_df <- dplyr::filter(df,
                          # note: get() looks up the column by its
                          # name (a string) rather than by value
                          get(column_name) == value)
  # return the filtered data frame to the caller
  return(tmp_df)
}
```
Using `get()` inside `dplyr::filter()` allows for this type of dynamic filtering.

Now that we have our custom function, let's test it (for more on `source()`, see #19):
```r
# load necessary packages
library(dplyr)
library(here)
# load necessary functions
source(here("analysis/functions/utlities.R"))
# test the function
filter_df(iris, column_name = "Sepal.Width", value = 2)
```
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5 | 2 | 3.5 | 1 | versicolor |
Let's test the function again, this time using `Petal.Width` rather than `Sepal.Width` in the `column_name` argument:
```r
filter_df(iris, column_name = "Petal.Width", value = 2)
```
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
6.5 | 3.2 | 5.1 | 2 | virginica |
5.7 | 2.5 | 5.0 | 2 | virginica |
5.6 | 2.8 | 4.9 | 2 | virginica |
7.7 | 2.8 | 6.7 | 2 | virginica |
7.9 | 3.8 | 6.4 | 2 | virginica |
6.5 | 3.0 | 5.2 | 2 | virginica |
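Because the column name is just a string, the same function can be applied to several columns in one pass. This is a small sketch assuming dplyr is installed. The core logic of `filter_df()` is repeated here, without its documentation header, so the snippet runs on its own. The vector `cols` and the result `row_counts` are illustrative names.

```r
library(dplyr)

# Same logic as filter_df(), repeated so this snippet is self-contained
filter_df <- function(df, value, column_name) {
  dplyr::filter(df, get(column_name) == value)
}

# Only the column name changes between calls; the comparison logic is reused
cols <- c("Sepal.Width", "Petal.Width")
row_counts <- vapply(
  cols,
  function(col) nrow(filter_df(iris, value = 2, column_name = col)),
  integer(1)
)

row_counts
# named integer vector: Sepal.Width = 1, Petal.Width = 6
```

The counts match the two tables above: one row for `Sepal.Width` and six rows for `Petal.Width`.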
Notice that whether I used `column_name = "Sepal.Width"` or `column_name = "Petal.Width"`, the `filter_df()` function translated the string supplied as `column_name` into the corresponding column via `get(column_name)`. The `get()` function makes R treat the string as the name of an object (here, a column) rather than as a literal string. Without it, `dplyr::filter()` would compare the string itself to `value` (e.g. `"Sepal.Width" == 2`, which is `FALSE`) and return no rows.
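For comparison, dplyr also supports the `.data` pronoun, which looks a column up by a string without `get()`. The sketch below swaps that technique in and adds a check that the column exists. `filter_df_data()` is a hypothetical name, not part of the original utilities file. One reason you might prefer it: `get()` also searches enclosing environments when no column matches, so a misspelled name can pick up some other object. By contrast, `.data[[column_name]]` only looks at the data frame's columns.

```r
library(dplyr)

# Hypothetical alternative to filter_df(): same arguments, but uses the
# .data pronoun instead of get()
filter_df_data <- function(df, value, column_name) {
  # stop early with a readable message if the column does not exist
  if (!column_name %in% names(df)) {
    stop("column '", column_name, "' not found in df", call. = FALSE)
  }
  # .data[[column_name]] refers only to columns of df,
  # never to variables in the calling environment
  dplyr::filter(df, .data[[column_name]] == value)
}

petals <- filter_df_data(iris, value = 2, column_name = "Petal.Width")
nrow(petals) # 6
```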
Thank you @cenuno
How do you create a function that is flexible with respect to column names using the `dplyr::filter()` function? The comparison logic inside the function stays the same; the only things that change are the column name and the value used in the comparison.