ProjectMOSAIC / mosaicCore

Utilities needed for other mosaic-family packages

df_stats() doesn't handle package qualified function names #15

Closed rpruim closed 4 years ago

rpruim commented 6 years ago

df_stats(~age, data = HELPrct, mean)
##   mean_age
## 1     35.7
df_stats(~age, data = HELPrct, base::mean)
## Error in .::base : unused argument (mean)

rpruim commented 6 years ago

The trouble is with our use of magrittr:

x <- 1:10  # assumed definition; consistent with the output below
x %>% mean
## [1] 5.5
x %>% stats::mean
## Error in .::stats : unused argument (mean)

Unfortunately, magrittr gives us a very slick way to get optional arguments into the function without using fargs, so dropping it is not an attractive fix.
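
To see why the qualified name fails, it helps to look at how R parses each form. The sketch below uses only base R. insert_dot() and run() are hypothetical helpers: insert_dot() imitates the pipe's first-argument rule in simplified form and is not magrittr's actual implementation.

```r
# How R parses the three ways of naming a function (base R only).
bare      <- quote(mean)          # a symbol
qualified <- quote(base::mean)    # a call to `::` with arguments base and mean
with_call <- quote(base::mean())  # a call whose function is base::mean

# Hypothetical helper: a simplified imitation of the pipe's first-argument
# rule. A symbol f becomes f(.); a call f(a, b) becomes f(., a, b).
insert_dot <- function(rhs) {
  if (is.call(rhs)) {
    parts <- as.list(rhs)
    as.call(c(list(parts[[1]], quote(.)), parts[-1]))
  } else {
    as.call(list(rhs, quote(.)))
  }
}

x <- 1:10

# Hypothetical helper: evaluate the rewritten call with `.` bound to x,
# returning the string "error" if evaluation fails.
run <- function(rhs) {
  tryCatch(eval(insert_dot(rhs), list(. = x)),
           error = function(e) "error")
}

run(bare)       # evaluates mean(.)
run(with_call)  # evaluates base::mean(.)
run(qualified)  # evaluates `::`(., base, mean), which fails
```

Because base::mean is itself a call to `::`, the dot is inserted into that call instead of into a call to mean. Adding the trailing parens makes base::mean the function of an outer call, so the dot lands where it should.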

Note: The following does work, so we could just call the current behavior a feature and add some documentation and examples to alert people. (And perhaps emit a warning when the first element of the call is ::.)

df_stats(~ sex, data = KidsFeet, mosaic::prop())
##   prop_B
## 1  0.513

rpruim commented 6 years ago

Documentation for ... now includes this:

Functions used to compute the statistics. If this is empty, a default set of summary statistics is used. Functions used must accept a vector of values and return either a (possibly named) single value, a (possibly named) vector of values, or a data frame with one row. Functions can be specified with character strings, names, or expressions that look like function calls with the first argument missing. The latter option provides a convenient way to specify additional arguments. See the examples.

Note: If these arguments are named, those names will be used in the data frame returned (see details). Such names may not be among the names of the named arguments of df_stats().

If a function is specified using ::, be sure to include the trailing parens, even if there are no additional arguments required.
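
The forms described above can be tried side by side. This is a sketch that assumes mosaicCore is installed; the particular statistics are chosen only for illustration, and the call is wrapped in a check so that the snippet does nothing when the package is unavailable.

```r
# Assumes the mosaicCore package is installed; iris ships with R.
if (requireNamespace("mosaicCore", quietly = TRUE)) {
  res <- mosaicCore::df_stats(
    Sepal.Width ~ Species, data = iris,
    "median",            # character string
    sd,                  # bare name
    max(na.rm = TRUE),   # call with the first argument missing
    base::mean()         # package-qualified: trailing parens required
  )
  print(res)
}
```

Each statistic has a different name here so that the generated column names do not collide.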

homerhanumat commented 6 years ago

Is this issue related to the following phenomenon?

df_stats(Sepal.Width ~ Species, data = iris, mean, n)

Error in (function () : unused argument (c(3.5, 3, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3, 3, 4, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3, 3.4, 3.5, 3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.6, 3, 3.4, 3.5, 2.3, 3.2, 3.5, 3.8, 3, 3.8, 3.2, 3.7, 3.3))

rpruim commented 6 years ago

No. The trouble here is that n isn't the right kind of function -- it doesn't accept any inputs.

> n
function () 
{
    abort("This function should not be called directly")
}
<environment: namespace:dplyr>
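
The same kind of error can be reproduced without df_stats(). In the sketch below, n_stub is a hypothetical stand-in for dplyr's n() with the same empty argument list, so dplyr itself is not needed.

```r
# Hypothetical stand-in for dplyr::n(): a function with no formal arguments.
n_stub <- function() stop("This function should not be called directly")

# One group's worth of data, as a summary function would receive it.
x <- iris$Sepal.Width[iris$Species == "setosa"]

# Passing any argument fails during argument matching, before the body runs,
# so the message is about the argument and not the one from stop().
msg <- tryCatch(n_stub(x), error = function(e) conditionMessage(e))
msg

# A function that accepts a vector can serve as a summary statistic.
length(x)
```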

If you create your own function and call it n, then it works just fine.

n <- length
df_stats(Sepal.Width ~ Species, data = iris, mean, n)
     Species mean_Sepal.Width n_Sepal.Width
1     setosa            3.428            50
2 versicolor            2.770            50
3  virginica            2.974            50

rpruim commented 6 years ago

Off topic from the OP but related to the comments about n(), I've added two new functions:

df_stats( ~ AgeMonths, data = NHANES::NHANES, n_missing, n_not_missing, n_total, na.action = "na.pass")
##   n_missing_AgeMonths n_not_missing_AgeMonths n_total_AgeMonths
## 1                5038                    4962             10000

df_stats( AgeMonths ~ Gender, data = NHANES::NHANES, n_missing, n_not_missing, n_total, na.action = "na.pass")
##   Gender n_missing_AgeMonths n_not_missing_AgeMonths n_total_AgeMonths
## 1 female                2537                    2483              5020
## 2   male                2501                    2479              4980