choonghyunryu / dlookr

Tools for Data Diagnosis, Exploration, Transformation
https://choonghyunryu.github.io/dlookr/

Abnormal situation in univar_category() #69

Closed choonghyunryu closed 2 years ago

choonghyunryu commented 2 years ago

univar_category() returns incorrect results in some situations.

As shown below, if the data frame contains a column named 'x', the frequency table for every categorical variable is computed from the values of column 'x' instead of from the variable itself.

> names(ggplot2::diamonds)
 [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"   "x"       "y"      
[10] "z"      
> univar_category(ggplot2::diamonds)
$cut
# A tibble: 554 × 3
     cut     n      rate
   <dbl> <int>     <dbl>
 1  0        8 0.000148 
 2  3.73     2 0.0000371
 3  3.74     1 0.0000185
 4  3.76     1 0.0000185
 5  3.77     1 0.0000185
 6  3.79     2 0.0000371
 7  3.81     3 0.0000556
 8  3.82     2 0.0000371
 9  3.83     3 0.0000556
10  3.84     4 0.0000742
# … with 544 more rows

$color
# A tibble: 554 × 3
   color     n      rate
   <dbl> <int>     <dbl>
 1  0        8 0.000148 
 2  3.73     2 0.0000371
 3  3.74     1 0.0000185
 4  3.76     1 0.0000185
 5  3.77     1 0.0000185
 6  3.79     2 0.0000371
 7  3.81     3 0.0000556
 8  3.82     2 0.0000371
 9  3.83     3 0.0000556
10  3.84     4 0.0000742
# … with 544 more rows

$clarity
# A tibble: 554 × 3
   clarity     n      rate
     <dbl> <int>     <dbl>
 1    0        8 0.000148 
 2    3.73     2 0.0000371
 3    3.74     1 0.0000185
 4    3.76     1 0.0000185
 5    3.77     1 0.0000185
 6    3.79     2 0.0000371
 7    3.81     3 0.0000556
 8    3.82     2 0.0000371
 9    3.83     3 0.0000556
10    3.84     4 0.0000742
# … with 544 more rows

choonghyunryu commented 2 years ago

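Fixed the error by wrapping the 'x' passed to select() in all_of(), so that it is evaluated as a character vector of column names rather than matched against a column named 'x':

Changed select(variable = x) to select(variable = all_of(x)).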
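
The difference between the two forms can be reproduced with a small sketch. The helper names `select_ambiguous()` and `select_explicit()` and the toy data frame are hypothetical, made up for illustration. They are not dlookr's internal code, and they only assume that the column name reaches select() through an argument called `x`, as the fix above suggests.

```r
library(dplyr)

# Toy data: one categorical column plus a numeric column that happens to be named "x"
df <- data.frame(
  cut = c("Fair", "Good", "Good"),
  x   = c(3.95, 3.89, 4.05),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `x` holds a column name as a string.
# Inside select(), a bare `x` is matched against the data first,
# so the column named "x" wins over the function argument.
select_ambiguous <- function(.data, x) {
  .data %>% select(variable = x)
}

# Hypothetical helper: all_of() forces `x` to be read from the
# calling environment as a character vector of column names.
select_explicit <- function(.data, x) {
  .data %>% select(variable = all_of(x))
}

select_ambiguous(df, "cut")$variable  # values of column "x": 3.95 3.89 4.05
select_explicit(df, "cut")$variable   # values of column "cut": "Fair" "Good" "Good"
```

With a data frame that has no column named 'x', both helpers select the same column, which is likely why the problem only appeared on data such as ggplot2::diamonds.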