KlausVigo / kknn

Weighted k-Nearest Neighbors
http://klausvigo.github.io/kknn/

predict.train.kknn() does not respect all parameters from train.kknn() #22

Open schoonees opened 4 years ago

schoonees commented 4 years ago

predict.train.kknn() does not respect all parameters passed to train.kknn(). One example is scale.

Predictions from models trained via train.kknn() with scale = FALSE and with scale = TRUE are identical:

library(tidymodels)
data("mtcars")
set.seed(1)
mtcars_split <- initial_split(mtcars, prop = 0.7)

## scale = FALSE
kknn::train.kknn(formula = mpg ~ disp + wt, data = training(mtcars_split), 
                 ks = 5, scale = FALSE) %>% 
  predict(testing(mtcars_split))
#> [1] 21.032 21.784 16.668 16.052 21.264 16.404 26.340 16.076 15.620

## scale = TRUE
kknn::train.kknn(formula = mpg ~ disp + wt, data = training(mtcars_split), 
                 ks = 5, scale = TRUE) %>% 
  predict(testing(mtcars_split))
#> [1] 21.032 21.784 16.668 16.052 21.264 16.404 26.340 16.076 15.620

Calling kknn() directly, however, correctly gives different predictions for the two settings:

## scale = FALSE
kknn::kknn(formula = mpg ~ disp + wt, train = training(mtcars_split), 
           test = testing(mtcars_split), k = 5, scale = FALSE) %>% 
  predict(newdata = testing(mtcars_split))
#> [1] 21.276 21.276 16.860 16.276 21.276 16.404 29.680 15.700 16.020

## scale = TRUE
kknn::kknn(formula = mpg ~ disp + wt, train = training(mtcars_split), 
           test = testing(mtcars_split), k = 5, scale = TRUE) %>% 
  predict(newdata = testing(mtcars_split))
#> [1] 21.032 21.784 16.668 16.052 21.264 16.404 26.340 16.076 15.620

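The issue is that kknn::predict.train.kknn() forwards only some of the parameters originally passed to train.kknn() when it calls kknn() internally. scale, na.action, ykernel and contrasts are not passed along, so kknn() falls back to its defaults for them. This is consistent with the output above: both train.kknn() predictions match the kknn() result for scale = TRUE, which is the default.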
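Until this is addressed, one possible workaround is to skip predict() on the train.kknn object and call kknn() directly with the tuned parameters, passing scale explicitly. The following is only a sketch: it uses a base-R split instead of initial_split() (so the numbers will differ from those above), and train_df, test_df and pred_unscaled are illustrative names.

```r
library(kknn)

# Base-R 70/30 split of mtcars (illustrative; not the rsample split used above)
set.seed(1)
idx <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))
train_df <- mtcars[idx, ]
test_df <- mtcars[-idx, ]

# Tune / fit as before
fit <- train.kknn(mpg ~ disp + wt, data = train_df, ks = 5, scale = FALSE)

# Workaround: call kknn() directly, reusing the selected k, kernel and
# distance, and state scale explicitly instead of relying on predict()
res <- kknn(mpg ~ disp + wt, train = train_df, test = test_df,
            k = fit$best.parameters$k,
            kernel = fit$best.parameters$kernel,
            distance = fit$distance,
            scale = FALSE)
pred_unscaled <- predict(res)
```

The same pattern would apply to na.action, ykernel and contrasts if they were set to non-default values in train.kknn().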

A fix would involve parsing the $call entry of the train.kknn object more carefully, so that these arguments are recovered and forwarded to kknn().
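To illustrate the idea, here is a rough user-side sketch that recovers scale from $call and forwards it to kknn(). predict_with_scale() is a hypothetical helper, not part of kknn. It takes the training data as an explicit argument and evaluates the recorded scale expression in the caller's frame, which is an assumption that may not hold if train.kknn() was called from somewhere else.

```r
library(kknn)

# Hypothetical helper (not part of kknn): predict from a train.kknn fit
# while honouring the `scale` argument recorded in its $call.
predict_with_scale <- function(object, train, newdata) {
  scale_arg <- object$call$scale
  # If scale was not supplied to train.kknn(), use its default (TRUE);
  # otherwise evaluate the recorded expression in the caller's frame
  # (assumption: that frame can still resolve it).
  use_scale <- if (is.null(scale_arg)) TRUE else eval(scale_arg, parent.frame())
  res <- kknn(formula(terms(object)), train = train, test = newdata,
              k = object$best.parameters$k,
              kernel = object$best.parameters$kernel,
              distance = object$distance,
              scale = use_scale)
  predict(res)
}

# Usage on a base-R split of mtcars (illustrative names)
set.seed(1)
idx <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))
train_df <- mtcars[idx, ]
test_df <- mtcars[-idx, ]

fit <- train.kknn(mpg ~ disp + wt, data = train_df, ks = 5, scale = FALSE)
pred <- predict_with_scale(fit, train_df, test_df)
```

A proper fix inside the package would presumably handle na.action, ykernel and contrasts in the same way.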

See also this Stack Overflow question.

schoonees commented 3 years ago

Any thoughts on this?