JeffreyRacine / R-Package-np

R package np (Nonparametric Kernel Smoothing Methods for Mixed Data Types)
https://socialsciences.mcmaster.ca/people/racinej

using npplregbw with stored formula causes unexpected behaviour #26

Open Bijaelo opened 3 years ago

Bijaelo commented 3 years ago

Based on this SO question.

There is currently a problem when a formula is supplied in any non-standard way, either as a stored variable or as the result of a dynamic call. Illustration:

library(np)
df <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                 x4 = rnorm(10), x5 = rnorm(10))
# Case 1: formula built dynamically inside the call
npplregbw(formula = as.formula(paste0(paste("y ~ ", paste0('x', 1:4, collapse= " + ")), '|x5')), data = df)
# Case 2: formula stored in a variable
fml <- y ~ x1 + x2 + x3 + x4 | x5
npplregbw(formula = fml, data = df)

The error comes from the way the formula is extracted, and this potentially affects more than just the formula:

foo <- function(formula){
  mf <- match.call(expand.dots = FALSE)
  return(mf[["formula"]])  # the argument as written by the caller, not its value
}
foo(as.formula(paste0(paste("y ~ ", paste0('x', 1:4, collapse= " + ")), '|x5')))
# output
as.formula(paste0(paste("y ~ ", paste0("x", 1:4, 
    collapse = " + ")), "|x5"))
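To see why this matters for code that goes on to treat the argument as a formula, the following base-R sketch compares as.character() on an unevaluated call with as.character() on the formula it evaluates to. It does not use np. The shortened formula and the names captured and evaluated are made up for illustration.

```r
# What match.call() hands back: the expression, not its value
captured <- quote(as.formula("y ~ x1 + x2 | x5"))

# What the downstream code presumably expects: a formula object
evaluated <- eval(captured)

# The call deparses to the function name plus its argument ...
length(as.character(captured))   # 2
# ... while a two-sided formula gives "~", the LHS and the RHS
length(as.character(evaluated))  # 3

# A stored formula is captured as a bare symbol
length(as.character(quote(fml))) # 1
```

Any logic that checks for a length of 3 or picks out element 2 as the response would therefore behave differently for the three ways of supplying the same formula.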

The problem in this case lies within explodePipe and np:::npplregbw.formula, since the formula is passed along as an unevaluated call. This might be a problem in other parts of the function as well. A quick fix would be to force evaluation of the formula in explodePipe:

np:::explodePipe
function (formula) 
{
    tf <- as.character(eval(formula)) # <== added eval
    tf <- tf[length(tf)]
    eval(parse(text = paste("c(", ifelse(length(as.character(formula)) == 
        3, "strsplit(as.character(formula)[2],\" *[+] *\"),", 
        ""), "strsplit(strsplit(tf,\" *[|] *\")[[1]],\" *[+] *\"))")))
}
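One caveat about the quick fix above: only the first line uses the evaluated formula, while the later as.character(formula) calls appear to still operate on the argument as it was passed in, so the length-3 check could fail for a stored or dynamic formula. A minimal sketch of a variant that evaluates once and reuses the result is shown below. The function explode_pipe_sketch and its env argument are hypothetical and not part of np; the splitting logic is copied from the listing above.

```r
# Hypothetical stand-in for np:::explodePipe; not the package's code.
explode_pipe_sketch <- function(formula, env = parent.frame()) {
  # Evaluate once, so a symbol or a call becomes a formula object.
  formula <- eval(formula, env)
  stopifnot(inherits(formula, "formula"))

  parts <- as.character(formula)   # "~", LHS, RHS for a two-sided formula
  rhs <- parts[length(parts)]

  # Split the right-hand side on "|" first, then each piece on "+".
  rhs_terms <- strsplit(strsplit(rhs, " *[|] *")[[1]], " *[+] *")

  if (length(parts) == 3) {
    c(strsplit(parts[2], " *[+] *"), rhs_terms)
  } else {
    rhs_terms
  }
}

fml <- y ~ x1 + x2 + x3 + x4 | x5
from_symbol <- explode_pipe_sketch(quote(fml))
from_call <- explode_pipe_sketch(quote(as.formula("y ~ x1 + x2 | x5")))
```

Passing an explicit environment is a design choice in this sketch: evaluating inside the helper's own frame would only find variables visible from there, which may not include a formula defined inside the caller's function.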

Note that extracting arguments from match.call without evaluating them may be the cause of further bugs, but I have not tested this.
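For reference, a common base-R pattern for this situation is to evaluate the captured expression in the caller's frame before using it. The sketch below is a hypothetical function, foo_eval, modelled on foo above; it is not taken from np and only illustrates the idea.

```r
# Hypothetical function illustrating the pattern; not np code.
foo_eval <- function(formula) {
  mf <- match.call(expand.dots = FALSE)
  # Evaluate the captured expression where the caller wrote it,
  # so symbols and dynamic calls both resolve to a formula object.
  eval(mf[["formula"]], parent.frame())
}

fml <- y ~ x1 + x2 + x3 + x4 | x5
stored <- foo_eval(fml)
dynamic <- foo_eval(as.formula(paste0("y ~ ", paste0("x", 1:4, collapse = " + "), " | x5")))
```

Both stored and dynamic are then formula objects, so later code that calls as.character() on them sees the same three-element structure as for a formula typed directly into the call.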

cwolock commented 2 years ago

I have had the same bug using npcdist. The issue happens with a stored formula object or a dynamic call for that function as well. Possibly related (although I have not looked into it in detail) is an error thrown when using the ~. formula syntax.

library(np)
df<-data.frame(y=rnorm(10),x1=rnorm(10),x2=rnorm(10),x3=rnorm(10),x4=rnorm(10),
               x5=rnorm(10))
npcdist(formula = y ~ ., data = df)

On the other hand, I have been able to use a formula object with npreg.
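Until the package evaluates the formula itself, a possible user-side workaround is to place the formula object directly into the call, using either bquote() or do.call(). The sketch below demonstrates the mechanism on a hypothetical function, uses_match_call, that mimics the reported behaviour. Whether the same trick works for npplregbw or npcdist is an assumption that has not been tested here.

```r
# Hypothetical function that mimics the reported behaviour; not np code.
uses_match_call <- function(formula) {
  mf <- match.call()
  as.character(mf[["formula"]])
}

fml <- y ~ x1 + x2 | x5

# Plain call: only the symbol name is seen
plain <- uses_match_call(fml)

# bquote() substitutes the value of fml into the call before it is evaluated
inlined <- eval(bquote(uses_match_call(.(fml))))

# do.call() builds the call from the argument values themselves
via_do_call <- do.call(uses_match_call, list(formula = fml))
```

Here plain is the single string "fml", while inlined and via_do_call both contain the three parts of the formula.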