Open MichaelChirico opened 4 years ago
NOTES: Non-maintainer report on contributed package User error?
Audit (from Jitterbug): Fri Jun 10 12:47:29 2005 ripley changed notes Fri Jun 10 12:47:29 2005 ripley moved from incoming to Add-ons Sat Aug 9 21:02:21 2008 ripley changed notes
This is a contributed package. I left a note in the issue tracker for the package. I think it can be closed.
This is from a contributed package. I think it can be closed (I notified the maintainer.)
From: wheelerb@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::> Full_Name: Bill Wheeler Version: 2.0.1 OS: Windows 2000 Submission from: (NULL) (67.130.36.229)
The program to reproduce the error is below. I am calling rpart with a user-defined split function for a binary response variable and one continuous independent variable. The split function works for some datasets but not others. The error is: Error in "$<-.data.frame"(
*tmp*
, "yval2", value = c(0, 15, 10, 0.6, : replacement has 5 rows, data has 1# # Test out the "user mode" functions, with a binary response #
rm(list=ls(all=TRUE))
options(warn = 1); library(rpart);
set.seed(7);
nobs <- 25; mydata <- data.frame(indx=1:nobs); mydata[, "y"] <- floor(runif(nobs, min=0, max=2)); mydata[, "x"] <- runif(nobs, min=0, max=2); mydata$indx <- NULL;
################################################################ # The 'evaluation' function. Called once per node. # Produce a label (1 or more elements long) for labeling each node, # and a deviance. The latter is # - of length 1 # - equal to 0 if the node is "pure" in some sense (unsplittable) # - does not need to be a deviance: any measure that gets larger # as the node is less acceptable is fine. # - the measure underlies cost-complexity pruning, however temp1 <- function(y, wt, parms) { print(" START: TEMP1 ");
# The split function, where most of the work occurs. # Called once per split variable per node. # If continuous=T # The actual x variable is ordered # y is supplied in the sort order of x, with no missings, # return two vectors of length (n-1): # goodness = goodness of the split, larger numbers are better. # 0 = couldn't find any worthwhile split # the ith value of goodness evaluates splitting obs 1:i vs (i+1):n # direction= -1 = send "y< cutpoint" to the left side of the tree # 1 = send "y< cutpoint" to the right # this is not a big deal, but making larger "mean y's" move towards # the right of the tree, as we do here, seems to make it easier to # read # If continuos=F, x is a set of integers defining the groups for an # unordered predictor. In this case: # direction = a vector of length m= "# groups". It asserts that the # best split can be found by lining the groups up in this order # and going from left to right, so that only m-1 splits need to # be evaluated rather than 2^(m-1) # goodness = m-1 values, as before. # # The reason for returning a vector of goodness is that the C routine # enforces the "minbucket" constraint. It selects the best return value # that is not too close to an edge. temp2 <- function(y, wt, x, parms, continuous) { print(" START: TEMP2 ");
# The init function: # fix up y to deal with offsets # return a dummy parms list # numresp is the number of values produced by the eval routine's "label" # numy is the number of columns for y # summary is a function used to print one line in summary.rpart # text is a function used to put text on the plot in text.rpart # In general, this function would also check for bad data, see rpart.poisson # for instace. temp3 <- function(y, offset, parms, wt) { print(" START: TEMP3 ");
alist <- list(eval=temp1, split=temp2, init=temp3);
fit1 <- rpart(y∼ ., data=mydata, method=alist, control=list(cp=0));
METADATA