CODAIT / r4ml

Scalable R for Machine Learning
Apache License 2.0

omit.na parameter not working properly in r4ml.ml.preprocess #46

Closed: nilmeier closed this issue 7 years ago

nilmeier commented 7 years ago

The r4ml.ml.preprocess method has an "omit.na" parameter that should take care of NA (null) values in the listed columns, but we still have to run dropna on the output before r4ml.glm will work properly.
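For comparison, the behaviour this report expects from omit.na can be sketched on a plain local data.frame in base R. This is an assumption about the intended semantics (drop any row that has an NA in one of the named columns), not r4ml's implementation; the helper omit_na_rows and the toy flights data are hypothetical.

```r
# Hypothetical reference behaviour for omit.na (an assumption, not r4ml code):
# drop every row that has an NA in any of the named columns.
omit_na_rows <- function(df, cols) {
  # complete.cases() is TRUE for rows with no NA in the selected columns
  keep <- stats::complete.cases(df[, cols, drop = FALSE])
  df[keep, , drop = FALSE]
}

# Toy data loosely modelled on the airline columns used in the report
flights <- data.frame(
  UniqueCarrier = c("AA", NA, "UA", "DL"),
  TailNum       = c("N1", "N2", NA, "N4"),
  DepTime       = c(900, 1000, 1200, NA),
  stringsAsFactors = FALSE
)

# Keeps rows 1 and 4; row 4 still has an NA in the unlisted DepTime column
kept <- omit_na_rows(flights, c("UniqueCarrier", "TailNum"))
print(kept)
```

Under this assumed semantics, an NA in a column that is not listed (DepTime here) survives the filter, so a downstream model can still encounter missing values.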

# omit.na should take care of NA processing
df_trans <- r4ml.ml.preprocess(
  df, transformPath="/tmp",
  recodeAttrs=c("UniqueCarrier", "TailNum", "Origin", "Dest"),
  omit.na=c("UniqueCarrier", "TailNum", "Origin", "Dest"))

# sample the dataset into the train and test
samples <- r4ml.sample(df_trans$data, perc=c(0.7, 0.3), seed=0)
train <- samples[[1]]
ignore <- cache(train)
test <- samples[[2]]
ignore <- cache(test)

# train the GLM model (by default it is binomial)
train_m <- as.r4ml.matrix(train)

## dropna should not be necessary here!
glm <- r4ml.glm(DepTime ~ ., as.r4ml.matrix(dropna(train_m)))

bdwyer2 commented 7 years ago

PR #47 merged