SteffenMoritz / ridge

CRAN R Package: Ridge Regression with automatic selection of the penalty parameter
GNU General Public License v2.0
18 stars 11 forks source link

predict.linearRidge does not work with factors #3

Closed dfrankow closed 4 years ago

dfrankow commented 4 years ago

Here is an example:

library(ridge)

foo <- mtcars
foo$cyl_factor <- as.factor(paste0("cyl", foo$cyl))
model1 <- lm(mpg ~ wt + cyl_factor, data = foo)
model2 <- linearRidge(mpg ~ wt + cyl_factor, data = foo, lambda = 0.0)

predict(model1)
# has output..
predict(model2)
Error in as.matrix(mm) %*% beta : 
  requires numeric/complex matrix/vector arguments

It's because cyl_factor breaks out into multiple columns, while mm has only one column for that factor. In predict.linearRidge, right before res <- drop(as.matrix(mm) %*% beta:

Browse[2]> names(mm)
[1] "1"          "wt"         "cyl_factor"
Browse[2]> names(beta)
[1] "(Intercept)"    "wt"             "cyl_factorcyl6" "cyl_factorcyl8"
SteffenMoritz commented 4 years ago

Thanks for opening this issue @dfrankow 👍 It's been a while since I have been looking into the package.

It sounds like you have given this some thought - do you already have a reasonable fix in mind? - which ideally does not break something else ;) (can this even be fixed?)

Should you have a fix already, I'd very much appreciate it if you can open a pull request. I'd merge on Github and update the CRAN version afterwards. (of course you can add yourself to the contributors in the package description file for any updates you make)

dfrankow commented 4 years ago

I have no fix in mind.

I'll try to take a look at predict.lm or vcov.lm to see how it handles factors.

dfrankow commented 4 years ago

I believe I have a fix in my repo. I also added vcov.linearRidge and some testthat tests. However, I want to get it reviewed before making a pull request.

dfrankow commented 4 years ago

Filed https://github.com/cran/ridge/pull/1 for this, but I want my colleague to review it first.

SteffenMoritz commented 4 years ago

Ah this would be great, if your colleague also has a look over it!

Small hint, you must file the Pull request to https://github.com/SteffenMoritz/ridge/ (the cran/ridge is read only)

Then I have all the changes and can package it and submit it to CRAN. (https://cran.r-project.org/submit.html)

Then CRAN updates the cran/ridge repo and more important all other download possibilities.

Additional note: In the DESCRIPTION file not only change the version number, also add yourself to Authors@R + Author to also appear as contributor on CRAN (https://cran.r-project.org/web/packages/ridge/index.html)

Like this (but with FIRST and LAST replaced by your name :) ):

Authors@R: c( person("Steffen", "Moritz", email="steffen.moritz10@gmail.com", role=c("aut", "cre"), comment = c(ORCID = "0000-0002-0085-1804")),person("Erika", "Cule", role=c("aut")), person("FIRST","LAST",role=c("ctb") )

Author: Steffen Moritz [aut, cre] (https://orcid.org/0000-0002-0085-1804), Erika Cule [aut], FIRST LAST [ctb]

dfrankow commented 4 years ago

I filed https://github.com/SteffenMoritz/ridge/pull/4 for this.

SteffenMoritz commented 4 years ago

Looks good, thanks for your effort ! I took a quick look and merged this now. I'll do some further tests tonight and then I'll update on CRAN :)

dfrankow commented 4 years ago

Sounds good.

If you have time, I'd suggest putting tests you do in the testthat tests to be run automatically.

dfrankow commented 4 years ago

Even after #4 is merged, I found another test that fails: using predict with newdata.

test_that("test linearRidge predict method for lambda = 0, with a factor and newdata", {
  foo <- mtcars
  foo$cyl_factor <- as.factor(paste0("cyl", foo$cyl))
  model1 <- lm(mpg ~ wt + cyl_factor, data = foo)
  model2 <- linearRidge(mpg ~ wt + cyl_factor, data = foo, lambda = 0.0)

  # predict agrees between model1 and model2 when lambda is 0, with factor
  newdata <- data.frame(wt=c(1.0), cyl_factor=c("cyl4"))
  preds1 <- predict(model1, newdata)
  preds2 <- predict(model2, newdata)
  expect_equal(preds1, preds2, tolerance = tol, label = "predictions")
})

I'm working on it.

dfrankow commented 4 years ago

Filed https://github.com/SteffenMoritz/ridge/pull/6

dfrankow commented 4 years ago

Found another case that doesn't work: passing in a formula object.

test_that("test linearRidge with formula variable", {
  the_formula <- formula('mpg ~ wt + cyl')
  model1 <- lm(the_formula, data = mtcars)
  model2 <- linearRidge(the_formula, data = mtcars, lambda = 0)

  expect_equal(coef(model1), coef(model2), tolerance = tol, label = "coefficients")
  expect_equal(predict(model1), predict(model2), tolerance = tol, label = "predictions")
})

This doesn't have to do with factors, but it may have to do with my modifications. I'm not sure yet.

dfrankow commented 4 years ago

Never mind, false alarm. I had interfering objects in my environment.

SteffenMoritz commented 4 years ago

I merged everything related to this issue now :) Unfortunately I've had some issues with my Mac - had to reinstall xcode to be able to build the package.

So I've not been able to run tests yet (by the way adding these to testthat is a very good remark)

Good news: build is successful for me. 0 errors ✔ | 0 warnings ✔ | 1 note ✖

Upload to CRAN usually requires 0 notes. But I think we should be able to fix this quite easy.

Here is the note I am getting (on Mac OS): (but it is the same on travis-ci https://travis-ci.org/github/SteffenMoritz/ridge)


Luckily the solution is already in the note :) (we can just manually add this to the NAMESPACE file, since for this package the NAMESPACE file is not created by roxygen)

SteffenMoritz commented 4 years ago

By the way, I submitted the new version to CRAN now. Could be already online in a few hours - if everything goes alright.

Many thanks to you for this update of ridge! ( it was basically all made by you 👍 )

dfrankow commented 4 years ago

Thanks. Your responsiveness was very helpful.

dfrankow commented 4 years ago

Since this now works, I'll close this issue.

Also, this code runs:

library(testthat)
expect_equal(predict(model1), predict(model2))
expect_equal(coef(model1), coef(model2))
SteffenMoritz commented 4 years ago

Dear maintainer,

thanks, package ridge_2.5.tar.gz is on its way to CRAN.

Best regards, CRAN teams' auto-check service Package check result: OK

No changes to worse in reverse depends.

Everything successful :) Some of the new versions are already build: https://cran.r-project.org/web/checks/check_results_ridge.html