NorskRegnesentral / shapr

Explaining the output of machine learning models with more accurately estimated Shapley values
https://norskregnesentral.github.io/shapr/
Other
141 stars 33 forks source link

Please use [ instead of merge if you need nomatch arg #322

Closed tdhock closed 1 year ago

tdhock commented 1 year ago

hi @andersloland @martinju @aredelmeier @nikolase90 When data.table from github master is installed, data.table::update_dev_pkg() There are new test failures when checking shapr, https://github.com/Rdatatable/data.table/issues/5544 These test failures come from a new warning that the nomatch argument is ignored, when you use it here: https://github.com/NorskRegnesentral/shapr/blob/3f4f61e90056839baec90c301015bc730d7cd8bd/R/models.R#L467

  DT <- merge(DT_get_model_specs, DT_predict_model, by = "rn", all = T, allow.cartesian = T, nomatch = 0)

A fix would be to use [ instead of merge:

DT <- DT_get_model_specs[DT_predict_model, on="rn", allow.cartesian=TRUE, nomatch=0L]

Another fix would be to keep merge and remove the nomatch arg:

  DT <- merge(DT_get_model_specs, DT_predict_model, by = "rn", all = TRUE, allow.cartesian = TRUE)

Please update your example code, then run R CMD check shapr with data.table master installed. For reference this is the current error I am seeing using shapr from CRAN and data.table master from github,

* checking tests ...
  Running 'testthat.R'
 ERROR
Running the tests in 'tests/testthat.R' failed.
Last 13 lines of output:
  `shapr(train_df_used_numeric, l_numeric[[i]])` produced warnings.
  -- Failure ('test-a-shapley.R:87'): Testing data input to shapr in shapley.R ---
  `shapr(train_df_used_numeric, l_numeric[[i]])` produced warnings.
  -- Failure ('test-a-shapley.R:92'): Testing data input to shapr in shapley.R ---
  `shapr(train_df_used_factor, l_factor[[i]])` produced warnings.
  -- Failure ('test-a-shapley.R:92'): Testing data input to shapr in shapley.R ---
  `shapr(train_df_used_factor, l_factor[[i]])` produced warnings.
  -- Failure ('test-a-shapley.R:92'): Testing data input to shapr in shapley.R ---
  `shapr(train_df_used_factor, l_factor[[i]])` produced warnings.
  -- Failure ('test-a-shapley.R:92'): Testing data input to shapr in shapley.R ---
  `shapr(train_df_used_factor, l_factor[[i]])` produced warnings.

  [ FAIL 7 | WARN 45 | SKIP 0 | PASS 637 ]
  Error: Test failures
  Execution halted

After you update your code and R CMD check has no more errors, please submit a new version of shapr to CRAN. This will help facilitate releasing a new version of data.table to CRAN. (data.table devs must make sure all reverse dependencies do not break, before submitting a new version to CRAN)

martinju commented 1 year ago

Thanks for reporting this. It should be an easy fix. What is your expected timeline for your update?

tdhock commented 1 year ago

It would be great if you could provide a CRAN update in the next week or two.

tdhock commented 1 year ago

hi again @martinju ! Do you have any idea when you will be able to provide a CRAN update that fixes this issue? Thanks Toby

martinju commented 1 year ago

Hi, sorry, totally forgot about this. The plan was to fix this in an upcoming release, but it is taking longer than expected.

I will submit a fix of only this issue to CRAN within a few days.

martinju commented 1 year ago

@tdhock Sorry for the delay. Just submitted patch to CRAN. Will let you know how when its accepted.

tdhock commented 1 year ago

hi @martinju thanks very much, I see the issue is now fixed on our end.