ModelOriented / treeshap

Compute SHAP values for your tree-based models using the TreeSHAP algorithm
https://modeloriented.github.io/treeshap/
GNU General Public License v3.0

ranger.unify fails on large models #13

Open yovizzle opened 3 years ago

yovizzle commented 3 years ago

Hi,

We are training a large random forest model (the rf object is ~270 MB) on a large dataset (1,670,000 x 267, object size 3.3 GB) and are hitting errors. The test machine has 96 CPUs and 354 GB of RAM.

Here is a repro.

library(treeshap)
library(ranger)
library(tidyverse)

# Generate a random training tibble of similar size to our data
m <- matrix(runif(n = 800000 * 200), nrow = 800000, ncol = 200)
object.size(m) / 1024^3 # 1.2 GB
trainM <- as_tibble(m) # columns are auto-named V1..V200
srf <- ranger(V200 ~ ., data = trainM, num.trees = 5, verbose = TRUE)
object.size(srf) / 1024^2 # 89.4 MB
rfu <- treeshap::ranger.unify(srf, trainM)

We then got this error:

# *** caught segfault ***
#   address 0x55e43e173ed0, cause 'memory not mapped'
# 
# Traceback:
# 1: new_covers(x, is_na, roots, yes, no, missing, is_leaf, feature,     split, decision_type)
# 2: set_reference_dataset(ret, as.data.frame(data))
# 3: treeshap::ranger.unify(srf, trainM)
# An irrecoverable exception occurred. R is aborting now ...
# Segmentation fault (core dumped)

# R version 4.0.2 (2020-06-22)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 20.04 LTS
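Because the segfault terminates the whole R session, one way to find the data size at which the crash starts is to run each attempt in a child Rscript process and inspect its exit status. The sketch below is only an illustration: `run_isolated` and `probe_code` are hypothetical helper names, not part of treeshap or ranger, and the probe assumes both packages are installed in the child process.

```r
# Run a snippet of R code in a separate Rscript process and return its exit status.
# A crash in the child (such as a segfault) cannot take down the calling session.
run_isolated <- function(code) {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript, c("-e", shQuote(code)), stdout = FALSE, stderr = FALSE)
  as.integer(status)
}

# Hypothetical probe: build the code for a scaled version of the repro above.
probe_code <- function(n_rows, n_cols = 200, num_trees = 5) {
  sprintf(paste(
    "m <- as.data.frame(matrix(runif(%d * %d), ncol = %d));",
    "fit <- ranger::ranger(V%d ~ ., data = m, num.trees = %d);",
    "u <- treeshap::ranger.unify(fit, m)"
  ), n_rows, n_cols, n_cols, n_cols, num_trees)
}

# Example use (not run here, since the larger sizes are slow):
# for (n in c(1e4, 1e5, 8e5)) cat(n, "rows -> exit status", run_isolated(probe_code(n)), "\n")
```

A non-zero status at some row count, with zero below it, would suggest the failure depends on the size of the reference dataset rather than on the R environment.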

Any ideas as to what may be causing this issue? Is it a limitation of the current implementation of the package, or perhaps an issue related to our R environment?

Thanks.
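The traceback shows the crash inside `set_reference_dataset`, which processes the data frame passed as the second argument of `ranger.unify`. A possible experiment, sketched below, is to pass a random subset of rows as the reference dataset instead of the full training data. This is untested against the failure reported here, and whether it avoids the segfault is an assumption; `sample_reference` is a hypothetical helper, and the small model exists only to keep the example quick.

```r
# Hypothetical helper: draw a random subset of rows to use as the reference dataset.
sample_reference <- function(data, n_rows, seed = 1) {
  set.seed(seed)
  n_rows <- min(n_rows, nrow(data))
  data[sample.int(nrow(data), n_rows), , drop = FALSE]
}

# Small end-to-end illustration; skipped when the packages are not installed.
if (requireNamespace("ranger", quietly = TRUE) &&
    requireNamespace("treeshap", quietly = TRUE)) {
  small <- as.data.frame(matrix(runif(2000 * 10), ncol = 10)) # columns V1..V10
  fit <- ranger::ranger(V10 ~ ., data = small, num.trees = 5)
  ref <- sample_reference(small, 500)
  unified <- treeshap::ranger.unify(fit, ref)
}
```

Note that a smaller reference dataset changes the node covers the unified model is built with, so SHAP values computed from it may differ from those based on the full training data.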

yovizzle commented 3 years ago

@maksymiuks Any ideas how I might address this? Thanks again.

maksymiuks commented 3 years ago

@yovizzle I'm working on a solution.

yovizzle commented 3 years ago

@maksymiuks great, thank you.

yovizzle commented 3 years ago

@maksymiuks Any updates on this? We'd love to make use of this package!

maksymiuks commented 3 years ago

@yovizzle hi!

I've identified the problem with ranger.unify; however, I'll only have time to rebuild it in the second half of June or early July. I'll keep you posted.

yovizzle commented 3 years ago

Awesome, thanks for the update!

yovizzle commented 2 years ago

Hi @maksymiuks, just checking back to see how this is looking.