lightgbm.unify example erroring. #17

Closed. nspyrison closed this 1 year ago

nspyrison commented 2 years ago

Following the example in ?lightgbm.unify, the lightgbm() call fails with the error shown below:


library(treeshap)
library(lightgbm)
#> Loading required package: R6
param_lgbm <- list(objective = "regression", max_depth = 2, force_row_wise = TRUE)
data_fifa <- fifa20$data[!colnames(fifa20$data) %in%
                           c('work_rate', 'value_eur', 'gk_diving', 'gk_handling',
                             'gk_kicking', 'gk_reflexes', 'gk_speed', 'gk_positioning')]
data <- na.omit(cbind(data_fifa, fifa20$target))
sparse_data <- as.matrix(data[,-ncol(data)])
x <- lightgbm::lgb.Dataset(sparse_data, label = as.matrix(data[,ncol(data)]))
lgb_data <- lightgbm::lgb.Dataset.construct(x)
lgb_model <- lightgbm::lightgbm(data = lgb_data, params = param_lgbm, save_name = "", verbose = 0)
#> Error in bst$save_model(filename = save_name): Model file  is not available for writes
# unified_model <- lightgbm.unify(lgb_model, sparse_data)
# shaps <- treeshap(unified_model, data[1:2, ])
# plot_contribution(shaps, obs = 1)
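
A possible workaround, sketched under assumptions: the error message suggests that `save_name = ""` makes `lightgbm()` try to write the model to an empty file path. One way around that is to train with `lgb.train()`, which fits the booster without saving a model file. This is only a sketch, not necessarily the fix that was applied to the package documentation. It assumes treeshap and lightgbm are installed, and `nrounds = 10L` is an arbitrary choice that the original example does not specify.

```r
# Sketch only: assumes the treeshap and lightgbm packages are installed.
library(treeshap)  # provides the fifa20 data set
library(lightgbm)

param_lgbm <- list(objective = "regression", max_depth = 2, force_row_wise = TRUE)
data_fifa <- fifa20$data[!colnames(fifa20$data) %in%
                           c('work_rate', 'value_eur', 'gk_diving', 'gk_handling',
                             'gk_kicking', 'gk_reflexes', 'gk_speed', 'gk_positioning')]

# Drop rows with missing values, then split features and target.
data <- na.omit(cbind(data_fifa, fifa20$target))
sparse_data <- as.matrix(data[, -ncol(data)])
lgb_data <- lgb.Dataset(sparse_data, label = data[, ncol(data)])

# lgb.train() returns the fitted booster without writing it to disk,
# so no save_name argument is involved.
# nrounds = 10L is an arbitrary value chosen for this sketch.
lgb_model <- lgb.train(params = param_lgbm, data = lgb_data,
                       nrounds = 10L, verbose = -1L)
```

The resulting `lgb_model` is an `lgb.Booster`, so the commented-out `lightgbm.unify()` and `treeshap()` steps from the original example should be able to follow unchanged. On lightgbm versions whose `lightgbm()` still accepts `save_name`, pointing it at a writable path such as one from `tempfile()` should also avoid the error.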

nspyrison commented 2 years ago

Here is a working example that combines the treeshap and lightgbm examples; maybe something like:

library(treeshap)
library(lightgbm)
#> Loading required package: R6

param_lgbm <- list(objective = "regression", max_depth = 2, force_row_wise = TRUE)
data_fifa  <- fifa20$data[!colnames(fifa20$data) %in%
                            c('work_rate', 'value_eur', 'gk_diving', 'gk_handling',
                              'gk_kicking', 'gk_reflexes', 'gk_speed', 'gk_positioning')]

## lightgbm on the full matrix (rows with missing values kept):
lgbm_params <- list(
  num_leaves = 4L, learning_rate = 1.0, objective = "binary", nthread = 2L)
fit <- lightgbm(data = as.matrix(data_fifa), params = lgbm_params,
                label = fifa20$target, nrounds = 2L)
#> [LightGBM] [Info] Number of positive: 18028, number of negative: 250
#> [LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.004856 seconds.
#> You can set `force_col_wise=true` to remove the overhead.
#> [LightGBM] [Info] Total Bins 3441
#> [LightGBM] [Info] Number of data points in the train set: 18278, number of used features: 48
#> [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.986322 -> initscore=4.278220
#> [LightGBM] [Info] Start training from score 4.278220
#> [1] "[1]:  train's binary_logloss:0.100651"
#> [1] "[2]:  train's binary_logloss:0.0992022"

## lightgbm on complete cases only (rows with NA dropped via na.omit):
dat <- na.omit(cbind(data_fifa, fifa20$target))
fit <- lightgbm(data = as.matrix(dat[, -ncol(dat)]), params = lgbm_params,
                label = dat[, ncol(dat)], nrounds = 2L)
#> [LightGBM] [Info] Number of positive: 16032, number of negative: 210
#> [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001164 seconds.
#> You can set `force_row_wise=true` to remove the overhead.
#> And if memory is not enough, you can set `force_col_wise=true`.
#> [LightGBM] [Info] Total Bins 2956
#> [LightGBM] [Info] Number of data points in the train set: 16242, number of used features: 48
#> [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.987071 -> initscore=4.335234
#> [LightGBM] [Info] Start training from score 4.335234
#> [1] "[1]:  train's binary_logloss:0.0975157"
#> [1] "[2]:  train's binary_logloss:0.0976132"

krzyzinskim commented 1 year ago

Thanks! I see that this was already fixed in #25.