StatMixedML / LightGBMLSS

An extension of LightGBM to probabilistic modelling
https://statmixedml.github.io/LightGBMLSS/
Apache License 2.0

error when using weights in lgb.Dataset #13

Closed p-schaefer closed 1 year ago

p-schaefer commented 1 year ago

I'm getting an error when I try to use weights in an lgb.Dataset:

import lightgbm as lgb
import numpy as np

from lightgbmlss.model import *
from lightgbmlss.distributions.Expectile import *
from lightgbmlss.datasets.data_loader import load_simulated_gaussian_data

import plotnine
from plotnine import *
plotnine.options.figure_size = (20, 10)

train, test = load_simulated_gaussian_data()

X_train, y_train = train.filter(regex="x"), train["y"].values
X_test, y_test = test.filter(regex="x"), test["y"].values

weight2 = train["scale"].values

dtrain = lgb.Dataset(X_train, label=y_train, weight=weight2)

lgblss = LightGBMLSS(
    Expectile(stabilization="None",              # Options are "None", "MAD", "L2".
              expectiles = [0.05, 0.95],         # List of expectiles to be estimated, in increasing order.
              penalize_crossing = True           # Whether to include a penalty term to discourage crossing of expectiles.
              )
)

param_dict = {
    "eta":                      ["float", {"low": 1e-5,   "high": 1,     "log": True}],
    "max_depth":                ["int",   {"low": 1,      "high": 10,    "log": False}],
    "num_leaves":               ["int",   {"low": 255,    "high": 255,   "log": False}],  # set to constant for this example
    "min_data_in_leaf":         ["int",   {"low": 20,     "high": 20,    "log": False}],  # set to constant for this example
    "min_gain_to_split":        ["float", {"low": 1e-8,   "high": 40,    "log": False}],
    "min_sum_hessian_in_leaf":  ["float", {"low": 1e-8,   "high": 500,   "log": True}],
    "subsample":                ["float", {"low": 0.2,    "high": 1.0,   "log": False}],
    "feature_fraction":         ["float", {"low": 0.2,    "high": 1.0,   "log": False}],
    "boosting":                 ["categorical", ["gbdt"]],
}

np.random.seed(123)
opt_param = lgblss.hyper_opt(param_dict,
                             dtrain,
                             num_boost_round=100,        # Number of boosting iterations.
                             nfold=5,                    # Number of cv-folds.
                             early_stopping_rounds=20,   # Number of early-stopping rounds
                             max_minutes=10,             # Time budget in minutes, i.e., stop study after the given number of minutes.
                             n_trials=None,              # The number of trials. If this argument is set to None, there is no limitation on the number of trials.
                             silence=False,              # Controls the verbosity of the trail, i.e., user can silence the outputs of the trail.
                             seed=123,                   # Seed used to generate cv-folds.
                             hp_seed=None                # Seed for random number generator used in the Bayesian hyperparameter search.
                             )

I get the following error:

[I 2023-06-06 22:28:24,167] A new study created in memory with name: LightGBMLSS Hyper-Parameter Optimization
/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/optuna/progress_bar.py:56: ExperimentalWarning: Progress bar is experimental (supported from v1.2.0). The interface can change in the future.
   0%|          | 00:00/10:00
[W 2023-06-06 22:28:24,552] Trial 0 failed with parameters: {'eta': 0.0007593032665095024, 'max_depth': 3, 'num_leaves': 255, 'min_data_in_leaf': 20, 'min_gain_to_split': 17.66639853363941, 'min_sum_hessian_in_leaf': 3.0862064424185954e-05, 'subsample': 0.3026741167578493, 'feature_fraction': 0.3579892008277689, 'boosting': 'gbdt'} because of the following error: ValueError('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()').
Traceback (most recent call last):
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
                      ^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbmlss/model.py", line 374, in objective
    lgblss_param_tuning = self.cv(hyper_params,
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbmlss/model.py", line 255, in cv
    self.bstLSS_cv = lgb.cv(params,
                     ^^^^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbm/engine.py", line 640, in cv
    cvfolds.update(fobj=fobj)
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbm/engine.py", line 353, in handler_function
    ret.append(getattr(booster, name)(*args, **kwargs))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbm/basic.py", line 3029, in update
    grad, hess = fobj(self.__inner_predict(0), self.train_set)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbmlss/distributions/distribution_utils.py", line 88, in objective_fn
    if data.get_weight() == None:
       ^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
[W 2023-06-06 22:28:24,553] Trial 0 failed with value None.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbmlss/model.py", line 404, in hyper_opt
    study.optimize(objective, n_trials=n_trials, timeout=60 * max_minutes, show_progress_bar=True)
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/optuna/study/study.py", line 425, in optimize
    _optimize(
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/optuna/study/_optimize.py", line 66, in _optimize
    _optimize_sequential(
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    frozen_trial = _run_trial(study, func, catch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/optuna/study/_optimize.py", line 251, in _run_trial
    raise func_err
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
                      ^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbmlss/model.py", line 374, in objective
    lgblss_param_tuning = self.cv(hyper_params,
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbmlss/model.py", line 255, in cv
    self.bstLSS_cv = lgb.cv(params,
                     ^^^^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbm/engine.py", line 640, in cv
    cvfolds.update(fobj=fobj)
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbm/engine.py", line 353, in handler_function
    ret.append(getattr(booster, name)(*args, **kwargs))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbm/basic.py", line 3029, in update
    grad, hess = fobj(self.__inner_predict(0), self.train_set)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pschaefer/.local/share/r-miniconda/envs/AEC_Model/lib/python3.11/site-packages/lightgbmlss/distributions/distribution_utils.py", line 88, in objective_fn
    if data.get_weight() == None:
       ^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
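
The root cause is a NumPy behaviour, not anything specific to LightGBMLSS. A minimal standalone reproduction (the `weights` array here is illustrative; any multi-element array behaves the same way):

```python
import numpy as np

# LightGBM's Dataset.get_weight() returns a NumPy array when weights
# were set, or None when they were not.
weights = np.array([0.5, 1.0, 2.0])

# == compares element-wise, so `weights == None` yields an array of
# booleans rather than a single True/False.
elementwise = weights == None  # noqa: E711 (intentional, to show the bug)
print(elementwise)  # [False False False]

# Evaluating that array in an `if` forces a truth-value conversion,
# which is exactly the ambiguity error in the traceback above.
try:
    if weights == None:  # noqa: E711
        pass
except ValueError as exc:
    print(exc)
```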
StatMixedML commented 1 year ago

@p-schaefer Thanks for your interest in the project. I have fixed the weights check. Can you please re-install the package and re-try? Thanks.

p-schaefer commented 1 year ago

Yep, that fixed it. Thanks @StatMixedML !

neverfox commented 1 year ago

This appears to break things in the other direction: now if the dataset doesn't have weights, you get AttributeError: 'NoneType' object has no attribute 'all'. It would work in both cases if the code used is rather than ==, i.e. if data.get_weight() is None:. When weights exist, is None returns a single False rather than an array of False, and when the return value is actually None, it returns a plain True.
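
The two-sided behaviour described above can be sketched with a hypothetical helper (get_sample_weights and the unit-weight fallback are illustrative, not the library's actual code):

```python
import numpy as np

def get_sample_weights(weight, n_samples):
    """Fall back to unit weights when no weights were set on the dataset.

    `is None` is an identity test, so it always yields a single bool:
    True when weight is None, False when it is a NumPy array. No
    ambiguous element-wise comparison and no AttributeError on None.
    """
    if weight is None:
        return np.ones(n_samples)
    return np.asarray(weight)

print(get_sample_weights(None, 3))                    # unit weights
print(get_sample_weights(np.array([0.5, 2.0]), 2))    # user weights passed through
```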