TomerRonen34 / treeboost_autograd

Easy Custom Losses for Tree Boosters using Pytorch
MIT License

CatboostRegressor and custom loss error #1

Open segalinc opened 1 year ago

segalinc commented 1 year ago

Hi there, I am trying to use your tool to create a Spearman R custom loss to use with CatBoostRegressor. However, I get the following error regarding calculate_derivatives:

To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
_catboost.pyx in _catboost._ObjectiveCalcDersRange()

/apps/python3/lib/python3.7/site-packages/treeboost_autograd/booster_objectives.py in calc_ders_range(self, preds, targets, weights)
     40                         ) -> List[Tuple[float, float]]:
---> 41         deriv1, deriv2 = self.calculate_derivatives(preds, targets, weights)
     42         result = list(zip(deriv1, deriv2))

AttributeError: 'CatboostObjective' object has no attribute 'calculate_derivatives'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
_catboost.pyx in _catboost._ObjectiveCalcDersRange()

/apps/python3/lib/python3.7/site-packages/treeboost_autograd/booster_objectives.py in calc_ders_range(self, preds, targets, weights)
     40                         ) -> List[Tuple[float, float]]:
---> 41         deriv1, deriv2 = self.calculate_derivatives(preds, targets, weights)
     42         result = list(zip(deriv1, deriv2))

/apps/python3/lib/python3.7/site-packages/treeboost_autograd/pytorch_objective.py in calculate_derivatives(self, preds, targets, weights)
     24         objective = self.sign * self.loss_function(preds, targets)
---> 25         deriv1, deriv2 = self._calculate_derivatives(objective, preds)
     26 

/apps/python3/lib/python3.7/site-packages/treeboost_autograd/pytorch_objective.py in _calculate_derivatives(objective, preds)
     34     def _calculate_derivatives(objective: Tensor, preds: Tensor) -> Tuple[np.ndarray, np.ndarray]:
---> 35         deriv1, = torch.autograd.grad(objective, preds, create_graph=True)
     36 

/apps/python3/lib/python3.7/site-packages/torch/autograd/__init__.py in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)
    227         outputs, grad_outputs_, retain_graph, create_graph,
--> 228         inputs, allow_unused, accumulate_grad=False)
    229 

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

[... the same RuntimeError traceback is repeated six more times ...]

During handling of the above exception, another exception occurred:

CatBoostError                             Traceback (most recent call last)
/tmp/ipykernel_22901/1472145058.py in <module>
----> 1 catb_model_muse = catbr_reg(data_m, catb_params, custom_objective, custom_eval_metric, plot=True)

/tmp/ipykernel_22901/98406515.py in catbr_reg(data, params, custom_objective, custom_eval_metric, plot)
      9 
     10     xgbr_model = CatBoostRegressor(**params)
---> 11     xgbr_model.fit(X_train, y_train, eval_set=[(X_val, y_val)],plot=True)
     12 
     13     test_score = xgbr_model.score(X_test, y_test)

/apps/python3/lib/python3.7/site-packages/catboost/core.py in fit(self, X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
   5591                          use_best_model, eval_set, verbose, logging_level, plot, column_description,
   5592                          verbose_eval, metric_period, silent, early_stopping_rounds,
-> 5593                          save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
   5594 
   5595     def predict(self, data, prediction_type=None, ntree_start=0, ntree_end=0, thread_count=-1, verbose=None, task_type="CPU"):

/apps/python3/lib/python3.7/site-packages/catboost/core.py in _fit(self, X, y, cat_features, text_features, embedding_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
   2281                 params,
   2282                 allow_clear_pool,
-> 2283                 train_params["init_model"]
   2284             )
   2285 

/apps/python3/lib/python3.7/site-packages/catboost/core.py in _train(self, train_pool, test_pool, params, allow_clear_pool, init_model)
   1703 
   1704     def _train(self, train_pool, test_pool, params, allow_clear_pool, init_model):
-> 1705         self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
   1706         self._set_trained_model_attributes()
   1707 

_catboost.pyx in _catboost._CatBoost._train()

_catboost.pyx in _catboost._CatBoost._train()

CatBoostError: catboost/python-package/catboost/helpers.cpp:42: Traceback (most recent call last):
  File "_catboost.pyx", line 1399, in _catboost._ObjectiveCalcDersRange
  File "/apps/python3/lib/python3.7/site-packages/treeboost_autograd/booster_objectives.py", line 41, in calc_ders_range
    deriv1, deriv2 = self.calculate_derivatives(preds, targets, weights)
AttributeError: 'CatboostObjective' object has no attribute 'calculate_derivatives'

Code of the loss function:

import numpy as np
import torch
from fast_soft_sort.pytorch_ops import soft_rank
from treeboost_autograd.booster_objectives import CatboostObjective

def corrcoef(target, pred):
    # np.corrcoef in torch from @mdo
    # https://forum.numer.ai/t/custom-loss-functions-for-xgboost-using-pytorch/960
    pred_n = pred - pred.mean()
    target_n = target - target.mean()
    pred_n = pred_n / pred_n.norm()
    target_n = target_n / target_n.norm()
    return (pred_n * target_n).sum()

def spearman(target, pred, regularization="l2", regularization_strength=1.0):
    pred = soft_rank(pred, regularization=regularization, regularization_strength=regularization_strength)
    return corrcoef(target, pred / pred.shape[-1])

def spearman_loss(ypred, ytrue):
    lenypred = ypred.shape[0]
    lenytrue = ytrue.shape[0]

    ypred_th = torch.tensor(ypred.reshape(1, lenypred), requires_grad=True)
    ytrue_th = torch.tensor(ytrue.reshape(1, lenytrue))

    loss = spearman(ytrue_th, ypred_th, regularization_strength=3)
    # print(f'Current loss:{loss}')

    # calculate gradient and convert to numpy
    loss_grads = torch.autograd.grad(loss, ypred_th)[0]
    loss_grads = loss_grads.detach().numpy()

    # return gradient and ones instead of Hessian diagonal
    return loss_grads[0], np.ones(loss_grads.shape)[0]

custom_objective = CatboostObjective(loss_function=spearman_loss)
TomerRonen34 commented 1 year ago

Hi Cristina, CatboostObjective (and its colleagues LightGbmObjective and XgboostObjective) calculates the gradients for you: your loss function should return the scalar torch tensor loss, not numpy grads. However, it looks like you're using the package for multiregression, which is a use case I didn't consider. I'll try to create a working example :)
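
For reference, a loss in the shape the package expects can be sketched from the absolute_error_loss example mentioned later in this thread (a minimal sketch; the exact function body is illustrative):

```python
import torch
from torch import Tensor

def absolute_error_loss(preds: Tensor, targets: Tensor) -> Tensor:
    # Return the scalar loss tensor. treeboost_autograd differentiates it
    # with torch.autograd, so no manual gradient code is needed here.
    return torch.abs(preds - targets).sum()
```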

segalinc commented 1 year ago

Hi Tomer, thank you for the quick reply. I'm not sure where it says I am using multiregression; my target is a simple regression. Maybe I misunderstood the need to compute the gradient. I just need to stop at the Spearman computation and return that coefficient.

segalinc commented 1 year ago

I just tried that and I get the same error.

TomerRonen34 commented 1 year ago

I debugged it and got the same error. This is the line that causes it: ypred_th = torch.tensor(ypred.reshape(1, lenypred), requires_grad=True). Your loss function receives ypred as a tensor with requires_grad=True. When you create a different tensor and use the new tensor to calculate the loss, the original ypred isn't involved in the computation graph anymore, and thus doesn't get gradients. You can fix it by changing this line to ypred_th = ypred.reshape(1, lenypred).

On a different note, if you're using correlation as the loss for tree booster regression, you may get unexpected results because of the "divide and conquer" mechanism of trees. If you print the length of ypred inside spearman_loss, you'll see that the number of examples isn't constant: this is because CatBoost calls the loss function for every node inside every tree, which means the loss function only operates on a subset of the dataset every time. I think that correlation can be a problematic measure of similarity for small sample sizes. It may work better if you use a large number of very shallow trees.
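
A quick way to observe this, following the suggestion above, is a hypothetical debug wrapper around the loss:

```python
from torch import Tensor

def spearman_loss_debug(ypred: Tensor, ytrue: Tensor) -> Tensor:
    # The printed length varies from call to call, since CatBoost evaluates
    # the objective on per-node subsets rather than on the full dataset.
    print("n_examples in this call:", ypred.shape[0])
    return spearman_loss(ypred, ytrue)
```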

Also, you probably want to use loss = -spearman(...) since in treeboost_autograd the loss is minimized.
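
Putting both fixes together, a corrected version might look like this (a sketch that reuses the spearman helper from the snippet above):

```python
import torch
from torch import Tensor

def spearman_loss(ypred: Tensor, ytrue: Tensor) -> Tensor:
    # Reshape the incoming tensor instead of copy-constructing a new one,
    # so ypred stays connected to the autograd graph.
    ypred_th = ypred.reshape(1, ypred.shape[0])
    ytrue_th = ytrue.reshape(1, ytrue.shape[0])
    # Negate the correlation, since treeboost_autograd minimizes the loss.
    # Return the scalar tensor; the package computes the derivatives itself.
    return -spearman(ytrue_th, ypred_th, regularization_strength=3)
```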

Hope this helps!

segalinc commented 1 year ago

Hi Tomer, thanks for the hint, really appreciated. My dataset is very big, so no big issue on that side. You're right, I forgot to use 1 - coeff as the actual loss; I noticed that while re-checking what I posted here. I'll try the fix, see if it works better, and close the issue if it does.

segalinc commented 1 year ago

Hi Tomer, I was trying your example in the repo, and even with the fix I get a new error. The same error also happens if I use the absolute_error_loss from your blog post:


  File "_catboost.pyx", line 1399, in _catboost._ObjectiveCalcDersRange
  File "/apps/python3/lib/python3.7/site-packages/treeboost_autograd/booster_objectives.py", line 41, in calc_ders_range
    deriv1, deriv2 = self.calculate_derivatives(preds, targets, weights)
  File "/apps/python3/lib/python3.7/site-packages/treeboost_autograd/pytorch_objective.py", line 25, in calculate_derivatives
    deriv1, deriv2 = self._calculate_derivatives(objective, preds)
  File "/apps/python3/lib/python3.7/site-packages/treeboost_autograd/pytorch_objective.py", line 35, in _calculate_derivatives
    deriv1, = torch.autograd.grad(objective, preds, create_graph=True)
  File "/apps/python3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 228, in grad
    inputs, allow_unused, accumulate_grad=False)
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
segalinc commented 1 year ago

Seems like the custom loss function wants (preds: Tensor, targets: Tensor) as its arguments to work.

TomerRonen34 commented 1 year ago

Seems like the custom loss function wants (preds: Tensor, targets: Tensor) as its arguments to work.

That's true - the custom loss should expect its inputs to be torch tensors. This is what the implementation of CatboostObjective does: convert numpy.ndarray to torch.Tensor, call the loss, calculate the 1st- and 2nd-order grads, convert back from torch.Tensor to numpy.ndarray.
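
For context, a minimal end-to-end sketch of that wiring (the toy data and parameter choices such as eval_metric and iterations are illustrative, and spearman_loss is the corrected version from above):

```python
import numpy as np
from catboost import CatBoostRegressor
from treeboost_autograd.booster_objectives import CatboostObjective

# Toy regression data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.randn(500, 10)
y = X[:, 0] + 0.1 * rng.randn(500)

# CatboostObjective handles the numpy <-> torch conversion and the
# autograd derivative computation described above.
custom_objective = CatboostObjective(loss_function=spearman_loss)

# With a custom objective, CatBoost needs an explicit eval_metric,
# since it can't infer a default metric from the objective object.
model = CatBoostRegressor(loss_function=custom_objective,
                          eval_metric="RMSE", iterations=100, verbose=False)
model.fit(X, y)
```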

Tell me if it still doesn't work, and I'll try to create a full working example with your loss.

segalinc commented 1 year ago

I worked on it a bit more, and unfortunately it's still not working. I might be missing something in your repo. I appreciate the help :)