Evovest / EvoTrees.jl

Boosted trees in Julia
https://evovest.github.io/EvoTrees.jl/dev/
Apache License 2.0

Using EvoTrees.jl for multi-target regression problems #217

Open fipelle opened 1 year ago

fipelle commented 1 year ago

Hi,

Q1: I have seen that this package supports multi-class problems. I was wondering whether it can also be used for multi-target regression problems, for instance to predict two variables jointly with a multivariate squared error loss (e.g., the average MSE over the targets). I have tried setting y_train to a Vector{Vector{Float64}}, but it errors out in fit.jl:53 when using:

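using EvoTrees

config = EvoTreeRegressor(
    loss=:linear,
    nrounds=100,
    nbins=100,
    lambda=0.5,
    gamma=0.1,
    eta=0.1,
    max_depth=6,
    min_weight=1.0,
    rowsample=0.5,
    colsample=1.0);

# `predictors` is the feature matrix; `targets` is the Vector{Vector{Float64}} that triggers the error
m = fit_evotree(config; x_train=predictors, y_train=targets)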

Q2: Is it possible to use custom loss functions for multi-target regression problems provided that they are twice differentiable?

Thanks!

fipelle commented 1 year ago

This is an example with random forests that shows something similar in ScikitLearn.

jeremiedb commented 1 year ago

Hello!

  1. Multi-output: there's unfortunately no multi-output support in place at the moment, so it's expected that passing a vector of y targets fails (although it also signals that improved assertions could help guide users). I wouldn't consider this trivial, but I think that such support could reasonably be implemented for simple "single target" loss functions such as "linear", "logistic" and the like. I haven't encountered the need for multi-target so far; could you elaborate a bit on the feature requirements? For instance, is there a need for a weighted loss over the targets, or is a straight average / sum sufficient?

  2. Custom loss function: there's no direct support for this, at least not in the form of an API like the ones XGBoost or LightGBM provide, even for the regular, single-target objective. However, part of the appeal of working in Julia is that the codebase is quite lightweight, and adding a custom loss function is fairly trivial. If there's a loss function you'd like to see added, it's likely possible to integrate it in the library. Having it available for multi-target would then depend on progress on Q1.
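Until multi-output support exists, one possible workaround is to fit a separate single-target model for each target column. The sketch below is only an illustration: `fit_per_target` and `predict_per_target` are hypothetical helpers, not part of EvoTrees.jl, and they assume that targets are stored as an (observations x targets) matrix and that a fitted model can be called on a feature matrix. Since every target gets its own trees, this is not a true multi-target model with shared splits.

```julia
# Hypothetical helper (not part of EvoTrees.jl): fit one single-target model
# per column of Y. `fit` is any function (X, y) -> model.
function fit_per_target(fit, X::AbstractMatrix, Y::AbstractMatrix)
    size(X, 1) == size(Y, 1) ||
        throw(DimensionMismatch("X and Y need the same number of rows"))
    return [fit(X, Y[:, k]) for k in axes(Y, 2)]
end

# Stack the per-target predictions into an (observations x targets) matrix.
# Assumes each model is callable as model(X) and returns one value per row.
function predict_per_target(models, X::AbstractMatrix)
    return hcat((vec(m(X)) for m in models)...)
end
```

With the call from the original post, the `fit` argument could be a closure such as `(x, y) -> fit_evotree(config; x_train=x, y_train=y)`. If the fitted model is not callable in the installed version, the package's prediction function would have to be swapped in inside `predict_per_target`.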

fipelle commented 1 year ago

Hi,

I think that in general allowing for weighted losses would be better. In my case, I would need a simple average. In terms of use cases, there may be situations in which you'd like to predict a series of targets from the same set of features and model. For instance, this is somewhat common in economics and finance.
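To make the weighted-versus-average question concrete, here is a minimal sketch of a weighted multi-target squared error together with the first and second derivatives that a gradient-boosting implementation would typically need. It is plain Julia and not EvoTrees.jl code: the function names, the (observations x targets) matrix layout, and the per-target weight vector `w` are all assumptions made for illustration. With equal weights it reduces to the simple average of the squared errors over the targets.

```julia
# Weighted squared-error loss over K targets, averaged over observations.
# P (predictions) and Y (targets) are (observations x targets) matrices;
# w holds one non-negative weight per target (hypothetical layout).
function weighted_mse(P::AbstractMatrix, Y::AbstractMatrix, w::AbstractVector)
    size(P) == size(Y) || throw(DimensionMismatch("P and Y must have the same size"))
    length(w) == size(Y, 2) || throw(DimensionMismatch("need one weight per target"))
    wn = w ./ sum(w)                      # normalise the weights to sum to one
    return sum(((P .- Y) .^ 2) * wn) / size(Y, 1)
end

# Derivatives of each observation's loss term with respect to each prediction:
# gradient 2 * wn[k] * (p - y) and (constant) second derivative 2 * wn[k].
function weighted_mse_grad_hess(P::AbstractMatrix, Y::AbstractMatrix, w::AbstractVector)
    wn = w ./ sum(w)
    grad = 2 .* (P .- Y) .* wn'           # wn' broadcasts across the rows
    hess = 2 .* ones(size(P)) .* wn'
    return grad, hess
end
```

Because the second derivative is a positive constant for each target, this loss is twice differentiable everywhere, which is the condition raised in Q2.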