StatMixedML / CatBoostLSS

An extension of CatBoost to probabilistic modelling
MIT License
141 stars 13 forks

Any intention of doing this with lightGBM #5

Closed javier-cazana closed 4 years ago

javier-cazana commented 4 years ago

Just wondering :)

StatMixedML commented 4 years ago

Good suggestion. Since XGBoost and LightGBM are very similar in their architecture, I have also been trying to implement LightGBMLSS. It is working, but I haven't figured out how to properly select LightGBM's hyperparameters, as performance is way off compared to XGBoost. But maybe I need to revisit the concept.

jrzaurin commented 4 years ago

Hey @StatMixedML

normally when I use LightGBM the defaults work relatively well for most cases. In fact superior to XGBoost and Catboost defaults. The reason I asked is that most of our algorithms in production rely one way or another on LightGBM (it is just so fast and robust), now optimised with Optuna, and we use quantile regression to get confidence intervals. I would be more than happy to replace it with a proper probabilistic approach 🙂.
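For readers unfamiliar with the quantile-regression approach mentioned above: such intervals are typically built by fitting one model per quantile level with the pinball (quantile) loss. Below is a minimal pure-Python sketch of that loss on toy data. The function names and the data are illustrative assumptions, not code from this thread.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball (quantile) loss for a quantile level alpha in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


# Toy data: the constant that minimises the loss approximates the alpha-quantile.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
candidates = [c / 2 for c in range(2, 21)]  # 1.0, 1.5, ..., 10.0


def best_constant(alpha):
    """Grid-search the constant prediction with the lowest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), alpha))


# Lower and upper bounds of a rough 70% interval for the toy sample.
lower, upper = best_constant(0.15), best_constant(0.85)
```

In a boosting setting, the constant would be replaced by one fitted model per quantile level. The resulting interval describes only those two quantiles, not the full predictive distribution.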

StatMixedML commented 4 years ago

@jrzaurin See also LightGBMLSS and ProbBoost. I was planning to do it anyways, so thanks for the issue.

I need to get working on that soon :-)

jrzaurin commented 4 years ago

@StatMixedML you are a hero 😄

Looking forward to seeing the progress (and using it!)

StatMixedML commented 4 years ago

@jrzaurin

> normally when I use LightGBM the defaults work relatively well for most cases. In fact superior to XGBoost and Catboost defaults.

Interesting. This is in fact contrary to what I have experienced, though mostly for regression tasks with a lot of categorical covariates (I haven't had many classification tasks). LightGBM appeared to be very sensitive to its hyper-parameters, and I had to do considerable hyper-parameter tuning to arrive at a decent accuracy.

StatMixedML commented 4 years ago

@jrzaurin May I ask what set of parameters you usually set / optimize using LightGBM and what range you search over?

jrzaurin commented 4 years ago

@StatMixedML Sure! Up until a couple of weeks ago we used hyperopt and this param space:

        from hyperopt import hp

        # Search space for LightGBM; hp.quniform samples floats on a grid of step q.
        space = {
            "learning_rate": hp.uniform("learning_rate", 0.01, 0.3),
            "n_estimators": hp.quniform("n_estimators", 100, 1000, 50),
            "num_leaves": hp.quniform("num_leaves", 40, 400, 20),
            "min_child_samples": hp.quniform("min_child_samples", 20, 100, 20),
            "colsample_bytree": hp.uniform("colsample_bytree", 0.5, 1.0),
            "reg_alpha": hp.choice(
                "reg_alpha", [0.01, 0.05, 0.1, 0.2, 0.4, 1.0, 2.0, 4.0, 10.0]
            ),
            "reg_lambda": hp.choice(
                "reg_lambda", [0.01, 0.05, 0.1, 0.2, 0.4, 1.0, 2.0, 4.0, 10.0]
            ),
        }

Note that we do not use subsample or bagging_fraction. This is because our problem has a strong temporal component and we cannot sample rows at random. We also do not tune max_bin because, in all honesty, we do not really know how to "control" that param, so we leave it at its default value.
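One practical detail when using a space like the one above: `hp.quniform` samples floats (e.g. 140.0), whereas LightGBM expects integers for tree-structure parameters such as `num_leaves`. A small sketch of a conversion step follows. The helper name `cast_int_params` and the sampled values are hypothetical, not part of the original setup.

```python
# Parameters in the space above that are sampled with hp.quniform and
# therefore arrive as floats.
INT_PARAMS = ("n_estimators", "num_leaves", "min_child_samples")


def cast_int_params(params, int_keys=INT_PARAMS):
    """Return a copy of params with the named keys converted to int."""
    return {k: int(v) if k in int_keys else v for k, v in params.items()}


# Example of what a single sampled configuration might look like.
sampled = {"learning_rate": 0.05, "n_estimators": 350.0, "num_leaves": 140.0}
clean = cast_int_params(sampled)
```

The cast would typically be applied inside the objective function, just before the parameters are handed to the model.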

These days I am thinking of picking 5-10 datasets and running tons of LightGBM experiments with different parameters to see how results change when you change, for example, reg_lambda. This is because I have not found many resources online that give a hint as to which values are sensible.

It is for that reason that we recently changed to Optuna, which has LightGBM fully integrated. When optimising GBMs there is a hierarchy in parameters, i.e. some are more important than others (you guys being THE experts, I am sure you know all this). Optuna takes care of that and optimises following a certain hierarchy, so that you do not need to worry about the param space.
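As a rough illustration of tuning "following a hierarchy": parameters are tuned one at a time in a fixed order, and each step keeps the best values found so far. This is a simplified pure-Python sketch with a toy objective, not Optuna's implementation. The function name, the stage order, and the candidate values are all assumptions made for the example.

```python
def stepwise_tune(objective, defaults, stages):
    """Tune one parameter per stage, carrying the best setting forward."""
    best = dict(defaults)
    best_score = objective(best)
    for name, candidates in stages:
        for value in candidates:
            trial = dict(best, **{name: value})
            score = objective(trial)
            if score < best_score:  # lower is better
                best, best_score = trial, score
    return best, best_score


def toy_objective(params):
    # Stand-in for a cross-validated loss; the minimum is at (60, 1.0).
    return (params["num_leaves"] - 60) ** 2 / 1000 + (params["reg_lambda"] - 1.0) ** 2


# Earlier stages are assumed to matter more than later ones.
stages = [
    ("num_leaves", [20, 40, 60, 80]),
    ("reg_lambda", [0.01, 0.1, 1.0, 10.0]),
]
best, score = stepwise_tune(toy_objective, {"num_leaves": 31, "reg_lambda": 0.0}, stages)
```

The trade-off is that a stepwise search evaluates far fewer combinations than a joint search, at the cost of possibly missing interactions between parameters.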

Let me know if this helps! :)

StatMixedML commented 4 years ago

@jrzaurin Nice, thanks for sharing! The sensitivity analysis is definitely something of interest for the wider community.

Not sure if you know this site https://sites.google.com/view/lauraepp/parameters.

I am mostly using Bayesian Optimization to arrive at sensible hyper-parameters. Basically, you specify an initial set of hyper-parameter combinations and evaluate the loss for each, which gives a first picture of the loss surface. A surrogate model is then trained to learn the relationship between hyper-parameters and the loss, and it is used to suggest new hyper-parameter values to try. However, I still face some problems with LightGBMLSS, as you can see below.

image

Ideally, we should see something like this

image

Obviously, it doesn't learn the variance parameter well, which is odd, as the partial dependence plot shows that it gets it right.

image

Let me try and use your set of hyper-parameters.
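For reference, the loop described above (initial evaluations, a surrogate fitted to them, then new suggestions) can be sketched in a few lines. This is a deliberately crude stand-in and makes several assumptions: a single hyper-parameter on [0, 1], a toy loss, a kernel-weighted average as the surrogate instead of a Gaussian process, and a lower-confidence-bound style rule for choosing the next point. All names are hypothetical.

```python
import math


def toy_loss(x):
    # Stand-in for an expensive validation loss over one hyper-parameter.
    return (x - 0.3) ** 2


def surrogate(x, observed, bandwidth=0.1):
    """Kernel-weighted mean of observed losses, plus distance to the nearest point."""
    weights = [math.exp(-((x - xi) / bandwidth) ** 2) for xi, _ in observed]
    total = sum(weights)
    if total == 0.0:  # x is far from every observation
        mean = sum(yi for _, yi in observed) / len(observed)
    else:
        mean = sum(w * yi for w, (_, yi) in zip(weights, observed)) / total
    # Crude uncertainty proxy: unexplored regions are far from all observations.
    uncertainty = min(abs(x - xi) for xi, _ in observed)
    return mean, uncertainty


def suggest(observed, grid, kappa=1.0):
    """Pick the unevaluated grid point with the lowest optimistic loss estimate."""
    seen = {xi for xi, _ in observed}

    def acquisition(x):
        mean, uncertainty = surrogate(x, observed)
        return mean - kappa * uncertainty

    return min((x for x in grid if x not in seen), key=acquisition)


grid = [i / 50 for i in range(51)]
observed = [(x, toy_loss(x)) for x in (0.0, 0.5, 1.0)]  # initial design
for _ in range(10):
    x_next = suggest(observed, grid)
    observed.append((x_next, toy_loss(x_next)))

best_x, best_loss = min(observed, key=lambda pair: pair[1])
```

Real Bayesian optimisation libraries replace the surrogate and the acquisition rule with statistically grounded versions, but the evaluate, fit, suggest cycle is the same.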

javier-cazana commented 4 years ago

@StatMixedML Awesome. Let's see how it goes.

(this is still me replying from my working account :) )

And thanks, I did not know that site, it is good to have a full list in one place!

Let me read your 2019 paper at some point to see if I can be more useful 😀

StatMixedML commented 4 years ago

@jrzaurin Let me close this issue and re-open it at the LightGBMLSS repo here.