dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Q: Roadmap for LightGBM interface in .NET #6065

Closed torronen closed 2 years ago

torronen commented 2 years ago

LightGBM 2.3.1 has some parameters and features that are missing from Microsoft.ML.LightGBM. One such feature is refitting / re-training:

Re-fitting is missing from all tree-based trainers in Microsoft.ML. It has been requested at least in #6010 (albeit for FastTree), and it would be useful for some of my use cases. Another useful feature would be GPU support, which is mentioned in multiple issues, e.g. https://github.com/dotnet/machinelearning/pull/452
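For reference, LightGBM's own CLI exposes refit through a config file; a rough sketch (file names and paths are placeholders, and `refit_decay_rate` should be verified against the LightGBM docs for your version):

```
# train.conf -- hypothetical LightGBM CLI config for refitting an
# existing model on new data; all paths are placeholders.
task = refit
data = new_train.tsv
input_model = model.txt
output_model = refit_model.txt
refit_decay_rate = 0.9
```

This would be run as `lightgbm config=train.conf` with a separately built CLI binary.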

There was also a PR to upgrade LightGBM to the 3.x versions, but it was decided not to upgrade at this time. The upgrade requirements are listed in https://github.com/dotnet/machinelearning/issues/5447. .NET might be missing out on the improvements, and the bundled version also throws "Bad allocation" errors with some combinations of hyperparameters.

In my experiments, FastTree often outperforms LightGBM. Based on sentiment in online discussions, this should not be the case: LightGBM is considered a high-performing algorithm and is under continued development, which is why I think the LightGBM integration should be important for the performance of Microsoft.ML.

Finally, there may be some misconfiguration that prevents Microsoft.ML.LightGBM from getting results similar to the Python interface with the same hyperparameters (links in https://github.com/dotnet/machinelearning/pull/6064, sample code at https://github.com/torronen/lightgbm-comparison).
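One source of confusion when comparing the two interfaces is that ML.NET renames the hyperparameters. A minimal sketch of the correspondence for a few common parameters (a partial list based on the `Microsoft.ML.Trainers.LightGbm` options surface; verify against your ML.NET version):

```python
# Hedged mapping from LightGBM Python parameter names to the
# corresponding Microsoft.ML.LightGbm option names (partial list).
PARAM_MAP = {
    "num_leaves": "NumberOfLeaves",
    "min_data_in_leaf": "MinimumExampleCountPerLeaf",
    "learning_rate": "LearningRate",
    "num_iterations": "NumberOfIterations",
}

def to_mlnet(python_params):
    """Translate a Python-style params dict to ML.NET option names."""
    return {PARAM_MAP[k]: v for k, v in python_params.items() if k in PARAM_MAP}

print(to_mlnet({"num_leaves": 31, "learning_rate": 0.05}))
# {'NumberOfLeaves': 31, 'LearningRate': 0.05}
```

When debugging result mismatches, translating the exact Python params into the ML.NET options this way makes it easier to spot values that were silently left at a different default.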

Is the intent to have a complete LightGBM interface in .NET, or is it better to use Python for advanced use cases (and e.g. export to ONNX from Python)? Is there any roadmap / estimated priority for the LightGBM upgrade?

torronen commented 2 years ago

Advanced use cases could compile an executable with GPU support per the LightGBM doc instructions. In that case, the option to load a model from a LightGBM model.txt would be more useful. It would also allow using the LightGBM CLI for distributed tasks.

michaelgsharp commented 2 years ago

We don't currently have a LightGBM-specific roadmap for this (other than updating to the latest version), but we are planning to sit down and discuss it to figure that out.

We do think that adding an API to load a model from the model.txt would be good. In fact, we already do this internally, so it would mostly be a matter of exposing it in a friendly manner.

torronen commented 2 years ago

Great, I think it is a good idea. It may be easier and faster to implement, and it would also allow using all features of the LightGBM CLI, like distributed learning. Maybe there is also some reason why users are asked to compile the GPU version themselves... Production datasets tend to be big, so distributed learning is probably something I should move toward as well.

michaelgsharp commented 2 years ago

My guess is that the reason for having to compile it yourself is twofold:

1 - They would then have to package it all. Usually NuGet packages with GPU support are pretty huge and have to be split into multiple packages. If you look at the GPU packages for TorchSharp, they're split into something like 17 different chunks due to file size restrictions in NuGet. It's honestly a hassle to deal with (here's hoping that can be fixed somehow in the future...).

2 - They would either have to build for lots of different CUDA versions or limit it to only one version. Most projects tend to limit it to one version. By having users compile it themselves, the build can often use whatever CUDA version the user has.

It would be awesome if these could be solved so that things are easier in the future. No idea if/when that would happen though.

torronen commented 2 years ago

Do you know if the Model.zip file includes the model already in the LightGBM .txt format? If it did, we could refit with the LightGBM CLI, although we would still need to run the featurizers when saving data for the CLI to refit. The zip file seems to include the hyperparameters in key.model, which could be used to retrain the model, but the result could be a different kind of tree than the original. Thus, refit would be safer when tailoring a LightGBM model with new user input.
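Since the Model.zip is a plain archive, one quick way to answer this kind of question is just to list its entries. A minimal sketch using Python's stdlib; the entry name below is a fabricated stand-in (a real Model.zip comes from `mlContext.Model.Save`, and its entry names depend on the pipeline):

```python
import io
import zipfile

def list_entries(archive_bytes):
    """Return the entry names inside a model archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as z:
        return z.namelist()

# Fabricate a stand-in archive for illustration; this entry name is an
# assumption, not the actual ML.NET layout.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("TransformerChain/Model/key.model", "hyperparameters...")

print(list_entries(buf.getvalue()))
# ['TransformerChain/Model/key.model']
```

Scanning the real archive's entry list for a model.txt-style entry would confirm whether the native LightGBM format is stored.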

michaelgsharp commented 2 years ago

It doesn't include it. After the LightGBM model is trained, we load the model and convert it from the LightGBM format into our ML.NET format.

We have discussed making a way to save this if desired, but nothing has been decided yet.

If you can build from source though, I could show you where you would need to go to save the model. I know that's not a great fix for production, but for testing/local use it would work. We're hoping to reach a conclusion soon about the right way to handle these cases.

torronen commented 2 years ago

> If you can build from source though I could show you where you would need to go to save the model.

That would be great if you could do that!

michaelgsharp commented 2 years ago

Saving:

So here is where we actually have the LightGBM model as a string internally. The easiest way would be to take that modelString and write it out. Behind the scenes we are actually calling the native function LGBM_BoosterSaveModelToString, which you can find [here](https://github.com/dotnet/machinelearning/blob/510f0112d4fbb4d3ee233b9ca95c83fae1f9da91/src/Microsoft.ML.LightGbm/WrappedLightGbmInterface.cs#L189). You could use the native calls directly if you want, but I think just inserting a call there to save it directly would be easiest.

Loading:

Same spot and same idea; instead of saving out the model, load in your model and overwrite the modelString variable. We should then parse it in the same way.

Both of these are kind of hacky, but we did talk today about adding official support for this. No timelines as of yet, but it is on our roadmap now.

torronen commented 2 years ago

@michaelgsharp Thank you so much, looks great to me!