dotnet/machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml

Please support XGBoost i.e. gradient boosting machines #1539

Open · AceHack opened this issue 5 years ago

AceHack commented 5 years ago

XGBoost is the 2nd most popular model on Kaggle.

Thanks.

artidoro commented 5 years ago

@TomFinley what are your thoughts on this? We already have other tree-based learning algorithms in ML.NET, but it is true that XGBoost is a very popular one.
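For reference, a minimal sketch of how the existing tree-based trainers are invoked today, assuming the Microsoft.ML.FastTree and Microsoft.ML.LightGbm packages and an already-prepared IDataView (the column names and the `trainingData` variable are illustrative):

```csharp
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// FastTree: ML.NET's built-in boosted decision trees
// (ships in the Microsoft.ML.FastTree package).
var fastTree = mlContext.BinaryClassification.Trainers.FastTree(
    labelColumnName: "Label",
    featureColumnName: "Features");

// LightGBM: an already-wrapped native gradient boosting library
// (ships in the Microsoft.ML.LightGbm package).
var lightGbm = mlContext.BinaryClassification.Trainers.LightGbm(
    labelColumnName: "Label",
    featureColumnName: "Features");

// Either trainer fits the same way, e.g.:
// ITransformer model = fastTree.Fit(trainingData);
```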

TomFinley commented 5 years ago

Hi @artidoro (and @AceHack). We do wrap it internally; we just haven't ported the wrapper over. I'm not sure why; most probably simply a lack of time and opportunity, since of course we've had many other things to do. I agree it would be a very nice thing to do.
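To give a sense of what such a wrapper involves, here is a rough, hypothetical P/Invoke sketch against XGBoost's public C API. The entry points below come from xgboost's c_api.h; the managed class shape, library-name resolution, and omitted error handling are assumptions, not the internal wrapper mentioned above:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical bindings to xgboost's C API. "xgboost" must resolve to the
// platform's native library (xgboost.dll / libxgboost.so / libxgboost.dylib).
internal static class XGBoostNative
{
    // Builds a DMatrix from a dense row-major float matrix.
    [DllImport("xgboost")]
    public static extern int XGDMatrixCreateFromMat(
        float[] data, ulong nrow, ulong ncol, float missing, out IntPtr dmatrix);

    // Creates a booster over a set of DMatrix handles.
    [DllImport("xgboost")]
    public static extern int XGBoosterCreate(
        IntPtr[] dmats, ulong len, out IntPtr booster);

    // Sets a training parameter, e.g. ("objective", "binary:logistic").
    [DllImport("xgboost")]
    public static extern int XGBoosterSetParam(
        IntPtr booster, string name, string value);

    // Runs one boosting iteration on the given training matrix.
    [DllImport("xgboost")]
    public static extern int XGBoosterUpdateOneIter(
        IntPtr booster, int iter, IntPtr dtrain);

    [DllImport("xgboost")]
    public static extern int XGDMatrixFree(IntPtr dmatrix);

    [DllImport("xgboost")]
    public static extern int XGBoosterFree(IntPtr booster);
}
```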

One potential barrier is that the external distribution story will be somewhat difficult. Our current policy, which I think is good, is that to be included in the "official" ML.NET a learner must work on Windows, Linux, and Mac. XGBoost runs on all those platforms, but I do not see a NuGet package containing the native binaries for all of them that we can easily consume, which means we're either in the business of building it ourselves (as we did with LibMF and its git submodule), or we somehow arrange for such a package to be published. LightGBM is the closest analogy to the latter approach I can imagine, but the situation is somewhat different.
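For what it's worth, the standard NuGet convention for shipping one package that carries native binaries for all three platforms is the runtimes/{rid}/native layout, sketched below; the package id and file set are hypothetical, since no such official xgboost package appears to exist:

```
XGBoost.Native.nupkg            (hypothetical package id)
└── runtimes/
    ├── win-x64/native/xgboost.dll
    ├── linux-x64/native/libxgboost.so
    └── osx-x64/native/libxgboost.dylib
```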

But on the whole it's not impossible; there are just a number of problems that need to be solved, though I strongly agree we should do this.

The fact that there are multiple learners using the same basic technology (in this case trees) is not inherently problematic, and certainly not a reason not to ship something as popular as XGBoost.

justinormont commented 5 years ago

The demand to bring it to ML.NET is there, and for that reason alone the port would be useful.

That said, I don't see XGBoost winning very often versus our existing tree-based models. Perhaps a newer version will show gains, or perhaps a round of optimizing our default hyperparameters is in order.
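As an illustration of what tuning the defaults means in practice, here is a minimal sketch of overriding FastTree's hyperparameters via its options object; the values are arbitrary starting points, not recommendations:

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers.FastTree;

var mlContext = new MLContext(seed: 0);

// Explicitly overriding a few FastTree hyperparameters; the values below
// are illustrative, not tuned recommendations.
var options = new FastTreeBinaryTrainer.Options
{
    LabelColumnName = "Label",
    FeatureColumnName = "Features",
    NumberOfLeaves = 31,
    NumberOfTrees = 200,
    MinimumExampleCountPerLeaf = 20,
    LearningRate = 0.1,
    FeatureFraction = 0.8  // fraction of features sampled per tree
};

var trainer = mlContext.BinaryClassification.Trainers.FastTree(options);
```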

rogancarr commented 5 years ago

As an aside, XGBoost does have a lot of features, some of which we don't support in FastTree. It would also be useful to understand which of those features FastTree should adopt.
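To make that concrete, a few of the commonly cited XGBoost capabilities can be written out as the library's own parameter strings (taken from xgboost's documented parameter list); whether each one has a FastTree equivalent is exactly the comparison suggested above. The values here are placeholders:

```csharp
using System.Collections.Generic;

// A sample of documented XGBoost parameters often raised in comparisons.
var xgboostParams = new Dictionary<string, string>
{
    ["booster"] = "dart",                // DART: boosting with dropout
    ["alpha"] = "0.5",                   // L1 regularization on leaf weights
    ["lambda"] = "1.0",                  // L2 regularization on leaf weights
    ["colsample_bytree"] = "0.8",        // column subsampling per tree
    ["monotone_constraints"] = "(1,-1)", // monotonic constraints per feature
};
```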

JohnGalt1717 commented 3 months ago

I think that with XGBoost 2.0 it's ahead on a lot of metrics, and support for it would be highly useful.