dmlc / XGBoost.jl

XGBoost Julia Package

XGBoost rewrite in Julia using Metal.jl #167

Open Roh-codeur opened 1 year ago

Roh-codeur commented 1 year ago

Hi

thanks for the work on this library. Since you are all experienced with XGBoost, I was wondering if you have any thoughts on rewriting XGBoost in Julia, potentially using Metal.jl? I am sure the Apple M1 would bring a considerable boost in performance. Thoughts please!

ta!

tylerjthomas9 commented 1 year ago

EvoTrees.jl is a pure Julia implementation of gradient boosting, and it has phenomenal CPU performance. This library is just a wrapper around the XGBoost C library, so it wouldn't be possible to make those changes here. Maybe XGBoost will support more GPU backends at some point.
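
For anyone who hasn't tried it, here is a minimal sketch of fitting a regressor with EvoTrees.jl (assuming the keyword-based `fit_evotree` interface of recent EvoTrees.jl versions; the data below is made up for illustration):

```julia
using EvoTrees

# synthetic regression data (hypothetical example data)
x_train = randn(10_000, 20)
y_train = x_train[:, 1] .+ 0.5 .* randn(10_000)

# configure and fit a boosted-trees regressor on the CPU
config = EvoTreeRegressor(nrounds=200, eta=0.1, max_depth=6)
model = fit_evotree(config; x_train, y_train)

# predictions on the training matrix
preds = EvoTrees.predict(model, x_train)
```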

Roh-codeur commented 1 year ago

> EvoTrees.jl is a pure Julia implementation of gradient boosting, and it has phenomenal CPU performance. This library is just a wrapper around the XGBoost C library, so it wouldn't be possible to make those changes here. Maybe XGBoost will support more GPU backends at some point.

thanks, @tylerjthomas9! I will take a look at EvoTrees.jl. I noticed their benchmarks and it seems that even EvoTrees would benefit from GPU. :)

I understand your point about the scope of this project. I have very little knowledge about XGBoost and programming in general, so I was hoping you would be able to advise on the complexity of porting XGBoost to Julia with support for Apple M1 silicon. The thing is, I have to run multiple models, and the current XGBoost implementation takes a very long time to run, so I am hoping the Apple M1 GPU would be able to help.

ta!

ExpandingMan commented 1 year ago

Originally, when I started working on this package, it was mostly because it seemed to have a very high reward-to-effort ratio, which I think has mostly been borne out. However, at some point I'm going to start looking to replace it with EvoTrees.jl anywhere I might use it, and see how close it is to parity. It does seem like a lot of recent work has been done on it.

Roh-codeur commented 1 year ago

sure, I understand! thanks again for all your work on this package; I know a lot of users, like me, sincerely appreciate all your help.

as I write this, I am looking at EvoTrees as well. The benchmarks look quite impressive.

thanks!

tylerjthomas9 commented 1 year ago

It might be worth writing methods to convert EvoTrees.jl models to and from XGBoost. There's an issue for it on EvoTrees.jl

https://github.com/Evovest/EvoTrees.jl/issues/179

bobaronoff commented 1 year ago

I too have Apple M1 silicon and share the original poster's pain. However, such is the state of affairs: Apple silicon is still only at tier-2 support, and Metal.jl describes itself as a work in progress, i.e., not ready for production work. Many great people are doing great work, but this will take time. That said, there are some maneuvers that help reduce computation time; they will not come close to a GPU, but the savings can be significant. I am kind of curious how many rows/columns are involved.

I presume the datasets are quite large. Here are some options to consider (see the sketch after this list):

1. 'Turn off' the watchlist. I have found that computation time is reduced by 40% without it. Of course, you'll need to evaluate another way, but Julia and MLJ are quick and it can be done less frequently, i.e., every 20 rounds. You are probably already doing this.
2. If there are a lot of rows, i.e., 10^5-10^6, reduce the subsample fraction and also consider the hist tree method as opposed to the exact tree method. I have not used the hist method, but it is intended to be computationally efficient.
3. If there are a lot of columns, i.e., 10^2-10^3, take advantage of the column sub-sampling parameters. There are three, which are additive and work by tree, by level, and by node. This also reduces computation time.
4. Multi-threading for a single model is not extremely helpful, as rounds are iterative. However, threading can help with concurrent CV fold processing and grid search. I've used this with R but not Julia; I believe MLJ has facilities for this.
5. Don't shoot the messenger here, but if the need is great, one can always borrow a machine with an Intel CPU and an Nvidia GPU.
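
As a rough illustration of several of these knobs together, a sketch using XGBoost.jl's keyword interface (not a tuned configuration: the data is synthetic, the parameter values are illustrative, and passing an empty watchlist to silence per-round evaluation is my assumption about the wrapper; subsample, colsample_*, and tree_method are standard XGBoost parameters):

```julia
using XGBoost

# synthetic stand-in for a large training set (hypothetical data)
X = randn(100_000, 300)
y = randn(100_000)

# illustrative settings aimed at cutting CPU training time:
#   tree_method="hist"      -> histogram-based splits instead of exact
#   subsample / colsample_* -> fewer rows and columns considered per tree/level
#   watchlist=Dict()        -> skip the per-round evaluation output
bst = xgboost((X, y);
    num_round=1000,
    tree_method="hist",
    subsample=0.5,
    colsample_bytree=0.5,
    colsample_bylevel=0.8,
    watchlist=Dict(),
    max_depth=6,
    eta=0.1,
)
```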

I have found xgboost to be quite fast compared to the 'gbm' package in R and Dr. Friedman's MART, so I am a glass-half-full guy. My datasets are relatively small (i.e., 10,000 rows / 25 columns); at this size, a 10-fold CV using exact trees, the watchlist 'on', and 1,000 rounds takes about 25 seconds. For me this is acceptable. Others' needs will vary.

tylerjthomas9 commented 1 year ago

> 2. ... the hist tree method as opposed to the exact tree method. I have not used the hist method, but it is intended to be computationally efficient.

Using tree_method="hist" should make a large speed difference. I always use hist when training on CPUs.
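
If anyone wants to check the difference on their own data, a quick hedged comparison (synthetic data and standard XGBoost parameters; the empty watchlist is just my assumption for suppressing per-round logging):

```julia
using XGBoost

X = randn(50_000, 100)
y = randn(50_000)

# identical settings except for the tree construction algorithm
@time xgboost((X, y); num_round=100, max_depth=6, tree_method="exact", watchlist=Dict())
@time xgboost((X, y); num_round=100, max_depth=6, tree_method="hist", watchlist=Dict())
```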

jeremiedb commented 1 year ago

I'd definitely appreciate user input on EvoTrees :) AFAICT, it now fares quite well on CPU, but I'm pretty sure the GPU story could be improved. I haven't put any effort into the distributed setup or multi-GPU either; those are all areas where I'm less knowledgeable.

Potential "low hanging fruits" I was considering shorter term was random forest mode as well as oblivious trees (tree structure used by CatBoost).

That being said, it's really nice to see the recent efforts to bring more robust wrappers around go-to libraries like XGBoost and CatBoost. These are definitely important for raising Julia's credibility in the general ML space.