dmlc / XGBoost.jl

XGBoost Julia Package

full GPU support #142

Closed ExpandingMan closed 1 year ago

ExpandingMan commented 1 year ago

Huge thank you to @tylerjthomas9 who handled the XGBoost_jll update which was by far the hardest part of this.

I think this is it: full GPU (NVIDIA-only) support. That said, whatever libxgboost is doing here is quite opaque to me. All I really know is that the DMatrix is being constructed from a CuArray using the dedicated method in libxgboost. What happens after that I have no idea. It would make me feel a lot better if @trivialfis or someone else could confirm that this is all that has to be done.

There's a huge downside here: the matrix has to be transposed. That means that in any case where you are not passing a transpose or adjoint of the array, it has to be copied. As far as I know there is no way around this. That's all the more reason to add XGDMatrixCreateFromCudaColumnar, which I haven't gotten to yet only because it raises a lot of questions about the interface (it would have to infer the array types when it gets a table, which it's not at all set up to do).
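For concreteness, here's a minimal sketch of the usage this enables. It assumes CUDA.jl and the two-argument DMatrix(X, y) constructor; treat it as illustrative rather than a spec.

```julia
using CUDA, XGBoost

# Feature matrix and labels living on the GPU.
X = CUDA.rand(Float32, 1000, 4)
y = CUDA.rand(Float32, 1000)

# libxgboost wants the transpose, so a plain CuMatrix gets copied here.
dm = DMatrix(X, y)

# Passing an adjoint avoids the copy, since the memory is already laid
# out the way libxgboost expects.
Xt = CUDA.rand(Float32, 4, 1000)
dm2 = DMatrix(Xt', y)
```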

ExpandingMan commented 1 year ago

OK, I have confirmed this is definitely working by monitoring nvtop. Excitingly, nvtop wasn't really necessary: the speedup can be so large that it's blindingly obvious.

I have also added a method for creating a DMatrix from a table of CuVectors (it is automatically detected).

I've made it so that tree_method defaults to gpu_hist if all of the data provided to the booster was a CuArray. I usually frown upon this kind of hacking of defaults, but it's hard to imagine why you'd initialize with CuArrays unless you intend to use them, and of course you can still override the value. I've documented this in multiple places. Still, I have some misgivings about it, so let me know if there are objections.
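Concretely, the behavior is roughly the following (a sketch, assuming the xgboost convenience function passes parameters through as keywords):

```julia
using CUDA, XGBoost

X = CUDA.rand(Float32, 1000, 4)
y = CUDA.rand(Float32, 1000)

# All the data is on the GPU, so tree_method defaults to "gpu_hist"...
bst = xgboost(DMatrix(X, y), num_round=10)

# ...but the default can still be overridden explicitly.
bst_cpu = xgboost(DMatrix(X, y), num_round=10, tree_method="hist")
```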

ExpandingMan commented 1 year ago

@ablaom @rikhuijzer I'm soliciting comments from you both before I merge. The only thing this does that I think might be at all controversial is automatically setting gpu_hist when a CuArray is used.

ExpandingMan commented 1 year ago

Thanks for taking the time to review @rikhuijzer !

I will merge this either this evening or tomorrow if there are no objections.

ExpandingMan commented 1 year ago

I also want to say here for posterity that, as Rik pointed out to me elsewhere, if we ever decided to change this so that using the GPU with a CuArray is no longer the default, it would definitely be breaking, meaning that to change it we'd need XGBoost.jl 3.0.

trivialfis commented 1 year ago

Apologies for the slow response, just recovering from COVID...

> All I really know is that the DMatrix is being constructed from a CuArray using the dedicated method in libxgboost.

In Python, DMatrix takes a dense cupy.ndarray just like a numpy.ndarray, using the XGDMatrixCreateFromCudaArrayInterface C function. Both row-major and column-major layouts are supported; just use the __cuda_array_interface__ to specify the strides of the array.
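To illustrate what that C function consumes: the interface is just a small JSON description of the device buffer. Here's a hedged sketch of building one in Julia (field names follow the numba/cupy array-interface specification; the construction is illustrative, not XGBoost.jl's actual internals):

```julia
using CUDA, JSON3

X = CUDA.rand(Float32, 100, 4)

# Illustrative only: a __cuda_array_interface__-style description of X.
info = Dict(
    "shape"   => size(X),                        # (rows, cols)
    "typestr" => "<f4",                          # little-endian Float32
    "data"    => (Int(UInt(pointer(X))), false), # (device pointer, read-only flag)
    "strides" => (4, 4 * size(X, 1)),            # byte strides: column major
    "version" => 3,
)
json = JSON3.write(info)  # a string like this is what gets handed to libxgboost
```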

As for XGDMatrixCreateFromCudaColumnar, that takes a list of arrays representing a dataframe, with each array being a column. The term "columnar" comes from the Apache Arrow IPC format. Under that specification, each column contains data of a single type plus a null mask marking missing entries, which is pretty much the same as a masked numpy array.
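On the Julia side this corresponds to the table path added in this PR, where each column is a separate device vector. A minimal sketch (the column names are made up, and the exact constructor signature is an assumption):

```julia
using CUDA, XGBoost

# A columnar table: a Tables.jl-compatible NamedTuple of CuVectors.
tbl = (height = CUDA.rand(Float32, 1000),
       weight = CUDA.rand(Float32, 1000))
y = CUDA.rand(Float32, 1000)

dm = DMatrix(tbl, y)  # detected as CUDA columnar data, one description per column
```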

andreasnoack commented 1 year ago

We are using XGBoost on a platform without a GPU, so we are now pulling in CUDA without using it. Would it be technically feasible to make the CUDA support an extension?

ExpandingMan commented 1 year ago

There is currently an open PR for this. We haven't heard from the author in a few weeks. Once the full release of Julia 1.9 lands, this will become a much higher priority.
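For context, Julia 1.9's package extensions would let CUDA become a weak dependency, loaded only when the user also loads CUDA.jl. Roughly, the Project.toml would gain entries like the following (the extension module name here is illustrative, not the actual PR's contents):

```toml
[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"

[extensions]
XGBoostCUDAExt = "CUDA"
```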

andreasnoack commented 1 year ago

Thanks. I should have been able to find that; I did check the issue tracker, though. I'll ask David if he can take it to completion.