dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
LightGBM GPU switch (res["device_type"] == "gpu") #6666

Open torronen opened 1 year ago

torronen commented 1 year ago

System Information: daily build

Feature request: LightGBM supports GPU for the second part of the algorithm. However, the binary with GPU support has to be built manually ( https://lightgbm.readthedocs.io/en/latest/GPU-Tutorial.html ), and as far as I know it is not currently distributed, officially or unofficially.

GPU support can be enabled in three steps:
1) Compile LightGBM with GPU support (make sure to use the same version as Microsoft.ML).
2) Set res["device_type"] = "gpu" in the parameter array in *.LightGBM.
3) In the same file, update the library DLL name in the source to point to your compiled version (mine is lightgbm_gpu).
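As a minimal sketch of what step 2 amounts to (this is not the actual Microsoft.ML source): the trainer collects the native LightGBM parameters in a string-keyed map, and the GPU switch is one more entry in it. The class name `LightGbmParamsSketch`, the methods `BuildParameters` and `ToParameterString`, and the `useGpu` flag are hypothetical names used only for illustration; `device_type` and `objective` are real LightGBM parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LightGbmParamsSketch
{
    // Hypothetical helper: builds the parameter map handed to the native library.
    public static Dictionary<string, object> BuildParameters(bool useGpu)
    {
        var res = new Dictionary<string, object>
        {
            ["objective"] = "binary",
        };

        // "cpu" is LightGBM's default; "gpu" only works with a GPU-enabled binary.
        res["device_type"] = useGpu ? "gpu" : "cpu";
        return res;
    }

    // LightGBM's C API takes parameters as a space-separated "key=value" string.
    public static string ToParameterString(Dictionary<string, object> res)
    {
        return string.Join(" ", res.Select(kv => kv.Key + "=" + kv.Value));
    }
}
```

Setting the flag alone is not enough: with a CPU-only binary, `device_type=gpu` is expected to fail at training time, which is why step 3 is needed as well.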

My experience is on Windows with a 1080 Ti and a 3090. The GPU helps with huge datasets (10+ GB). With small datasets I did not see much of a difference, because the first part is still CPU-bound.

I've seen a few requests for this here and in the Model Builder repo.

Discussion and suggestion: If the device_type switch were exposed, users could use the unmodified Microsoft.ML library with a custom GPU binary.

However, this requires a custom binary (unless one were published on nuget.org), so putting the switch together with all the other LightGBM parameters could confuse users who expect it to work just by setting it. Second, the AutoML pipelines would also need some changes.

Do you think it would be reasonable to expose this parameter, and what would be the best way to do it?

After thinking about it briefly, the best I could come up with is a static method, for example LightGBMTrainer.SetGPUBinary("lightgbm_gpu.dll"); If the user sets this binary, device_type would also be changed. The change would be global. This interface would also make it clear to users that they must provide a custom binary for it to work, since they cannot call the method without one.
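A rough sketch of how that static switch could look, assuming a process-wide setting guarded by a lock. Nothing here exists in Microsoft.ML: `LightGbmGpuSettings`, `SetGpuBinary`, `LibraryName`, and `DeviceType` are hypothetical names, and the default library name `lib_lightgbm` is an assumption.

```csharp
using System;

// Hypothetical sketch of the proposed global switch; not part of Microsoft.ML.
public static class LightGbmGpuSettings
{
    private const string DefaultLibrary = "lib_lightgbm"; // assumed CPU library name
    private static readonly object Sync = new object();
    private static string _gpuBinary = string.Empty;      // empty means "no GPU binary set"

    // Registering a custom binary is the only way to switch to GPU,
    // so callers cannot enable it without supplying one.
    public static void SetGpuBinary(string libraryName)
    {
        if (string.IsNullOrWhiteSpace(libraryName))
            throw new ArgumentException("A GPU library name is required.", nameof(libraryName));

        lock (Sync)
        {
            _gpuBinary = libraryName;
        }
    }

    // Library the trainer would load.
    public static string LibraryName
    {
        get { lock (Sync) { return _gpuBinary.Length == 0 ? DefaultLibrary : _gpuBinary; } }
    }

    // Value the trainer would write into res["device_type"].
    public static string DeviceType
    {
        get { lock (Sync) { return _gpuBinary.Length == 0 ? "cpu" : "gpu"; } }
    }
}
```

The trade-off of this design is the one mentioned above: the setting is global, so every LightGBM trainer in the process is affected once `SetGpuBinary` has been called.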

Opinions on this?

If there is a good plan, I can work on this, because I would much prefer to use the daily feed instead of a custom build of Microsoft.ML. It is the one remaining feature I need in order to upgrade from our modified 1.x version to the daily build. However, I am not sure whether this is something you would like to include in the library, given the absence of GPU binaries, or whether there is a good way to implement it. Any advice is appreciated.

luisquintanilla commented 1 year ago

Hi @torronen,

Thanks for this suggestion. In the MLContext, we have GpuDeviceId and FallbackToCpu. Would using those and leveraging them in LightGBM work for your scenario?

ghost commented 1 year ago

This issue has been marked needs-author-action and may be missing some important information.

torronen commented 1 year ago

Yes, it could work. Maybe something like this?

In user code: ctx.GpuDeviceId = 1; ctx.FallbackToCpu = true;

in this file https://github.com/dotnet/machinelearning/blob/3d705bf05d3be5e5232089c3524213ebbae2911f/src/Microsoft.ML.LightGbm/LightGbmBinaryTrainer.cs:

if (ctx.GpuDeviceId > 0)
{
    res["device_type"] = "gpu";
}

I still need to work out where to check that the correct DLL is used, and which exception type and exact message to look for. Alternatively, there could be separate DLLs for CPU and CUDA GPU, and the correct one would be loaded depending on GpuDeviceId.


...catch (Exception ex)
{
    if (ex.Message.Contains("gpu invalid"))
    {
        if (ctx.FallbackToCpu)
        {
            res["device_type"] = "cpu";
            TryTrainAgain();
        }
        else
        {
            throw new Exception("LightGBM binary not found. Please set MLContext.FallbackToCpu = true, or compile GPU binary with instructions from https://lightgbm.readthedocs.io/");
        }
    }
}
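
The same idea as a self-contained sketch that compiles without Microsoft.ML. The `train` delegate stands in for the native training call, and `GpuFallbackSketch`, `TrainWithFallback`, and `IsGpuFailure` are hypothetical names. Treating DllNotFoundException and EntryPointNotFoundException as the GPU failure signal is an assumption, since the actual exception type and message still need to be checked.

```csharp
using System;
using System.Collections.Generic;

public static class GpuFallbackSketch
{
    // Tries training with the given parameters; on a GPU-related failure,
    // retries once on CPU if fallbackToCpu is set.
    public static TModel TrainWithFallback<TModel>(
        Dictionary<string, object> res,
        bool fallbackToCpu,
        Func<Dictionary<string, object>, TModel> train)
    {
        try
        {
            return train(res);
        }
        catch (Exception ex) when (IsGpuFailure(res, ex))
        {
            if (!fallbackToCpu)
            {
                throw new InvalidOperationException(
                    "GPU training failed and FallbackToCpu is false.", ex);
            }

            // Copy the parameters so the caller's map is left untouched.
            var cpuParams = new Dictionary<string, object>(res) { ["device_type"] = "cpu" };
            return train(cpuParams);
        }
    }

    // Assumption: a missing or mismatched native binary surfaces as one of these exceptions.
    private static bool IsGpuFailure(Dictionary<string, object> res, Exception ex)
    {
        object device;
        return res.TryGetValue("device_type", out device)
            && (device as string) == "gpu"
            && (ex is DllNotFoundException || ex is EntryPointNotFoundException);
    }
}
```

Filtering on the exception type rather than on the message text avoids depending on the exact wording of a native error, but it only covers the missing-binary case.
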
torronen commented 1 year ago

Here is the hard-coded version that makes LightGBM use the GPU. The binary is for CUDA (NVIDIA) on Windows. https://github.com/torronen/machinelearning/commit/f898d839804f5f45abe2891f880cbd8f4b340480

This needs one more change, in the default search_space: the GPU build does not support a large bin_size. I believe the limit is 255, but I need to double-check.

Ploug commented 1 year ago

Is there any update on this? I think it's very valuable to have LightGBM GPU support now that ML.NET is using the newer version of LightGBM, and it seems like a straightforward fix.

superichmann commented 8 months ago

https://lightgbm.readthedocs.io/en/latest/GPU-Performance.html ML.NET is Microsoft, LightGBM is Microsoft... maybe there is a way to support GPU out of the box in ML.NET? It would also be nice for FastForest ;)