asu-cactus / netsdb

A system that seamlessly integrates Big Data processing and machine learning model serving in a distributed relational database
Apache License 2.0

Support for LightGBM in our Decision Forest Benchmark framework #40

Closed: jiazou-bigdata closed this issue 2 years ago

jiazou-bigdata commented 2 years ago
  1. Figure out how to train a LightGBM model in ScikitLearn, and extend our training script to support it;
  2. Figure out how to convert a LightGBM model to ONNX, HummingBird-Pytorch, HummingBird-TorchScript, HummingBird-TVM, lleaves, and TreeLite, and extend our conversion scripts to support it;
  3. Figure out how to run LightGBM inference on ScikitLearn, ONNX, HummingBird-Pytorch, HummingBird-TorchScript, HummingBird-TVM, lleaves, and TreeLite, and extend our testing script to support it.

You can use the Higgs dataset for this purpose: https://github.com/NVIDIA/gbm-bench
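
For reference, a minimal loading sketch, assuming the UCI `HIGGS.csv` layout: no header row, the class label in the first column, and 28 feature columns after it. The function name `load_higgs_rows` is hypothetical, and the two sample rows are made-up values used only to exercise the parser, not real Higgs data.

```python
import csv
import io

N_FEATURES = 28  # assumed HIGGS.csv layout: 1 label column + 28 feature columns


def load_higgs_rows(handle, max_rows=None):
    """Parse Higgs-style CSV rows into (features, labels) lists."""
    features, labels = [], []
    for i, row in enumerate(csv.reader(handle)):
        if max_rows is not None and i >= max_rows:
            break
        if len(row) != N_FEATURES + 1:
            raise ValueError(f"row {i}: expected {N_FEATURES + 1} columns, got {len(row)}")
        # Labels are stored as floats such as 1.000000000000000000e+00.
        labels.append(int(float(row[0])))
        features.append([float(value) for value in row[1:]])
    return features, labels


# Made-up rows in the assumed layout, standing in for an opened HIGGS.csv file.
sample = io.StringIO(
    ",".join(["1.000000000000000000e+00"] + ["5.0e-01"] * N_FEATURES) + "\n"
    + ",".join(["0.000000000000000000e+00"] + ["-2.5e-01"] * N_FEATURES) + "\n"
)
X, y = load_higgs_rows(sample)
print(len(X), len(X[0]), y)  # prints: 2 28 [1, 0]
```

With the real file, the same function can be called on `open("HIGGS.csv", newline="")`, using `max_rows` to read a subset, since the full dataset is large.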

I believe the following benchmark suites will be helpful for completing the task:

HummingBird benchmark, which includes LightGBM for ScikitLearn, HummingBird, and ONNX: https://github.com/microsoft/hummingbird/tree/main/benchmarks/trees

gbm-bench and fast_retraining (LightGBM and XGBoost): https://github.com/NVIDIA/gbm-bench and https://github.com/Azure/fast_retraining/

lleaves benchmark: https://github.com/siboehm/lleaves