RAMitchell / GBM-Benchmarks


'n_gpus' analogue is not passed to CatBoost #3

Open · Noxoomo opened this issue 6 years ago

Noxoomo commented 6 years ago

You should set the number of devices to be equal for CatBoost and XGBoost.

Without CUDA_VISIBLE_DEVICES=id the benchmark is not fair, because CatBoost uses all devices by default, and running small benchmark datasets on an 8×V100 server is not a good idea.
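A minimal sketch of how both libraries could be pinned to the same single device, assuming the library versions discussed later in this thread (xgboost ~0.80, CatBoost ~0.10, where `n_gpus` still existed in xgboost before being deprecated in favour of `gpu_id` alone):

```python
import os

# Make both libraries see exactly one GPU before CUDA is initialised,
# so CatBoost cannot silently grab all eight V100s while XGBoost uses one.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import xgboost as xgb
from catboost import CatBoostClassifier

# xgboost of this era: request one GPU explicitly.
xgb_params = {"tree_method": "gpu_hist", "gpu_id": 0, "n_gpus": 1}
# booster = xgb.train(xgb_params, dtrain)  # dtrain: an xgb.DMatrix

# CatBoost: restrict the device list explicitly; by default it uses
# every GPU it can see.
model = CatBoostClassifier(task_type="GPU", devices="0")
```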

RAMitchell commented 5 years ago

@Noxoomo @annaveronika if you want to comment further, please go ahead; I will be redoing this benchmark next week for a paper submission.

KruchDmitriy commented 5 years ago

Hello, my name is Dmitriy Kruchinin and I'm writing to you as a member of the CatBoost team. We would like to ask you to pay attention to the following points during the re-evaluation of your benchmarks.

  1. Dataset collection. You may find an example of our GBDT benchmark here. It is based on your repository. We want to point out that we extended the collection of datasets:

    • There are two small datasets, Abalone (regression) and Letters (multiclass); by default, on datasets with fewer than 50K samples CatBoost uses ordered boosting, which is more accurate but slower (see the sketch after this list)
    • A one-hot encoded Airline dataset as an example of a highly sparse dataset, from szilard's benchmarks
    • A synthetic regression dataset with a huge number of features (5K)
    • The Microsoft learning-to-rank (WEB10K) dataset, representing the ranking task type
    • Epsilon as a dense dataset with a huge number of documents (500K)

    Obviously, the set can be further expanded, but in our opinion this version fully covers the cases of varying load on a GBDT library.

    If you use these datasets in your benchmark, please tell us whether your results differ from ours.

  2. Library updates. CatBoost has supported the MultiClass objective on GPU since version 0.10.0. In the above-mentioned benchmark tool we use xgboost 0.80, LightGBM 2.2.1, and CatBoost 0.10.3.
  3. In this repository, speed numbers are published without quality measurements. This paper shows why the results of such a benchmark can be misleading: they do not reflect the overall picture of library performance in terms of quality. Making a fair comparison of both speed and quality is a challenging task that requires a lot of computational resources; a minimal joint measurement is sketched below.
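On the ordered-boosting point above: a minimal sketch of pinning CatBoost's boosting mode explicitly, so that small datasets do not silently switch it into a slower scheme than the other libraries use (`boosting_type` is CatBoost's real parameter for this):

```python
from catboost import CatBoostRegressor

# On datasets with fewer than 50K samples CatBoost defaults to ordered
# boosting (more accurate, slower). To benchmark every library under the
# same plain boosting scheme, fix the mode explicitly:
plain = CatBoostRegressor(boosting_type="Plain", iterations=1000, verbose=False)

# ...or keep the slower-but-more-accurate default on small data:
ordered = CatBoostRegressor(boosting_type="Ordered", iterations=1000, verbose=False)
```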
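And on reporting speed together with quality: a minimal sketch of recording both wall-clock training time and a held-out metric for one run, assuming scikit-learn for data handling; it deliberately ignores the harder problem of tuning each library fairly, which is where the real computational cost lies:

```python
import time
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from catboost import CatBoostRegressor

X, y = make_regression(n_samples=50_000, n_features=100, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = CatBoostRegressor(iterations=500, verbose=False)

# Measure training time and test quality in the same run, so neither
# number is reported without the other.
start = time.perf_counter()
model.fit(X_tr, y_tr)
elapsed = time.perf_counter() - start

rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"train time: {elapsed:.1f}s, test RMSE: {rmse:.4f}")
```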
RAMitchell commented 5 years ago

@KruchDmitriy Thanks for your response, I will look at adding some extra datasets.