catboost / catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
https://catboost.ai
Apache License 2.0

GPU Out of Memory with Lossguide #1746

Open liberbey opened 3 years ago

liberbey commented 3 years ago

I am using CatBoost on GPU, but my model does not fit on my GPUs with the Lossguide grow policy, although I was able to train a model with a very similar configuration using the Depthwise and SymmetricTree grow policies. In order to train the model with Lossguide, I need to decrease the data size by 33%. The full size of my dataset is 13.6 GB.

This is the configuration that fails with Lossguide, along with the error message:

{'iterations': 1300, 'l2_leaf_reg': 3.0, 'learning_rate': 0.09, 'loss_function': 'RMSE', 'max_bin': 63, 'max_ctr_complexity': 4.0, 
'max_depth': 9, 'max_leaves': 511, 'min_data_in_leaf': 30000, 'one_hot_max_size': 2, 'grow_policy': 'Lossguide', 'task_type': 'GPU', 
'bootstrap_type': 'No', 'pinned_memory_size': '30gb', 'nan_mode': 'Forbidden', 'has_time': True}
0
  0%|          | 0/100 [00:02<?, ?trial/s, best loss=?]Application terminated with error: ??+0 (0x7F2CC8D43630)
??+0 (0x7F2CC8D41985)
??+0 (0x7F2CC8D3FFF6)
??+0 (0x7F2CC78BC4D5)
??+0 (0x7F2CC901E857)
??+0 (0x7F2CC8FF4EEA)
??+0 (0x7F2CC8D40BC2)
??+0 (0x7F2CC8D410B8)
??+0 (0x7F2CC78BCCD0)
??+0 (0x7F2CC78BCB97)
??+0 (0x7F2DD2D51609)
clone+67 (0x7F2DD2C78293)

(NCudaLib::TOutOfMemoryError) catboost/cuda/cuda_lib/memory_pool/stack_like_memory_pool.h:303: Error: Out of memory. Requested 5546.536621 MB; Free 2131.370214 MB
uncaught exception:
    address -> 0x1160d9410
    what() -> "catboost/cuda/cuda_lib/memory_pool/stack_like_memory_pool.h:303: Error: Out of memory. Requested 5546.536621 MB; Free 2131.370214 MB"
    type -> NCudaLib::TOutOfMemoryError

But I was able to train models with SymmetricTree and Depthwise with max_depth = 10, using these tune spaces respectively:

"cat_features": ["sector", "country"],
"space": {
    'iterations': hp.quniform('iterations', 200, 1700, 100),
    'max_depth': hp.quniform('max_depth', 4, 10, 1),
    'learning_rate': hp.quniform('learning_rate', 0.03, 0.45, 0.03),
    'l2_leaf_reg': hp.quniform('l2_leaf_reg', -3, 8, 1),  # 2**n
    'max_bin': hp.quniform('max_bin', 1, 10, 1),  # 2**n
    'one_hot_max_size': hp.choice('one_hot_max_size', [2, 255]),
    'max_ctr_complexity': hp.quniform('max_ctr_complexity', 1, 10, 1),
    'loss_function': hp.choice('loss_function', ['Cosine', 'L2', 'NewtonCosine', 'NewtonL2']),
}
"cat_features": ["sector", "country"],
"space": {
    'iterations': hp.quniform('iterations', 200, 1700, 100),
    'max_depth': hp.quniform('max_depth', 4, 10, 1),
    'learning_rate': hp.quniform('learning_rate', 0.03, 0.45, 0.03),
    'l2_leaf_reg': hp.quniform('l2_leaf_reg', -3, 8, 1),  # 2**n
    'max_bin': hp.quniform('max_bin', 1, 10, 1),  # 2**n
    'min_data_in_leaf': hp.quniform('min_data_in_leaf', 3000, 75000, 5000),
    'one_hot_max_size': hp.choice('one_hot_max_size', [2, 255]),
    'max_ctr_complexity': hp.quniform('max_ctr_complexity', 1, 10, 1),
    'loss_function': hp.choice('loss_function', ['Cosine', 'L2', 'NewtonCosine', 'NewtonL2']),
}
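For context, hp.quniform(label, low, high, q) draws uniformly from [low, high] and rounds to the nearest multiple of q; a minimal stand-in (my own re-implementation for illustration, not hyperopt's code) shows why parameters such as iterations still need an int() cast before being handed to CatBoost:

```python
import random

# Minimal stand-in for hyperopt's hp.quniform: draw uniformly in
# [low, high], then round to the nearest multiple of q. Note the result
# is a float in hyperopt, so integer-valued parameters such as
# 'iterations' still need an explicit int() cast.
def quniform(low, high, q):
    return round(random.uniform(low, high) / q) * q

random.seed(0)
iterations = int(quniform(200, 1700, 100))
print(iterations)
```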

I have 4 Tesla T4 GPU's. Here are the details:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA Tesla T4     On   | 00000000:00:1B.0 Off |                    0 |
| N/A   47C    P0    56W /  70W |  14602MiB / 15109MiB |     68%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA Tesla T4     On   | 00000000:00:1C.0 Off |                    0 |
| N/A   47C    P0    60W /  70W |  14544MiB / 15109MiB |     68%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA Tesla T4     On   | 00000000:00:1D.0 Off |                    0 |
| N/A   47C    P0    61W /  70W |  14488MiB / 15109MiB |     69%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA Tesla T4     On   | 00000000:00:1E.0 Off |                    0 |
| N/A   47C    P0    68W /  70W |  14430MiB / 15109MiB |     69%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2647      C   ...ntu/miniconda3/bin/python    14359MiB |
|    1   N/A  N/A      2647      C   ...ntu/miniconda3/bin/python    14301MiB |
|    2   N/A  N/A      2647      C   ...ntu/miniconda3/bin/python    14245MiB |
|    3   N/A  N/A      2647      C   ...ntu/miniconda3/bin/python    14187MiB |

Is this an expected behaviour? Is there anything I can do to use Lossguide grow policy with my full dataset?

LyzhinIvan commented 3 years ago

Hi, @liberbey! I see a difference between the configurations that may be the reason: the max_bin parameter. For SymmetricTree and Depthwise you choose it uniformly from 1 to 10, but for Lossguide you use 63. This affects the size of the quantized dataset. Try Lossguide with max_bin=10.
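To see why max_bin matters for GPU memory, here is a back-of-envelope estimate. The row/feature counts are hypothetical, and the assumption that a feature with B bins is stored in ceil(log2(B + 1)) bits is mine, not CatBoost's documented layout; the point is only that fewer bins can mean a smaller quantized dataset:

```python
import math

# Rough estimate of quantized dataset size as a function of max_bin.
# Assumes (hypothetically) that each quantized value is packed into
# ceil(log2(max_bin + 1)) bits.
def quantized_size_gb(n_rows, n_features, max_bin):
    bits = math.ceil(math.log2(max_bin + 1))
    return n_rows * n_features * bits / 8 / 1024**3

n_rows, n_features = 100_000_000, 100  # hypothetical dataset shape
print(quantized_size_gb(n_rows, n_features, 63))  # 6 bits per value
print(quantized_size_gb(n_rows, n_features, 10))  # 4 bits per value
```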

liberbey commented 3 years ago

Hi @LyzhinIvan, thanks for your answer.

This is my tune space for Lossguide:

"cat_features": ["sector", "country"],
"space": {
    'iterations': hp.quniform('iterations', 200, 1700, 100),
    'max_depth': hp.quniform('max_depth', 5, 16, 1),
    'max_leaves': hp.quniform('max_leaves', 5, 9, 1),
    'learning_rate': hp.quniform('learning_rate', 0.03, 0.45, 0.03),
    'l2_leaf_reg': hp.quniform('l2_leaf_reg', -3, 8, 1),  # 2**n
    'max_bin': hp.quniform('max_bin', 1, 10, 1),  # 2**n
    'min_data_in_leaf': hp.quniform('min_data_in_leaf', 5000, 75000, 5000),
    'one_hot_max_size': hp.choice('one_hot_max_size', [2, 255]),
    'max_ctr_complexity': hp.quniform('max_ctr_complexity', 1, 6, 1),
    'loss_function': hp.choice('loss_function', ['L2', 'NewtonL2']),
}

Actually, the bin size is the same for all of them, since the sampled value is used as a power of 2 minus 1; the configuration I posted shows the result of this calculation. Sorry that I did not mention it before.
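In other words, the sampled exponents are expanded before being passed to CatBoost. A minimal sketch of that post-processing step (the helper name and exact mapping are my guess, inferred from the `# 2**n` comments in the tune spaces above):

```python
# Hypothetical expansion of a hyperopt sample, inferred from the
# "# 2**n" comments in the tune space: l2_leaf_reg is used as a power
# of two, and max_bin as a power of two minus one.
def expand_params(sampled):
    params = dict(sampled)
    params['l2_leaf_reg'] = 2.0 ** sampled['l2_leaf_reg']
    params['max_bin'] = 2 ** int(sampled['max_bin']) - 1
    return params

# A sampled max_bin of 6 expands to 2**6 - 1 = 63, which matches the
# value seen in the failing Lossguide configuration.
print(expand_params({'l2_leaf_reg': 3, 'max_bin': 6}))
```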

LyzhinIvan commented 3 years ago

Which catboost version do you use?

liberbey commented 3 years ago

> Which catboost version do you use?

I am using 0.26.

LyzhinIvan commented 3 years ago

> Which catboost version do you use?
>
> I am using 0.26.

Please try to run your script with 0.25.1. We made some GPU-related updates in 0.26 and want to be sure that they are not the cause of your issue.