catboost / catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
https://catboost.ai
Apache License 2.0

Issue with GPU Training Time for Multi-Class Classification with Embeddings #2743

Status: Open. KozuyutovAndrey opened this issue 4 days ago.

KozuyutovAndrey commented 4 days ago

Good day! I am training a multi-class classification model with embedding features on an NVIDIA GeForce RTX 3090. However, with task_type='GPU' the training time is not significantly different from training on the CPU: it takes approximately 60 minutes for 500,000 examples. In addition, the loss output only begins after about 50 minutes, possibly because of the time spent converting the embeddings into integer features.

Since converting the embeddings into features takes a significant portion of the total time, I kindly request that this step be accelerated on the GPU, if possible.

Thank you for your attention to this matter.

Best regards, Andrey

catboost version: 1.2.5
Operating System: Ubuntu, Windows 10

andrey-khropov commented 4 days ago

Can you provide more details? Code, embedding dimensions, how many features of each type do you have? Also, how many classes do you have?

KozuyutovAndrey commented 4 days ago

Embedding Dimensions: Each embedding has dimensions ranging from 32 to 512. For simplicity, you can consider the average dimension to be 255.

Number of Classes: There are 3 classes.

Regarding the number of features of each type: I have 60 embedding features, all of type float64.
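For a sense of scale, here is a rough estimate of the raw embedding payload in this setup. It assumes every one of the 60 embeddings has the stated average dimension of 255 and is stored as float64 (8 bytes per value); it ignores pandas object overhead and whatever internal representation CatBoost uses, so treat it as an order-of-magnitude sketch only. The helper name `embedding_payload` is hypothetical.

```python
# Rough size estimate of the raw embedding data described above.
# Assumptions: all 60 embeddings have the average dimension (255),
# and each value is a float64 (8 bytes). Container overhead is ignored.

def embedding_payload(rows: int, n_embeddings: int, avg_dim: int,
                      bytes_per_value: int = 8) -> tuple[int, int]:
    """Return (total number of embedding values, total bytes)."""
    values = rows * n_embeddings * avg_dim
    return values, values * bytes_per_value


values, nbytes = embedding_payload(rows=500_000, n_embeddings=60, avg_dim=255)
print(f"{values:,} values, {nbytes / 1e9:.1f} GB as float64")
# prints: 7,650,000,000 values, 61.2 GB as float64
```

Under these assumptions that is about 7.65 billion values, or roughly 61 GB at 8 bytes per value (half that if the values were float32).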

andrey-khropov commented 4 days ago

And what training parameters do you use? Do you pass the input data as Python structures or load it from files?

KozuyutovAndrey commented 4 days ago

[screenshot attached in the original comment, not reproduced here]

The input data is passed from a DataFrame.

EpicUsaMan commented 3 days ago

Same problem on my side.

My setup: a 786-dimensional embedding, 188k rows, 810 columns per row (786 of them are the numeric embedding values).
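The same kind of back-of-envelope arithmetic for this setup, taking the figures above at face value (188k rows, 786 embedding values out of 810 per row) and assuming float64 storage. This is a rough sketch of raw data volume only, not a measurement of what CatBoost allocates internally.

```python
# Back-of-envelope figures for the 188k-row dataset described above.
# Assumption: every value is stored as a float64 (8 bytes each).
rows = 188_000
values_per_row = 810
embedding_values_per_row = 786

# Fraction of each row taken up by the embedding.
embedding_share = embedding_values_per_row / values_per_row

# Total embedding values and their raw size in bytes.
embedding_values = rows * embedding_values_per_row
embedding_bytes = embedding_values * 8

print(f"embedding share of each row: {embedding_share:.1%}")
print(f"embedding values: {embedding_values:,} (~{embedding_bytes / 1e9:.2f} GB as float64)")
# prints:
# embedding share of each row: 97.0%
# embedding values: 147,768,000 (~1.18 GB as float64)
```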

Dropping the embedding column fixes the problem completely.

This affects both CPU and GPU.

When the embedding is present, one CPU core is fully loaded for a very long time (more than half an hour), whereas training without this column takes only about 2 minutes.

Also, cross-validation partially mitigates the issue: the more folds, the less likely training is to get stuck.

So it looks like a problem with some kind of vector operation on the embeddings.

This affects all versions that support embeddings.

P.S. The task in my case is regression (RMSE).