Open KozuyutovAndrey opened 4 days ago
Can you provide more details? Code, embedding dimensions, how many features of each type do you have? Also, how many classes do you have?
Embedding Dimensions: Each embedding has dimensions ranging from 32 to 512. For simplicity, you can consider the average dimension to be 255.
Number of Classes: There are 3 classes.
Regarding the number of features of each type: I have 60 embedding features, each of type float64.
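For a sense of scale, the figures quoted in this thread can be multiplied out. This is only a rough sketch: it assumes that all 500,000 examples from the original report carry all 60 embedding features, that the stated average dimension of 255 applies, and that the values are held densely as float64. The variable names are illustrative and not part of any CatBoost API.

```python
# Figures quoted in the thread (dense float64 storage is an assumption).
n_rows = 500_000            # examples mentioned in the original report
n_embedding_features = 60   # embedding features
avg_dim = 255               # stated average embedding dimension
bytes_per_value = 8         # float64

values_per_row = n_embedding_features * avg_dim
total_values = values_per_row * n_rows
total_bytes = total_values * bytes_per_value

print(values_per_row)        # 15300
print(total_values)          # 7650000000
print(total_bytes / 1e9)     # 61.2 (GB, decimal)
```

Under these assumptions the raw embedding data alone is about 61 GB, so any per-value preprocessing step has a lot of data to walk through before the first iteration is reported.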
And what training parameters do you use? Do you specify input data through python structures or load it from files?
The input data is passed from a DataFrame.
The same on my side
Embedding with 786 dimensions, 188k rows, 810 columns per row (786 of which are the numeric embedding values).
Dropping the embeddings as an additional column fixes the problem completely.
It affects both CPU and GPU.
With the embeddings in place, training loads one CPU core fully for a very long time (more than half an hour), while training without this column takes only about 2 minutes.
Cross-validation also partially mitigates the issue: the more folds, the less likely it is to get stuck.
So it looks like a problem with some kind of vector operation on the embeddings.
It affects all versions that support embeddings.
P.S. The task in my case is regression (RMSE).
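A comparison like the one above (more than 30 minutes versus about 2 minutes, so at least a 15x slowdown) can be measured reproducibly by timing the same training call with and without the embedding column. The sketch below uses only the standard library. `time_call` and the two stand-in functions are hypothetical names, and the `time.sleep` calls are placeholders for the real training runs, which are not shown in this thread.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Placeholders: replace the bodies with the actual training calls,
# one on the data without the embedding column and one with it.
def fit_without_embeddings():
    time.sleep(0.01)
    return "trained without embeddings"


def fit_with_embeddings():
    time.sleep(0.05)
    return "trained with embeddings"


_, t_plain = time_call(fit_without_embeddings)
_, t_emb = time_call(fit_with_embeddings)

print(f"without embeddings: {t_plain:.3f} s")
print(f"with embeddings:    {t_emb:.3f} s")
print(f"slowdown factor:    {t_emb / t_plain:.1f}x")
```

Reporting both wall-clock numbers together with the row count and embedding dimension makes it easier to compare results across machines and versions.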
Good day! I am training a multi-class classification model using embeddings on an NVIDIA GeForce RTX 3090. However, when selecting task_type='GPU', the training time does not significantly differ from that on the CPU, taking approximately 60 minutes for 500,000 examples. Additionally, the loss tracking begins only after 50 minutes, possibly due to the processing of embeddings into integer features.
Given that the processing of embeddings into features takes a significant portion of the time, I kindly request that this processing be accelerated on the GPU if possible.
Thank you for your attention to this matter.
Best regards, Andrey
catboost version: 1.2.5
Operating System: Ubuntu, Windows 10