borchero / pycave

Traditional Machine Learning Models for Large-Scale Datasets in PyTorch.
https://pycave.borchero.com
MIT License
126 stars 13 forks

Long initialization time when using GPU #26

Open fkendlessly opened 2 years ago

fkendlessly commented 2 years ago

Hi @borchero, I am using the GPU for clustering (KMeans or GMM), and the initialization step takes much longer than on the CPU. After running the following code on an RTX 3090, GPU initialization takes about 4.1 seconds, while the CPU takes only about 0.17 seconds. Any suggestions for solving this problem?

from pycave.clustering import KMeans
import torch
import time

input_data = torch.randn(90000, 3)
start_time = time.time()
estimator = KMeans(3, trainer_params=dict(gpus=1, max_epochs=10))
estimator.fit(input_data)
end_time = time.time()
print('cost_time: %f seconds' % (end_time - start_time))
borchero commented 2 years ago

There is likely nothing I can do to make this any faster. In general, a process needs some time to initialize the GPU.

I think you can try running torch.cuda.init() and you will probably see that this operation takes ~4 seconds.
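The suggested benchmark can be run with a short script like the following (a sketch, assuming a CUDA-capable machine; the guard lets it degrade gracefully without one):

```python
import time
import torch

# Time the explicit creation of the CUDA context. On machines without
# a GPU the guarded branch is simply skipped.
if torch.cuda.is_available():
    start = time.time()
    torch.cuda.init()  # create the CUDA context explicitly
    print('torch.cuda.init(): %f seconds' % (time.time() - start))
```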

fkendlessly commented 2 years ago

When I run torch.cuda.init(), it only takes 1e-5 seconds. Actually, I found that the long initialization above comes from estimator.py in the .../pycave/clustering/kmeans directory. I timed line 129 of estimator.py, self.trainer(max_epochs=num_epochs).fit(module, loader), which took about 4 seconds.

borchero commented 2 years ago

Can you also benchmark torch.empty(1).cuda()? I thought torch.cuda.init() was the culprit, but I'm fairly certain the delay comes from the first interaction with the GPU (I just don't know for sure when it happens).

fkendlessly commented 2 years ago

torch.empty(1).cuda() takes about 0.4 milliseconds.

borchero commented 2 years ago

Mh ok, interesting. I don't think it has anything to do with PyCave, but I will check again. Unfortunately, I don't have direct access to a GPU at the moment.
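Until the cause is pinned down, one practical way to keep the one-time startup cost out of measurements is to warm up the device before timing anything. A minimal sketch of that pattern (the `timed` helper is hypothetical, and the GPU branch assumes a CUDA-capable machine):

```python
import time
import torch

def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.time()
    result = fn()
    return result, time.time() - start

# Warm-up pattern: run one throwaway GPU operation first so that later
# timings measure the work itself rather than context/startup overhead.
if torch.cuda.is_available():
    _, cold = timed(lambda: torch.empty(1).cuda())  # pays startup cost
    _, warm = timed(lambda: torch.empty(1).cuda())  # measures the op only
    print('cold: %f s, warm: %f s' % (cold, warm))
```

The same idea applies to timing estimator.fit: issue one small GPU operation first, then start the clock.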