Felix-Petersen / difftopk

Differentiable Top-k Classification Learning
MIT License

DiffTopK efficiency on gpu #7

Closed · doric35 closed this issue 10 months ago

doric35 commented 1 year ago

Hi, I have noticed that the differentiable top-k computation in the example is significantly slower on GPU than on CPU. Do you know the reason for that? Have you noticed a tensor size beyond which the GPU becomes faster? If we use difftopk with a fairly large model, should we send the data to the CPU before running the algorithm and back to the GPU afterwards? Thanks.
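
For context, the workaround I have in mind is roughly the following (just a sketch; `difftopk_loss` is a placeholder for whichever difftopk loss object is used, not the actual API):

```python
import torch

# Hypothetical setup: `model` lives on the GPU, `difftopk_loss` is a
# difftopk loss module kept on the CPU. Names are placeholders.
device = torch.device("cuda")

def training_step(model, difftopk_loss, images, labels):
    images, labels = images.to(device), labels.to(device)
    logits = model(images)                             # forward pass on the GPU
    loss = difftopk_loss(logits.cpu(), labels.cpu())   # top-k loss on the CPU
    loss.backward()                                    # gradients flow back through .cpu()
    return loss.item()
```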

Felix-Petersen commented 10 months ago

Hi @doric35, yes, depending on the exact hardware, your observation makes a lot of sense. Every GPU operation that is called carries a fixed launch overhead, so for small tensors the CPU is usually faster. Running difftopk on the CPU can therefore make sense, depending on the particular case. I would recommend simply using whichever is empirically faster on your hardware; in my own experiments, I ran some on CPU and others on GPU.
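
If it helps, a minimal timing sketch along these lines can settle the question empirically on a given machine (assumptions: `make_loss(device)` is a placeholder that constructs your difftopk loss on that device; the batch size and number of classes are arbitrary):

```python
import time
import torch

def time_loss(loss_fn, device, batch_size=128, num_classes=1000, iters=100):
    """Rough wall-clock timing of one forward+backward of `loss_fn` per iteration."""
    logits = torch.randn(batch_size, num_classes, device=device, requires_grad=True)
    labels = torch.randint(num_classes, (batch_size,), device=device)
    # Warm-up iteration so CUDA kernel launches / caching don't skew the timing.
    loss_fn(logits, labels).sum().backward()
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        loss_fn(logits, labels).sum().backward()
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Example usage (make_loss is a placeholder for constructing the difftopk loss):
# cpu_t = time_loss(make_loss(torch.device("cpu")), torch.device("cpu"))
# gpu_t = time_loss(make_loss(torch.device("cuda")), torch.device("cuda"))
# print(f"CPU: {cpu_t * 1e3:.2f} ms/iter, GPU: {gpu_t * 1e3:.2f} ms/iter")
```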