gfxdisp / asap

Fast and accurate Active SAmpling method for Pairwise comparisons
MIT License

KL divergence on GPU is so slow #2

Closed nicolasch96 closed 3 years ago

nicolasch96 commented 3 years ago

Hello again :)

Thank you very much for the code, it is really useful! I am trying to recreate the results of the paper with Monte Carlo simulations and was testing the computational time of the GPU code. I noticed that the KL-divergence computation is much slower with cuda=True than with cuda=False: for an 80x80 matrix, it takes around 5.16 seconds with cuda=False and around 19.16 seconds with cuda=True. I am using an RTX 2060 Super, and this step seems to be the bottleneck of the GPU code, making it no faster than the CPU code (the batch time for N=80 is around 20 seconds with cuda=True or False).
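To illustrate where the GPU overhead can come from, here is a minimal timing sketch (the sizes and the per-pair looping pattern are my own assumptions, not the repository's exact code): computing the KL divergences one pair at a time launches thousands of tiny GPU kernels, while a single batched call avoids that.

import time
import torch
import torch.distributions as dist

def time_kl(device, n=80):
    # Random means and standard deviations for n*n Gaussian pairs.
    mean_1 = torch.randn(n * n, device=device)
    std_1 = torch.rand(n * n, device=device) + 0.1
    mean_2 = torch.randn(n * n, device=device)
    std_2 = torch.rand(n * n, device=device) + 0.1

    # Looped version: one tiny kernel launch per pair; launch overhead dominates on the GPU.
    t0 = time.time()
    total = 0.0
    for i in range(n * n):
        p = dist.Normal(mean_1[i], std_1[i])
        q = dist.Normal(mean_2[i], std_2[i])
        total += dist.kl.kl_divergence(p, q)
    if device == 'cuda':
        torch.cuda.synchronize()
    looped = time.time() - t0

    # Batched version: one call over all pairs at once.
    t0 = time.time()
    p = dist.Normal(mean_1, std_1)
    q = dist.Normal(mean_2, std_2)
    total_batched = dist.kl.kl_divergence(p, q).sum()
    if device == 'cuda':
        torch.cuda.synchronize()
    batched = time.time() - t0
    print('%s  looped: %.3fs  batched: %.4fs' % (device, looped, batched))

time_kl('cpu')
if torch.cuda.is_available():
    time_kl('cuda')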

Thank you again !

Nicolas

nicolasch96 commented 3 years ago

I found a solution by using:

import torch.distributions as dist

def kl_divergence_approx(mean_1, std_1, mean_2, std_2):
    # Build the two Gaussians from their means and standard deviations (scales).
    normal0 = dist.Normal(mean_1, std_1)
    normal = dist.Normal(mean_2, std_2)
    # Closed-form Gaussian KL divergence, computed in a single batched call.
    total = (1 + 2*dist.kl.kl_divergence(normal0, normal)).sum()
    return total
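
For example, a hypothetical call with batched tensors (the shapes and device here are just an illustration, using the function defined above):

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
mean_1 = torch.randn(1000, device=device)
std_1 = torch.rand(1000, device=device) + 0.1
mean_2 = torch.randn(1000, device=device)
std_2 = torch.rand(1000, device=device) + 0.1

# One batched call over all 1000 pairs instead of a Python loop.
print(kl_divergence_approx(mean_1, std_1, mean_2, std_2))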
mikhailiuk commented 3 years ago

Hi @nicolasch96,

Thanks a lot for investigating and contributing!

The code will be updated!

nicolasch96 commented 3 years ago

Hello, thanks for taking my comment into consideration. Please make sure to use:

kl_divergence_approx(normal0.mean, normal0.scale, normal.mean[cc], normal.scale[cc])

instead of:

kl_divergence_approx(normal0.mean, normal0.variance, normal.mean[cc], normal.variance[cc])
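A quick standalone check (not the repository's code) of why this matters: torch.distributions.Normal takes the standard deviation (scale) as its second argument, so passing the variance silently builds the wrong distributions and gives a different KL value.

import torch
import torch.distributions as dist

mean_1, std_1 = torch.tensor(0.0), torch.tensor(0.5)
mean_2, std_2 = torch.tensor(1.0), torch.tensor(2.0)

# Correct: Normal expects the standard deviation (scale).
kl_std = dist.kl.kl_divergence(dist.Normal(mean_1, std_1),
                               dist.Normal(mean_2, std_2))

# Incorrect: passing the variance instead of the standard deviation.
kl_var = dist.kl.kl_divergence(dist.Normal(mean_1, std_1**2),
                               dist.Normal(mean_2, std_2**2))

print(kl_std.item(), kl_var.item())  # the two values differ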