The master implementation creates a PyTorch tensor of zeros to store the full covariance matrix, even if only the diagonal elements are needed.
If diag=True, the (potentially very large) tensor K is allocated but never used. This can cause a GPU to run out of memory in a situation that is entirely avoidable.
I have moved the creation of the tensor that stores the full covariance matrix into the diag=False branch of the code, so it is only allocated when the full matrix is actually needed.
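
As a rough illustration of the pattern (a minimal sketch, not the actual code touched by this change), the allocation can be deferred as shown here. The function name rbf_covariance, its parameters, and the choice of an RBF kernel are all hypothetical and only serve to show where the zeros tensor is created.

```python
import torch


def rbf_covariance(X, lengthscale=1.0, variance=1.0, diag=False):
    """Hypothetical RBF kernel, used only to illustrate the allocation pattern."""
    n = X.shape[0]
    if diag:
        # Only the diagonal is requested: for an RBF kernel k(x, x) = variance,
        # so an n-element vector suffices and no n x n tensor is ever allocated.
        return torch.full((n,), variance, dtype=X.dtype, device=X.device)

    # The full matrix is needed, so the zeros tensor is allocated only here.
    K = torch.zeros(n, n, dtype=X.dtype, device=X.device)
    sq_dist = torch.cdist(X, X).pow(2)
    K += variance * torch.exp(-0.5 * sq_dist / lengthscale ** 2)
    return K
```

With this layout, calling rbf_covariance(X, diag=True) costs memory proportional to the number of rows in X rather than its square.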