getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

CUDA_ERROR_INVALID_SOURCE for a K-NN search #285

Open idigitopia opened 1 year ago

idigitopia commented 1 year ago

To reproduce:

import torch 
from dacmdp.core.utils_knn import THelper
THelper.batch_calc_knn_pykeops(torch.randn(10,5).cuda(),
                               torch.randn(1000,5).cuda(), k=1)

idigitopia commented 1 year ago

[image]

jeanfeydy commented 1 year ago

Hi @idigitopia,

Thanks for your interest in the library.

Could you please give us more details about your issue? Looking at your dacmdp repository, my understanding is that your THelper.batch_calc_knn_pykeops(...) function performs the following computation:

import torch
from pykeops.torch import LazyTensor

def batch_calc_knn_pykeops(query: torch.Tensor, data: torch.Tensor, k: int):
    X_i = LazyTensor(query[:, None, :])  # (N, 1, D) query points
    X_j = LazyTensor(data[None, :, :])  # (1, M, D) data points

    D_ij = ((X_i - X_j) ** 2).sum(-1)  # (N, M) symbolic matrix of squared L2 distances
    ind_knn = D_ij.argKmin(k, dim=1)  # (N, k) indices of the k nearest data points
    # (N, k) Euclidean distances, recomputed densely from the selected neighbors
    dist_knn = torch.norm(query.unsqueeze(1) - data[ind_knn], p=2, dim=-1)
    return ind_knn, dist_knn

batch_calc_knn_pykeops(torch.randn(10,5).cuda(), torch.randn(1000,5).cuda(), k=1)
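
To check whether the failure comes from KeOps itself, it may help to compare against a dense baseline that does not use LazyTensor at all. The sketch below is an assumption on my side, not code from the dacmdp repository: `knn_reference` is a hypothetical helper that builds the full (N, M) distance matrix with `torch.cdist` and picks the k smallest entries per row with `topk`. It runs on the CPU, so it should work even when the CUDA setup is broken.

```python
import torch


def knn_reference(query: torch.Tensor, data: torch.Tensor, k: int):
    """Dense K-NN baseline (hypothetical helper): materializes the (N, M) matrix."""
    # Pairwise Euclidean distances between query and data points, shape (N, M).
    dist = torch.cdist(query, data, p=2)
    # The k smallest distances per query row and their column indices, shape (N, k).
    dist_knn, ind_knn = dist.topk(k, dim=1, largest=False)
    return ind_knn, dist_knn


# Same sizes as in the report above, but on the CPU.
query, data = torch.randn(10, 5), torch.randn(1000, 5)
ind_knn, dist_knn = knn_reference(query, data, k=1)
print(ind_knn.shape, dist_knn.shape)
```

If this baseline runs and the KeOps version does not, the problem is more likely in the KeOps/CUDA compilation step than in the K-NN logic. Note that the dense version needs O(N * M) memory, so it is only meant for small sanity checks.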

After a quick test on Google Colab, I cannot reproduce your problem. The error may come from an unexpected interaction with other parts of your code, or from a multi-GPU setup. Could you provide us with a minimal "non-working" example?
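
Alongside a minimal example, a short description of the environment would help to tell a single-GPU setup from a multi-GPU one. The snippet below is only a sketch: `environment_report` is a hypothetical helper name, and it reads the installed pykeops version from the package metadata instead of importing pykeops, so that it still runs when the KeOps import or compilation step is what fails.

```python
from importlib.metadata import PackageNotFoundError, version

import torch


def environment_report() -> dict:
    """Collect basic PyTorch/CUDA/pykeops facts (hypothetical helper)."""
    n_gpus = torch.cuda.device_count()
    report = {
        "torch": torch.__version__,
        "torch_cuda": torch.version.cuda,  # None on CPU-only builds of PyTorch
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": n_gpus,
        "gpus": [torch.cuda.get_device_name(i) for i in range(n_gpus)],
    }
    try:
        # Read the version from the installed package metadata, without importing pykeops.
        report["pykeops"] = version("pykeops")
    except PackageNotFoundError:
        report["pykeops"] = None
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting this output into the issue, together with the result of `pykeops.test_torch_bindings()`, should make it easier to see whether the error depends on the machine.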

Best regards, Jean