getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

CUDA out of memory despite trying to run on CPU #359

Closed JCBrouwer closed 6 months ago

JCBrouwer commented 7 months ago

Hello, I'm trying to use KeOps to calculate nearest neighbors on large datasets using the following code:

import numpy as np
import torch
import pykeops.torch as keops

k = 10  # number of nearest neighbors
A = np.random.randn(10000, 30000)
B = np.random.randn(10000, 30000)
A, B = keops.LazyTensor(torch.FloatTensor(A[:, None, :])), keops.LazyTensor(torch.FloatTensor(B[None, :, :]))
pairwise_distances = (A - B).abs().sum(-1)
distances, row_ids = pairwise_distances.Kmin_argKmin(k, dim=1, enable_chunks=True)
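For reference, here is the same reduction written densely in NumPy at toy shapes (a sketch with names and sizes of my own choosing, not the actual script), to show what the symbolic version computes:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
A = rng.standard_normal((50, 8))
B = rng.standard_normal((60, 8))

# Dense (N, M) matrix of pairwise L1 distances. At the real shapes above,
# the broadcasted intermediate A[:, None, :] - B[None, :, :] would have
# roughly 10000 x 10000 x 30000 entries -- exactly the memory blow-up
# that the symbolic KeOps reduction avoids.
D = np.abs(A[:, None, :] - B[None, :, :]).sum(-1)  # shape (50, 60)

# k smallest distances per row of A, with the indices of the matching B rows.
row_ids = np.argsort(D, axis=1)[:, :k]              # (50, 3)
distances = np.take_along_axis(D, row_ids, axis=1)  # (50, 3)
```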

Most of the time this works like a charm, but sometimes I'm getting the below error related to CUDA running out of memory.

[KeOps] error: cuMemAlloc(&p_data, sizeof(TYPE *) * nargs + sizeof(TYPE) * totsize) failed with error CUDA_ERROR_OUT_OF_MEMORY

Traceback (most recent call last):
  File "evaluation/distance.py", line 196, in get_nearest_neighbors
    distances, row_ids = symbolic_distances.Kmin_argKmin(k, dim=1, enable_chunks=True)
  File "/home/jcbgb/anaconda3/env/lib/python3.10/site-packages/pykeops/common/lazy_tensor.py", line 2480, in Kmin_argKmin
    return self.reduction("KMin_ArgKMin", opt_arg=K, axis=axis, dim=dim, **kwargs)
  File "/home/jcbgb/anaconda3/env/lib/python3.10/site-packages/pykeops/common/lazy_tensor.py", line 775, in reduction
    return res()
  File "/home/jcbgb/anaconda3/env/lib/python3.10/site-packages/pykeops/common/lazy_tensor.py", line 957, in __call__
    return self.callfun(*args, *self.variables, **self.kwargs)
  File "/home/jcbgb/anaconda3/env/lib/python3.10/site-packages/pykeops/torch/generic/generic_red.py", line 687, in __call__
    out = GenredAutograd_fun(params, *args)
  File "/home/jcbgb/anaconda3/env/lib/python3.10/site-packages/pykeops/torch/generic/generic_red.py", line 383, in GenredAutograd_fun
    return GenredAutograd.apply(*inputs)[0]
  File "/home/jcbgb/anaconda3/env/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/jcbgb/anaconda3/env/lib/python3.10/site-packages/pykeops/torch/generic/generic_red.py", line 291, in forward
    return GenredAutograd_base._forward(*inputs)
  File "/home/jcbgb/anaconda3/env/lib/python3.10/site-packages/pykeops/torch/generic/generic_red.py", line 121, in _forward
    result = myconv.genred_pytorch(
  File "/home/jcbgb/anaconda3/env/lib/python3.10/site-packages/pykeops/common/keops_io/LoadKeOps.py", line 236, in genred
    self.call_keops(nx, ny)
  File "/home/jcbgb/anaconda3/env/lib/python3.10/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 48, in call_keops
    self.launch_keops(
RuntimeError: [KeOps] Cuda error.

I'm fairly sure the calculation is not actually being run on the GPU, as my script doesn't appear in nvidia-smi.

Could this be something to do with pinning GPU memory?

Is there a way to force KeOps to only use the CPU for certain operations?

I can set the entire script to ignore the GPU as recommended here: https://github.com/getkeops/keops/issues/176

However, I would like other parts of my computation to use my GPUs, so I would rather only disable them for this section of code.

There does seem to be a kind of global SetBackend() class and a set_device function in torch/generic/generic_red.py, but I can't figure out how I would use them in the above code.

Any help would be appreciated!

JCBrouwer commented 7 months ago

Ahh I think I've found it!

pairwise_distances.Kmin_argKmin(k, dim=1, enable_chunks=True, backend="CPU")
joanglaunes commented 7 months ago

Hi @JCBrouwer, yes indeed, the backend="CPU" option is what you need. Also, enable_chunks=True has no effect in CPU mode; it is only used for GPU computations.