getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

An error occurred when testing my installation, and a question about memory usage #300

Closed. pearl-rabbit closed this issue 1 year ago

pearl-rabbit commented 1 year ago

I installed the recently updated KeOps using the following commands:

git clone --recursive https://github.com/getkeops/keops.git /path/to/libkeops
pip install -e /path/to/libkeops/keopscore -e /path/to/libkeops/pykeops
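After an editable install like this one, it can help to confirm that both packages resolve in the active environment and to note their versions for the bug report. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, not part of KeOps, and `importlib.metadata` assumes Python 3.8 or later.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(packages=("pykeops", "keopscore")):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not visible from this interpreter.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If the two packages report different versions, or one is missing, the test failure may come from a mixed install rather than from KeOps itself.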

But when I tested the installation, I got an error:

`>>> pykeops.test_numpy_bindings()
[KeOps] error: cuMemcpyDtoH(out, (CUdeviceptr) out_d, sizeof(TYPE) * sizeout) failed with error CUDA_ERROR_INVALID_VALUE

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/distM2_1T/tools/libkeops/pykeops/pykeops/numpy/test_install.py", line 20, in test_numpy_bindings
    if np.allclose(my_conv(x, y).flatten(), expected_res):
  File "/distM2_1T/tools/libkeops/pykeops/pykeops/numpy/generic/generic_red.py", line 347, in __call__
    out = self.myconv.genred_numpy(-1, ranges, nx, ny, nbatchdims, out, *args)
  File "/distM2_1T/tools/libkeops/pykeops/pykeops/common/keops_io/LoadKeOps.py", line 230, in genred
    self.call_keops(nx, ny)
  File "/distM2_1T/tools/libkeops/pykeops/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 65, in call_keops
    self.argshapes_new,
RuntimeError: [KeOps] Cuda error.
`

My code runs, but memory usage keeps increasing during the run. This is the output of memory_profiler:

Line #    Mem usage    Increment  Occurrences   Line Contents

24  10472.3 MiB  10472.3 MiB           1   @profile
25                                         def knn_atoms(x, y, x_batch, y_batch, k):
26  10472.3 MiB      0.0 MiB           1       N, D = x.shape
27  10472.4 MiB      0.1 MiB           1       x_i = LazyTensor(x[:, None, :])
28  10472.4 MiB      0.0 MiB           1       y_j = LazyTensor(y[None, :, :])
29                                             # x_i = x[:, None, :]
30                                             # y_j = y[None, :, :]
31  10472.5 MiB      0.1 MiB           1       pairwise_distance_ij = ((x_i - y_j) ** 2).sum(-1)
32  10473.9 MiB      1.4 MiB           1       pairwise_distance_ij.ranges = diagonal_ranges(x_batch, y_batch)
33                                         
34                                             # N.B.: KeOps doesn't yet support backprop through Kmin reductions...
35                                             # dists, idx = pairwise_distance_ij.Kmin_argKmin(K=k,axis=1)
36                                             # So we have to re-compute the values ourselves:
37  10474.3 MiB      0.4 MiB           1       idx = pairwise_distance_ij.argKmin(K=k, axis=1)  # (N, K)
38  10474.3 MiB      0.0 MiB           1       x_ik = y[idx.view(-1)].view(N, k, D)
39  10474.3 MiB      0.0 MiB           1       dists = ((x[:, None, :] - x_ik) ** 2).sum(-1)
40                                         
41  10474.3 MiB      0.0 MiB           1       return idx, dists
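For checking the output of `knn_atoms` on a tiny input, a brute-force reference can be useful. The sketch below is plain Python and makes one loud assumption: that `diagonal_ranges` (a helper not shown in this post) restricts comparisons to pairs of points with the same batch index. `knn_reference` is a hypothetical name, and tie-breaking between equidistant neighbours may differ from what KeOps returns.

```python
def knn_reference(x, y, x_batch, y_batch, k):
    """Brute-force k-nearest neighbours among points sharing a batch index.

    x, y: lists of coordinate tuples.
    x_batch, y_batch: one integer batch index per point.
    Returns (idx, dists): for each x[i], the indices into y of its k nearest
    same-batch neighbours and the matching squared Euclidean distances.
    """
    idx, dists = [], []
    for xi, bi in zip(x, x_batch):
        # Squared distance to every y[j] in the same batch (assumed semantics).
        candidates = [
            (sum((a - b) ** 2 for a, b in zip(xi, yj)), j)
            for j, (yj, bj) in enumerate(zip(y, y_batch))
            if bj == bi
        ]
        candidates.sort()  # nearest first; ties broken by index
        nearest = candidates[:k]
        idx.append([j for _, j in nearest])
        dists.append([d for d, _ in nearest])
    return idx, dists


if __name__ == "__main__":
    x = [(0.0, 0.0), (5.0, 5.0)]
    y = [(1.0, 0.0), (0.0, 2.0), (5.0, 6.0), (9.0, 9.0)]
    print(knn_reference(x, y, [0, 1], [0, 0, 1, 1], k=2))
```

This runs in O(N·M) time and is only meant for sanity checks on a handful of points.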

bcharlier commented 1 year ago

Hi @pearl-rabbit

Commit https://github.com/getkeops/keops/commit/3c1ebb0478a94cc6772fa582a8a604a4b7a07acd should have fixed the leaks described in issue #284. If the memory increase you are seeing is still a leak, please provide a minimal example that reproduces it, and feel free to reopen the issue.
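A minimal reproducer for a suspected leak usually calls the same function in a loop on fixed inputs and records memory after each call. The following is a sketch under stated assumptions: `peak_rss_kib` and `memory_trace` are hypothetical helper names, and the default probe relies on the Unix-only `resource` module, which reports peak resident memory (in KiB on Linux, in bytes on macOS).

```python
import gc


def peak_rss_kib():
    """Peak resident set size of this process (Unix only; KiB on Linux)."""
    import resource  # imported lazily because it is unavailable on Windows
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def memory_trace(fn, n_iters=20, probe=peak_rss_kib):
    """Call fn() n_iters times and record probe() after each call."""
    readings = []
    for _ in range(n_iters):
        fn()
        gc.collect()  # free unreachable Python objects before measuring
        readings.append(probe())
    return readings


if __name__ == "__main__":
    held = []  # deliberately keeps references alive, to imitate a leak
    print(memory_trace(lambda: held.append(bytearray(10**6)), n_iters=5))
```

To use it on the function from this issue, replace the lambda with a call such as `lambda: knn_atoms(x, y, x_batch, y_batch, k)` on fixed tensors. Readings that keep climbing after the first few iterations suggest a leak, while a plateau suggests one-time allocations such as compilation or caching.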

pearl-rabbit commented 1 year ago

I am not sure yet where the leak occurs, but when I looked at the cache directory "keops version/Linux_admin 5.4.0-144 generic_p3.7.16", I found that the file "LoadKeOps_nvrtc_class_cache.pkl" now takes up more space than it did right after initialization.
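To turn this observation into numbers, one option is to snapshot the sizes of the `.pkl` files in the cache directory before and after a run and compare them. This is a minimal standard-library sketch; `file_sizes` and `size_changes` are hypothetical helpers, and the directory path is left as a parameter because the exact cache location depends on the installation.

```python
from pathlib import Path


def file_sizes(directory, pattern="*.pkl"):
    """Map each matching file name in `directory` to its size in bytes."""
    return {p.name: p.stat().st_size for p in sorted(Path(directory).glob(pattern))}


def size_changes(before, after):
    """Return {name: size difference in bytes} for new or resized files."""
    changes = {}
    for name, size in after.items():
        delta = size - before.get(name, 0)  # new files count from zero
        if delta != 0:
            changes[name] = delta
    return changes
```

A typical use would be `before = file_sizes(cache_dir)`, then running the workload, then `print(size_changes(before, file_sizes(cache_dir)))`, where `cache_dir` is the directory mentioned above.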