CannyLab / tsne-cuda

GPU Accelerated t-SNE for CUDA with Python bindings
BSD 3-Clause "New" or "Revised" License

Memory leak using fit_transform #108

Closed DradeAW closed 3 years ago

DradeAW commented 3 years ago

Hi,

I've been trying to use tsnecuda on my dataset, but I keep getting memory errors even though I'm using a relatively small dataset.

My array is 100,000 x 375 of int16 (~72 MB), and I'm running the software on an RTX 2080 (8 GB). When running TSNE(n_components=2).fit_transform(data), the GPU memory usage jumps from 0% to 100% in less than 2 seconds and I get the following error:

terminate called after throwing an instance of 'faiss::FaissException'
  what():  Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /faiss/faiss/gpu/StandardGpuResources.cpp:410: Error: 'err == cudaSuccess' failed: Failed to cudaMalloc 1500000000 bytes on device 1 (error 2 out of memory
Outstanding allocations:
Alloc type FlatData: 2 allocations, 475264 bytes
Alloc type TemporaryMemoryBuffer: 1 allocations, 536870912 bytes
Alloc type Other: 5 allocations, 102200000 bytes
Alloc type IVFLists: 632 allocations, 217884672 bytes

Aborted

This looks like a memory leak? I installed faiss and tsnecuda in conda using conda install -c CannyLab -c pytorch tsnecuda, and the test ran without a problem. The problem happens with both CUDA 10.1 and CUDA 10.2. I tried tsnecuda a few months ago (sometime in May, I believe), and it worked fine then.
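
For reference, a minimal sketch of the call described above (synthetic int16 data standing in for the real dataset; an illustration, not the original script):

import numpy as np
from tsnecuda import TSNE

# Synthetic stand-in for the 100,000 x 375 int16 array described above.
data = np.random.randint(-1000, 1000, size=(100000, 375), dtype=np.int16)

# This is the call that runs out of GPU memory on the 8 GB card.
embedding = TSNE(n_components=2).fit_transform(data)
print(embedding.shape)  # expected: (100000, 2)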

DavidMChan commented 3 years ago

Hmm - it looks like we're running out of memory in FAISS (it's unlikely to be a memory leak). Do you have multiple GPUs on this machine, with one that might be a bit smaller? By default, tsne-cuda now tries to allocate the search on both devices.

DradeAW commented 3 years ago

I do have 2 GPUs (one only for video output, and the RTX 2080 for computations).

However, when I checked, all 8 GB of the RTX 2080 were being used right before the crash (and none from the other GPU), which is why I didn't think the problem came from there.

Also, I tried running the same code with 40,000 points instead of 100,000, and it runs (but ideally I would like to run it with 300,000).

DavidMChan commented 3 years ago

Can you try running the code with CUDA_VISIBLE_DEVICES=X (where X is the device identifier from nvidia-smi corresponding to the 2080)? Because we use a mirrored split, if you're not careful it will try to put the full NN map on both GPUs, regardless of memory availability.
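
For example, something like the following sketch (the device index is an assumption and must match whichever index CUDA assigns to the RTX 2080; my_dataset.npy is a hypothetical placeholder for the actual data):

# Restrict tsne-cuda/FAISS to a single GPU. CUDA_VISIBLE_DEVICES must be set
# before any CUDA context is created, so set it before importing tsnecuda.
# Shell equivalent: CUDA_VISIBLE_DEVICES=1 python run_tsne.py
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # assumed index of the RTX 2080

import numpy as np
from tsnecuda import TSNE

data = np.load('my_dataset.npy')  # hypothetical path to the 100,000 x 375 array
embedding = TSNE(n_components=2).fit_transform(data)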

DradeAW commented 3 years ago

Ah, that solved the issue, thanks!

Something weird happened, actually; here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro K620         On   | 00000000:03:00.0  On |                  N/A |
| 44%   53C    P8     1W /  30W |    591MiB /  1979MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:73:00.0 Off |                  N/A |
| 34%   39C    P8    12W / 215W |      6MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

But when I set CUDA_VISIBLE_DEVICES=1, it actually ran on the Quadro K620. I switched it to =0, and now it runs on the RTX 2080. It now works with n=300,000 (it seems to take all the memory it can, but it does not crash).

Thank you!
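
A possible explanation for the index mismatch (not stated in the thread): CUDA's default device enumeration orders GPUs fastest-first, while nvidia-smi lists them in PCI bus order, so the two numberings can differ. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID is a common way to make them agree; a sketch:

# Sketch: make CUDA device numbering follow the PCI bus order shown by
# nvidia-smi, so that CUDA_VISIBLE_DEVICES=1 selects the GPU listed as
# index 1 by nvidia-smi (the RTX 2080 in the output above).
import os
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

from tsnecuda import TSNE  # import only after both variables are set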