Hmm - it looks like we're running out of memory in FAISS (it's unlikely to be a memory leak). Do you have multiple GPUs on this machine (with one that might be a bit smaller)? (by default tsne-cuda now tries to allocate the search on both devices)
I do have 2 GPUs (one only for video output, and the RTX 2080 for computations).
However, when I checked, all 8 GB of the RTX 2080 were being used right before the crash (and none from the other GPU), which is why I didn't think the problem came from there.
Also I tried running the same code but with 40,000 instead of 100,000 and it runs (but ideally I would like to run it with 300,000).
Can you try running the code with CUDA_VISIBLE_DEVICES=X (where X is the device identifier from nvidia-smi corresponding to the 2080)? Because we use a mirrored split, if you're not careful it will try to put the full NN map on both GPUs (regardless of memory availability).
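A minimal sketch of the same thing done from inside Python, assuming the variable is set before tsnecuda (or anything else CUDA-related) initializes the GPU in the process:

import os

# Make only the RTX 2080 visible to CUDA; this must be set before tsnecuda
# (or torch, faiss, etc.) first touches the GPU in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # replace 0 with the 2080's index on your machine

from tsnecuda import TSNE  # imported only after the variable is set
# TSNE(n_components=2).fit_transform(data) will now only use the selected GPU.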
Ah it solved the issue, thanks!
Something weird happened actually; here is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro K620 On | 00000000:03:00.0 On | N/A |
| 44% 53C P8 1W / 30W | 591MiB / 1979MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:73:00.0 Off | N/A |
| 34% 39C P8 12W / 215W | 6MiB / 7982MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
But when I set CUDA_VISIBLE_DEVICES=1, it actually ran on the Quadro K620. I switched it to =0 and now it runs on the RTX 2080.
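This is presumably because CUDA enumerates devices fastest-first by default, while nvidia-smi lists them in PCI bus order. If you want the two numberings to agree, something like this (a sketch, untested on this machine) should work:

import os

# Assumption: forcing PCI bus order makes CUDA's indices match nvidia-smi,
# so index 1 would again be the RTX 2080 as in the nvidia-smi output above.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # 1 = RTX 2080 in nvidia-smi numbering

from tsnecuda import TSNE  # import only after the variables are set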
It now works with n=300000 (it seems to take all the memory it can, but does not crash).
Thank you!
Hi,
I've been trying to use tsnecuda on my dataset, but I keep getting memory errors even though I'm using a relatively small dataset. My array is 100000x375 of int16 (= 72 MB), and I'm running the software on an RTX 2080 8GB. When running TSNE(n_components=2).fit_transform(data), the GPU memory usage jumps from 0% to 100% in less than 2 seconds and I get the following error:
This looks like a memory leak? I've installed faiss and tsnecuda in conda using conda install -c CannyLab -c pytorch tsnecuda, and the test ran without a problem. This problem happens with CUDA 10.1 and CUDA 10.2. I tried tsnecuda a few months ago (sometime in May, I believe), and it worked fine then.
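Roughly, the failing call looks like this (with random stand-in data of the same shape and dtype, just to show the sizes involved):

import numpy as np
from tsnecuda import TSNE

# Stand-in for the real dataset: 100000 x 375 int16 (~72 MB).
data = np.random.randint(-1000, 1000, size=(100000, 375), dtype=np.int16)

embedding = TSNE(n_components=2).fit_transform(data)  # GPU memory fills up within ~2 seconds
print(embedding.shape)  # would be (100000, 2) if it completed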