Closed: y5wang closed this issue 4 years ago
I'm training on the S3DIS dataset, and on a 2080 Ti (11 GB) I also hit the "CUDA out of memory" error; changing the batch size doesn't seem to reduce GPU memory usage.
But it runs fine on a Tesla P100 (12 GB).
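One way to check whether a lower batch size actually changes memory consumption is to poll nvidia-smi while the run is active (a minimal sketch; `--query-gpu` and `--format` are standard nvidia-smi flags, and the one-second interval is arbitrary):

```
# Poll per-GPU memory usage once per second during a training run.
watch -n 1 nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv
```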
@chrischoy
I guess the Tesla K80 doesn't have enough memory. It has 11441 MiB, as reported by nvidia-smi:
(st-segmentation) yang@microway-gpu-ubuntu:~/projects/SpatioTemporalSegmentation$ nvidia-smi
Wed Mar 11 15:08:31 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:06:00.0 Off |                    0 |
| N/A   34C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 00000000:07:00.0 Off |                    0 |
| N/A   24C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 00000000:0A:00.0 Off |                    0 |
| N/A   30C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 00000000:0B:00.0 Off |                    0 |
| N/A   24C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 00000000:10:00.0 Off |                    0 |
| N/A   32C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 00000000:11:00.0 Off |                    0 |
| N/A   22C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 00000000:14:00.0 Off |                    0 |
| N/A   32C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 00000000:15:00.0 Off |                    0 |
| N/A   22C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
This is not a bug. An OOM error means you need more memory than is available. Please lower the batch size so it runs in your environment.
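For reference, a rerun with a smaller batch size would look like the following (a minimal sketch: the value 4 is an illustrative guess to tune per GPU; the script and arguments are the ones already used in this issue):

```
# Same invocation as in the issue, with a smaller batch size.
# BATCH_SIZE=4 is an illustrative value, not a recommendation; halve it again if OOM persists.
export BATCH_SIZE=4
./scripts/train_scannet.sh 2 -default "--scannet_path ./data/scannet/train"
```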
I'm trying to train ScanNet scene segmentation, but I run into a CUDA out-of-memory error. My environment:
The training was started by:
export BATCH_SIZE=8; ./scripts/train_scannet.sh 2 -default "--scannet_path ./data/scannet/train"
(I've tried BATCH_SIZE=32 and BATCH_SIZE=16; both failed.) Here is the error dump:
Any help is appreciated.
-- Yang