ProGamerGov closed this issue 4 years ago.
@ajhool Those were the values used in the original neural-style. In theory it should work, since my code mirrors how neural-style did things.
Upon testing, I am seeing utilization of GPU 0 even when I have only selected GPUs 1, 2, and 3. I am not sure whether this behavior is to be expected.
Here are the results of my multi-GPU experiments with 8 Tesla K80s and different multi-device strategies:
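One way to rule out implicit use of GPU 0 would be to hide it from the process entirely. `CUDA_VISIBLE_DEVICES` remaps the device numbering before any CUDA context is created, so a context can never be allocated on an excluded GPU; note that the remaining GPUs are renumbered starting from 0 inside the process. A minimal sketch:

```python
import os

# Must be set before the first CUDA call (in practice, before importing
# torch in most setups). Physical GPUs 1-3 then appear to the process
# as cuda:0, cuda:1, cuda:2, and GPU 0 is invisible to it.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3"

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
```

The equivalent from the shell would be prefixing the command, e.g. `CUDA_VISIBLE_DEVICES=1,2,3 python3 neural_style.py ...` (with `-gpu` indices then given relative to the remapped numbering).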
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3,4,5,6,7 -multidevice_strategy 2,4,6,8,10,12,14
Sat Sep 21 18:34:05 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 55C P0 68W / 149W | 1009MiB / 11441MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 50C P0 79W / 149W | 751MiB / 11441MiB | 27% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 53C P0 60W / 149W | 514MiB / 11441MiB | 8% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 49C P0 73W / 149W | 509MiB / 11441MiB | 9% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 57C P0 62W / 149W | 569MiB / 11441MiB | 12% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 47C P0 70W / 149W | 426MiB / 11441MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 56C P0 62W / 149W | 443MiB / 11441MiB | 5% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 64C P0 111W / 149W | 697MiB / 11441MiB | 21% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 51311 C python3 996MiB |
| 1 51311 C python3 738MiB |
| 2 51311 C python3 501MiB |
| 3 51311 C python3 496MiB |
| 4 51311 C python3 556MiB |
| 5 51311 C python3 413MiB |
| 6 51311 C python3 430MiB |
| 7 51311 C python3 684MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0
Sat Sep 21 18:35:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 68C P0 140W / 149W | 1267MiB / 11441MiB | 89% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 41C P8 32W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 48C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 41C P8 32W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 49C P8 28W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 41C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 53C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 47C P8 31W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 51451 C python3 1256MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3 -multidevice_strategy 4,7,29
Sat Sep 21 18:37:41 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 58C P0 70W / 149W | 1295MiB / 11441MiB | 10% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 49C P0 90W / 149W | 591MiB / 11441MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 60C P0 113W / 149W | 849MiB / 11441MiB | 32% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 48C P0 87W / 149W | 537MiB / 11441MiB | 15% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 46C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 39C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 47C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 41C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 51563 C python3 1284MiB |
| 1 51563 C python3 580MiB |
| 2 51563 C python3 838MiB |
| 3 51563 C python3 526MiB |
+-----------------------------------------------------------------------------+
The `-backward_device` parameter was just me testing the impact of using `.to(device)` in the `feval()` function, so that I could add up the loss values and then run `backward()` on them. By default, device 0 is the backward device. There didn't appear to be any real memory usage increase on whichever device I made the backward device.
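The gathering step described above can be sketched roughly as follows. This is not the repo's actual `feval()` code; it only assumes that each loss module exposes a stored loss tensor supporting `.to()`, and that summing on one device lets autograd route gradients back across the others:

```python
def total_loss(losses, backward_device="cuda:0"):
    """Move each per-device loss onto the backward device and sum them.

    `losses` stands in for the values held by the StyleLoss/ContentLoss
    modules after the forward pass. Calling backward() on the returned
    sum then propagates gradients back through every device's layers.
    """
    moved = [l.to(backward_device) for l in losses]
    return sum(moved[1:], moved[0])
```

In real use, `total_loss(...).backward()` would be called once per `feval()` invocation inside the L-BFGS closure.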
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3 -multidevice_strategy 2,7,29 -backward_device 3
Sat Sep 21 18:41:54 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 56C P0 72W / 149W | 1295MiB / 11441MiB | 45% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 49C P0 76W / 149W | 591MiB / 11441MiB | 11% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 63C P0 120W / 149W | 873MiB / 11441MiB | 31% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 48C P0 72W / 149W | 537MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 43C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 37C P8 29W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 44C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 40C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 51699 C python3 1284MiB |
| 1 51699 C python3 580MiB |
| 2 51699 C python3 862MiB |
| 3 51699 C python3 526MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3 -multidevice_strategy 2,7,29
Sat Sep 21 18:44:44 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 57C P0 66W / 149W | 1007MiB / 11441MiB | 26% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 51C P0 80W / 149W | 941MiB / 11441MiB | 16% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 68C P0 139W / 149W | 813MiB / 11441MiB | 55% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 50C P0 77W / 149W | 537MiB / 11441MiB | 18% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 43C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 37C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 43C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 39C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 51823 C python3 996MiB |
| 1 51823 C python3 930MiB |
| 2 51823 C python3 802MiB |
| 3 51823 C python3 526MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3 -multidevice_strategy 1,7,29
Sat Sep 21 18:47:26 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 58C P0 71W / 149W | 951MiB / 11441MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 52C P0 84W / 149W | 941MiB / 11441MiB | 38% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 69C P0 105W / 149W | 801MiB / 11441MiB | 44% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 50C P0 73W / 149W | 537MiB / 11441MiB | 19% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 42C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 36C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 42C P8 26W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 39C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 51970 C python3 940MiB |
| 1 51970 C python3 930MiB |
| 2 51970 C python3 790MiB |
| 3 51970 C python3 526MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 1,2,3 -multidevice_strategy 1,23 -backward_device 0
Sat Sep 21 18:49:58 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 57C P0 63W / 149W | 899MiB / 11441MiB | 7% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 50C P0 72W / 149W | 407MiB / 11441MiB | 9% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 71C P0 115W / 149W | 1277MiB / 11441MiB | 56% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 52C P0 92W / 149W | 579MiB / 11441MiB | 46% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 42C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 37C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 43C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 39C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 52099 C python3 888MiB |
| 1 52099 C python3 396MiB |
| 2 52099 C python3 1266MiB |
| 3 52099 C python3 568MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 1,2,3 -multidevice_strategy 1,23
Sat Sep 21 18:51:46 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 55C P0 63W / 149W | 899MiB / 11441MiB | 9% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 49C P0 73W / 149W | 407MiB / 11441MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 66C P0 115W / 149W | 1277MiB / 11441MiB | 54% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 51C P0 82W / 149W | 585MiB / 11441MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 43C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 37C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 43C P8 26W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 39C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 52230 C python3 888MiB |
| 1 52230 C python3 396MiB |
| 2 52230 C python3 1266MiB |
| 3 52230 C python3 574MiB |
+-----------------------------------------------------------------------------+
These are the same parameters on a single K80:
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0
Sun Sep 22 17:56:55 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 84C P0 134W / 149W | 1267MiB / 11441MiB | 88% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2040 C python3 1256MiB |
+-----------------------------------------------------------------------------+
It seems that CUDA reserves a certain amount of memory per device by default, but I'm not sure whether that can explain the behavior I am seeing.
The layer setup for the above experiments was:
(1): nn.TVLoss
(2): nn.Conv2d(3 -> 64, 3x3, 1,1, 1,1)
(3): nn.ReLU
(4): nn.StyleLoss
(5): nn.Conv2d(64 -> 64, 3x3, 1,1, 1,1)
(6): nn.ReLU
(7): nn.MaxPool2d(2x2, 2,2)
(8): nn.Conv2d(64 -> 128, 3x3, 1,1, 1,1)
(9): nn.ReLU
(10): nn.StyleLoss
(11): nn.Conv2d(128 -> 128, 3x3, 1,1, 1,1)
(12): nn.ReLU
(13): nn.MaxPool2d(2x2, 2,2)
(14): nn.Conv2d(128 -> 256, 3x3, 1,1, 1,1)
(15): nn.ReLU
(16): nn.StyleLoss
(17): nn.Conv2d(256 -> 256, 3x3, 1,1, 1,1)
(18): nn.ReLU
(19): nn.Conv2d(256 -> 256, 3x3, 1,1, 1,1)
(20): nn.ReLU
(21): nn.Conv2d(256 -> 256, 3x3, 1,1, 1,1)
(22): nn.ReLU
(23): nn.MaxPool2d(2x2, 2,2)
(24): nn.Conv2d(256 -> 512, 3x3, 1,1, 1,1)
(25): nn.ReLU
(26): nn.StyleLoss
(27): nn.Conv2d(512 -> 512, 3x3, 1,1, 1,1)
(28): nn.ReLU
(29): nn.ContentLoss
(30): nn.Conv2d(512 -> 512, 3x3, 1,1, 1,1)
(31): nn.ReLU
(32): nn.Conv2d(512 -> 512, 3x3, 1,1, 1,1)
(33): nn.ReLU
(34): nn.MaxPool2d(2x2, 2,2)
(35): nn.Conv2d(512 -> 512, 3x3, 1,1, 1,1)
(36): nn.ReLU
(37): nn.StyleLoss
)
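For reference, my understanding of how `-multidevice_strategy` relates to the layer list above: each value is a layer index at which the network is split, so `4,10,16,28` with five GPUs would put layers 1-4 on the first device, 5-10 on the second, and so on. The exact slicing in `neural_style.py` may differ; this is just an illustrative sketch:

```python
def split_layers(layers, split_points):
    """Partition a flat layer list at the given 1-based split indices.

    split_points like [4, 10, 16, 28] yield len(split_points) + 1 chunks,
    one per device: layers[:4], layers[4:10], layers[10:16], ...
    """
    chunks, start = [], 0
    for point in split_points:
        chunks.append(layers[start:point])
        start = point
    chunks.append(layers[start:])  # remainder goes on the last device
    return chunks
```

With the 37-layer network above, `split_layers(layers, [4, 10, 16, 28])` produces chunks of 4, 6, 6, 12, and 9 layers.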
So, whatever device `backward()` is run on uses 334MiB (of 11441MiB) of GPU memory, regardless of the parameters used.
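The per-GPU numbers quoted in these experiments were read off the `nvidia-smi` Processes tables by hand; a small helper like the following can tally them automatically. This is a sketch that assumes the 418.xx driver's table format shown above:

```python
import re

def gpu_process_memory(smi_text):
    """Tally MiB used per GPU from the Processes table of nvidia-smi output."""
    usage = {}
    # Matches lines like: |    0     51311      C   python3        996MiB |
    pat = re.compile(r"^\|\s+(\d+)\s+\d+\s+\w\s+\S+\s+(\d+)MiB\s+\|")
    for line in smi_text.splitlines():
        m = pat.match(line)
        if m:
            gpu, mib = int(m.group(1)), int(m.group(2))
            usage[gpu] = usage.get(gpu, 0) + mib
    return usage
```

Feeding it the first dump above would return, e.g., `{0: 996, 1: 738, ...}`, making it easy to compare memory distribution across `-multidevice_strategy` settings.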
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3,4 -multidevice_strategy 4,10,16,28 -image_size 1536
Sun Sep 22 20:15:56 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 64C P0 67W / 149W | 10309MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 50C P0 148W / 149W | 2959MiB / 11441MiB | 44% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 65C P0 90W / 149W | 1725MiB / 11441MiB | 92% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 53C P0 81W / 149W | 1965MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 53C P0 60W / 149W | 1115MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 32C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 39C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 33C P8 29W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 4226 C python3 10296MiB |
| 1 4226 C python3 2946MiB |
| 2 4226 C python3 1712MiB |
| 3 4226 C python3 1952MiB |
| 4 4226 C python3 1102MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3,4,5 -multidevice_strategy 1,4,10,16,28 -image_size 1536
Sun Sep 22 20:27:22 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 61C P0 70W / 149W | 5549MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 48C P0 79W / 149W | 4015MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 65C P0 71W / 149W | 2959MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 52C P0 115W / 149W | 1725MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 65C P0 134W / 149W | 1991MiB / 11441MiB | 58% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 50C P0 79W / 149W | 1121MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 38C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 32C P8 28W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 4463 C python3 5536MiB |
| 1 4463 C python3 4002MiB |
| 2 4463 C python3 2946MiB |
| 3 4463 C python3 1712MiB |
| 4 4463 C python3 1978MiB |
| 5 4463 C python3 1108MiB |
+-----------------------------------------------------------------------------+
Backward device ('cuda:7'):
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 1,2,3,4,5,6 -multidevice_strategy 1,4,10,16,28 -image_size 1536
Sun Sep 22 20:43:00 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 55C P0 67W / 149W | 4973MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 42C P0 69W / 149W | 883MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 64C P0 71W / 149W | 4015MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 51C P0 83W / 149W | 2959MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 66C P0 71W / 149W | 1725MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 54C P0 145W / 149W | 1965MiB / 11441MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 63C P0 68W / 149W | 1115MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 46C P0 68W / 149W | 336MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 5033 C python3 4960MiB |
| 1 5033 C python3 870MiB |
| 2 5033 C python3 4002MiB |
| 3 5033 C python3 2946MiB |
| 4 5033 C python3 1712MiB |
| 5 5033 C python3 1952MiB |
| 6 5033 C python3 1102MiB |
| 7 5033 C python3 323MiB |
+-----------------------------------------------------------------------------+
Removed a `.to(device)` call from the ModelParallel class; backward device ('cuda:7'):
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3,4,5 -multidevice_strategy 1,4,10,16,28 -image_size 1536
Sun Sep 22 20:51:01 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 56C P0 68W / 149W | 4973MiB / 11441MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 42C P0 80W / 149W | 883MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 64C P0 98W / 149W | 4015MiB / 11441MiB | 48% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 52C P0 85W / 149W | 2959MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 66C P0 68W / 149W | 1725MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 53C P0 77W / 149W | 1965MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 62C P0 63W / 149W | 1115MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 46C P0 68W / 149W | 336MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 5310 C python3 4960MiB |
| 1 5310 C python3 870MiB |
| 2 5310 C python3 4002MiB |
| 3 5310 C python3 2946MiB |
| 4 5310 C python3 1712MiB |
| 5 5310 C python3 1952MiB |
| 6 5310 C python3 1102MiB |
| 7 5310 C python3 323MiB |
+-----------------------------------------------------------------------------+
If I use more than one GPU, then GPU:0 starts being used.
Backward device: 'cuda:7'; removed a `.to(device)` from the ModelParallel class.
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 1,2 -image_size 256 -multidevice_strategy 12
Sun Sep 22 21:16:08 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 51C P0 60W / 149W | 539MiB / 11441MiB | 9% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 43C P0 79W / 149W | 527MiB / 11441MiB | 29% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 53C P0 75W / 149W | 563MiB / 11441MiB | 51% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 33C P8 32W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 40C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 34C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 46C P8 28W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 44C P0 67W / 149W | 334MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 6741 C python3 528MiB |
| 1 6741 C python3 516MiB |
| 2 6741 C python3 552MiB |
| 7 6741 C python3 323MiB |
+-----------------------------------------------------------------------------+
Backward device: 'cuda:7'; removed a `.to(device)` from the ModelParallel class.
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 1,2 -image_size 512 -multidevice_strategy 12
Sun Sep 22 21:19:17 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 54C P0 62W / 149W | 899MiB / 11441MiB | 17% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 47C P0 85W / 149W | 1115MiB / 11441MiB | 39% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 63C P0 109W / 149W | 709MiB / 11441MiB | 34% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 33C P8 31W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 38C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 33C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 45C P8 28W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 44C P0 67W / 149W | 334MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 6889 C python3 888MiB |
| 1 6889 C python3 1104MiB |
| 2 6889 C python3 698MiB |
| 7 6889 C python3 323MiB |
+-----------------------------------------------------------------------------+
I think these lines may be responsible for part of the issue:
content_image = preprocess(params.content_image, params.image_size).type(dtype)
img_caffe = preprocess(image, style_size).type(dtype)
init_image = preprocess(params.init_image, image_size).type(dtype)
tv_mod = TVLoss(params.tv_weight).type(dtype)
img = torch.randn(C, H, W).mul(0.001).unsqueeze(0).type(dtype)
img = nn.Parameter(img.type(dtype))
Because the `dtype` variable is a CUDA tensor type, tensors created with it are allocated on what I presume is GPU:0 (the default CUDA device):
dtype = torch.cuda.FloatTensor
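The behavior described above can be demonstrated in isolation. This is a hypothetical illustration, not the project's actual code: `make_image` is an invented helper mimicking the `preprocess(...).type(dtype)` pattern, and the point is that `.type(torch.cuda.FloatTensor)` always allocates on the current default CUDA device, regardless of which GPUs were requested on the command line.

```python
import torch

def make_image(dtype):
    # Mimics the img creation pattern: randn -> scale -> add batch dim -> cast.
    return torch.randn(3, 8, 8).mul(0.001).unsqueeze(0).type(dtype)

if torch.cuda.is_available():
    img = make_image(torch.cuda.FloatTensor)
    # Lands on the default device (cuda:0 unless it was changed), even if
    # only GPUs 1,2,3 were selected via -gpu.
    print(img.device)
    # Creating on CPU and moving to an explicit device avoids the stray
    # allocation on GPU:0:
    img2 = make_image(torch.FloatTensor).to("cuda:1" if torch.cuda.device_count() > 1 else "cuda:0")
else:
    # CPU fallback so the sketch runs anywhere.
    img = make_image(torch.FloatTensor)
```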
I think the model is also placed on GPU:0 initially:
cnn = cnn.cuda()
But I believe that version of the model is then replaced by the multi-device version, while the leftover model copies/chunks are cleaned up by Python's garbage collector:
net = setup_multi_device(net)
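For reference, here is a minimal sketch of the model-parallel idea behind `setup_multi_device` (the function name and structure are illustrative assumptions, not the project's exact implementation): split a `Sequential` net at the `-multidevice_strategy` indices, move each chunk to its own device, and transfer activations between devices in the forward pass.

```python
import torch
import torch.nn as nn

def setup_multi_device_sketch(net, devices, split_points):
    # Split net.children() at each index in split_points; chunk i lives on
    # devices[i]. len(devices) must be len(split_points) + 1.
    chunks, prev = [], 0
    for i, split in enumerate(list(split_points) + [len(net)]):
        chunk = nn.Sequential(*list(net.children())[prev:split]).to(devices[i])
        chunks.append(chunk)
        prev = split

    def forward(x):
        for dev, chunk in zip(devices, chunks):
            x = chunk(x.to(dev))  # move activations to the chunk's device
        return x
    return forward

# Toy stand-in for the VGG layers; "cpu" stands in for e.g. "cuda:1","cuda:2".
net = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU(), nn.Conv2d(4, 4, 3), nn.ReLU())
forward = setup_multi_device_sketch(net, devices=["cpu", "cpu"], split_points=[2])
out = forward(torch.randn(1, 3, 16, 16))
```

With real GPUs, any input tensor created on the wrong device would still be usable here, since the first `x.to(dev)` moves it, but the original allocation on GPU:0 would remain visible in nvidia-smi.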
I tried adding this right before the `feval()` function:
del content_image, img_caffe, tv_mod
And this was the result:
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3,4 -multidevice_strategy 4,10,16,28 -image_size 1536
Thu Sep 26 01:17:58 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 60C P0 71W / 149W | 10309MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 47C P0 82W / 149W | 2959MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 64C P0 86W / 149W | 1725MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 52C P0 152W / 149W | 1965MiB / 11441MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 48C P0 65W / 149W | 1109MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 29C P8 32W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 41C P8 27W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 42C P0 70W / 149W | 336MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2407 C python3 10296MiB |
| 1 2407 C python3 2946MiB |
| 2 2407 C python3 1712MiB |
| 3 2407 C python3 1952MiB |
| 4 2407 C python3 1096MiB |
| 7 2407 C python3 323MiB |
+-----------------------------------------------------------------------------+
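One caveat about the `del` experiment above: `del` only drops the Python references, and PyTorch's caching allocator keeps the freed blocks reserved, so nvidia-smi can still report them against the process. `torch.cuda.empty_cache()` returns unused cached memory to the driver (it cannot free tensors that are still referenced). A small sketch, guarded so it runs without a GPU:

```python
import torch

if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda:0")
    del x
    # The caching allocator may still hold the block:
    print(torch.cuda.memory_reserved(0))
    torch.cuda.empty_cache()
    # Typically lower now; nvidia-smi should reflect the release.
    print(torch.cuda.memory_reserved(0))
```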
Putting the model on cuda:6, instead of just calling `.cuda()`, somehow increased GPU utilization:
Thu Sep 26 01:29:25 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 64C P0 134W / 149W | 10351MiB / 11441MiB | 84% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 49C P0 97W / 149W | 2959MiB / 11441MiB | 48% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 66C P0 71W / 149W | 1725MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 53C P0 78W / 149W | 1965MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 49C P0 61W / 149W | 1109MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 30C P8 32W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 53C P0 60W / 149W | 400MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 43C P0 71W / 149W | 336MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2803 C python3 10338MiB |
| 1 2803 C python3 2946MiB |
| 2 2803 C python3 1712MiB |
| 3 2803 C python3 1952MiB |
| 4 2803 C python3 1096MiB |
| 6 2803 C python3 387MiB |
| 7 2803 C python3 323MiB |
+-----------------------------------------------------------------------------+
Decreasing the number of GPUs used doesn't change the total memory usage:
Thu Sep 26 01:40:11 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 62C P0 77W / 149W | 10351MiB / 11441MiB | 23% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 50C P0 82W / 149W | 4269MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 58C P0 160W / 149W | 2179MiB / 11441MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 30C P8 30W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 32C P8 26W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 29C P8 31W / 149W | 11MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 51C P0 59W / 149W | 400MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 42C P0 71W / 149W | 336MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3140 C python3 10338MiB |
| 1 3140 C python3 4256MiB |
| 2 3140 C python3 2166MiB |
| 6 3140 C python3 387MiB |
| 7 3140 C python3 323MiB |
+-----------------------------------------------------------------------------+
I got CPU support working now, so that GPUs and the CPU can be used as devices together. An interesting thing you can do with the code is put a single layer on the CPU while the rest of the model runs on a single GPU:
python3 neural_style.py -gpu c,0,c -image_size 256 -multidevice_strategy 1,34
The CPU is a lot slower than the GPU, but there are use cases where you need to offload some usage from your GPU(s).
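The `-gpu c,0,c -multidevice_strategy 1,34` idea can be sketched as follows (an illustrative assumption about the split, not the project's exact code): the first layer and the tail run on the CPU, while the middle block runs on one GPU, falling back to CPU here so the sketch runs anywhere.

```python
import torch
import torch.nn as nn

gpu = "cuda:0" if torch.cuda.is_available() else "cpu"

head = nn.Conv2d(3, 8, 3, padding=1).to("cpu")                     # device "c"
middle = nn.Sequential(nn.ReLU(), nn.Conv2d(8, 8, 3, padding=1)).to(gpu)  # device "0"
tail = nn.ReLU().to("cpu")                                         # device "c"

x = torch.randn(1, 3, 32, 32)
# Activations hop CPU -> GPU -> CPU, offloading the head/tail memory
# from the GPU at the cost of transfer time.
y = tail(middle(head(x).to(gpu)).to("cpu"))
```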
I have resolved the memory issue with GPU:0.
You can now use multiple GPUs in the same way that you could in the original neural-style. The -multigpu_strategy parameter was renamed to -multidevice_strategy: https://github.com/ProGamerGov/neural-style-pt/issues/2
You can use any combination of GPUs and your CPU as devices.
New -disable_check parameter for advanced users.
AMD GPU support.