ProGamerGov / neural-style-pt

PyTorch implementation of neural style transfer algorithm
MIT License

Added support for Multi-GPU and CPU #20

Closed: ProGamerGov closed this 4 years ago

ProGamerGov commented 4 years ago

@ajhool Those were the values used in the original neural-style. In theory it should work, since my code mirrors how neural-style did things.

Upon testing, I am seeing utilization on GPU 0 even when I have only selected GPUs 1, 2, and 3. I am not sure whether this behavior is to be expected.
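If it is only the CUDA context on GPU 0 that is being created, one possible workaround (a sketch, not something the script currently does) is to hide the unwanted devices from the process entirely with CUDA_VISIBLE_DEVICES, set before the first CUDA call; the remaining GPUs are then renumbered starting at 0:

# Hypothetical workaround: expose only GPUs 1-3 to this process.
# Must be set before torch initializes CUDA (ideally before importing torch);
# the visible devices are then seen by PyTorch as cuda:0, cuda:1, cuda:2.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3"

import torch
print(torch.cuda.device_count())  # 3 -- the physical GPU 0 is no longer visible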

ProGamerGov commented 4 years ago

Here are the results of my multi-GPU experiments with 8 Tesla K80s and different multi-device strategies:


python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3,4,5,6,7 -multidevice_strategy 2,4,6,8,10,12,14
Sat Sep 21 18:34:05 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   55C    P0    68W / 149W |   1009MiB / 11441MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   50C    P0    79W / 149W |    751MiB / 11441MiB |     27%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   53C    P0    60W / 149W |    514MiB / 11441MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   49C    P0    73W / 149W |    509MiB / 11441MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   57C    P0    62W / 149W |    569MiB / 11441MiB |     12%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   47C    P0    70W / 149W |    426MiB / 11441MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   56C    P0    62W / 149W |    443MiB / 11441MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   64C    P0   111W / 149W |    697MiB / 11441MiB |     21%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     51311      C   python3                                      996MiB |
|    1     51311      C   python3                                      738MiB |
|    2     51311      C   python3                                      501MiB |
|    3     51311      C   python3                                      496MiB |
|    4     51311      C   python3                                      556MiB |
|    5     51311      C   python3                                      413MiB |
|    6     51311      C   python3                                      430MiB |
|    7     51311      C   python3                                      684MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0
Sat Sep 21 18:35:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   68C    P0   140W / 149W |   1267MiB / 11441MiB |     89%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   41C    P8    32W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   48C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   41C    P8    32W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   49C    P8    28W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   41C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   53C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   47C    P8    31W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     51451      C   python3                                     1256MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3 -multidevice_strategy 4,7,29
Sat Sep 21 18:37:41 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   58C    P0    70W / 149W |   1295MiB / 11441MiB |     10%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   49C    P0    90W / 149W |    591MiB / 11441MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   60C    P0   113W / 149W |    849MiB / 11441MiB |     32%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   48C    P0    87W / 149W |    537MiB / 11441MiB |     15%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   46C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   39C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   47C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   41C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     51563      C   python3                                     1284MiB |
|    1     51563      C   python3                                      580MiB |
|    2     51563      C   python3                                      838MiB |
|    3     51563      C   python3                                      526MiB |
+-----------------------------------------------------------------------------+

The -backward_device parameter was just me testing the impact of using .to(device) in the feval() function, so that I could gather the loss values and then run backward() on them. By default, device 0 is the backward device. There didn't appear to be any real memory usage increase on whichever device I made the backward device.
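For reference, a minimal sketch of what that feval() change amounts to (the names optimizer, net, img, content_losses, and style_losses are assumptions here, not the exact code in neural_style.py): each loss module keeps its loss on the device it runs on, the scalars are moved to the chosen backward device, and backward() is called on their sum.

def feval():
    optimizer.zero_grad()
    net(img)
    # Move each per-layer loss onto the backward device before summing,
    # since the loss modules live on different GPUs.
    backward_device = torch.device("cuda:0")  # GPU 0 by default, overridable via -backward_device
    total_loss = sum(mod.loss.to(backward_device) for mod in content_losses + style_losses)
    total_loss.backward()
    return total_loss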

python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3  -multidevice_strategy 2,7,29 -backward_device 3
Sat Sep 21 18:41:54 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   56C    P0    72W / 149W |   1295MiB / 11441MiB |     45%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   49C    P0    76W / 149W |    591MiB / 11441MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   63C    P0   120W / 149W |    873MiB / 11441MiB |     31%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   48C    P0    72W / 149W |    537MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   43C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   37C    P8    29W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   44C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   40C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     51699      C   python3                                     1284MiB |
|    1     51699      C   python3                                      580MiB |
|    2     51699      C   python3                                      862MiB |
|    3     51699      C   python3                                      526MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3 -multidevice_strategy 2,7,29
Sat Sep 21 18:44:44 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   57C    P0    66W / 149W |   1007MiB / 11441MiB |     26%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   51C    P0    80W / 149W |    941MiB / 11441MiB |     16%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   68C    P0   139W / 149W |    813MiB / 11441MiB |     55%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   50C    P0    77W / 149W |    537MiB / 11441MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   43C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   37C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   43C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   39C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     51823      C   python3                                      996MiB |
|    1     51823      C   python3                                      930MiB |
|    2     51823      C   python3                                      802MiB |
|    3     51823      C   python3                                      526MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3 -multidevice_strategy 1,7,29
Sat Sep 21 18:47:26 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   58C    P0    71W / 149W |    951MiB / 11441MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   52C    P0    84W / 149W |    941MiB / 11441MiB |     38%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   69C    P0   105W / 149W |    801MiB / 11441MiB |     44%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   50C    P0    73W / 149W |    537MiB / 11441MiB |     19%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   42C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   36C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   42C    P8    26W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   39C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     51970      C   python3                                      940MiB |
|    1     51970      C   python3                                      930MiB |
|    2     51970      C   python3                                      790MiB |
|    3     51970      C   python3                                      526MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 1,2,3 -multidevice_strategy 1,23 -backward_device 0
Sat Sep 21 18:49:58 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   57C    P0    63W / 149W |    899MiB / 11441MiB |      7%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   50C    P0    72W / 149W |    407MiB / 11441MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   71C    P0   115W / 149W |   1277MiB / 11441MiB |     56%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   52C    P0    92W / 149W |    579MiB / 11441MiB |     46%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   42C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   37C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   43C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   39C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     52099      C   python3                                      888MiB |
|    1     52099      C   python3                                      396MiB |
|    2     52099      C   python3                                     1266MiB |
|    3     52099      C   python3                                      568MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 1,2,3 -multidevice_strategy 1,23
Sat Sep 21 18:51:46 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   55C    P0    63W / 149W |    899MiB / 11441MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   49C    P0    73W / 149W |    407MiB / 11441MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   66C    P0   115W / 149W |   1277MiB / 11441MiB |     54%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   51C    P0    82W / 149W |    585MiB / 11441MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   43C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   37C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   43C    P8    26W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   39C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     52230      C   python3                                      888MiB |
|    1     52230      C   python3                                      396MiB |
|    2     52230      C   python3                                     1266MiB |
|    3     52230      C   python3                                      574MiB |
+-----------------------------------------------------------------------------+

These are the same parameters on a single K80:

python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0
Sun Sep 22 17:56:55 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   84C    P0   134W / 149W |   1267MiB / 11441MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2040      C   python3                                     1256MiB |
+-----------------------------------------------------------------------------+

It seems that CUDA takes up a certain amount of memory per device by default, but I'm not sure that can explain the behavior I am seeing.

https://github.com/pytorch/pytorch/issues/20532
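One way to tell that fixed context overhead apart from real tensor allocations is to compare what nvidia-smi reports against PyTorch's own allocator counters; a small diagnostic sketch (torch.cuda.memory_allocated and torch.cuda.max_memory_allocated are standard API calls):

import torch

for i in range(torch.cuda.device_count()):
    # memory_allocated only counts live tensors; whatever nvidia-smi reports
    # beyond this is the CUDA/cuDNN context plus PyTorch's cached blocks.
    alloc = torch.cuda.memory_allocated(i) / 1024 ** 2
    peak = torch.cuda.max_memory_allocated(i) / 1024 ** 2
    print("cuda:%d  allocated=%.0fMiB  peak=%.0fMiB" % (i, alloc, peak))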

ProGamerGov commented 4 years ago

The layer setup for the above experiments was as follows (the -multidevice_strategy values are layer indices that split this list across devices; see the sketch after it):

  (1): nn.TVLoss
  (2): nn.Conv2d(3 -> 64, 3x3, 1,1, 1,1)
  (3): nn.ReLU
  (4): nn.StyleLoss
  (5): nn.Conv2d(64 -> 64, 3x3, 1,1, 1,1)
  (6): nn.ReLU
  (7): nn.MaxPool2d(2x2, 2,2)
  (8): nn.Conv2d(64 -> 128, 3x3, 1,1, 1,1)
  (9): nn.ReLU
  (10): nn.StyleLoss
  (11): nn.Conv2d(128 -> 128, 3x3, 1,1, 1,1)
  (12): nn.ReLU
  (13): nn.MaxPool2d(2x2, 2,2)
  (14): nn.Conv2d(128 -> 256, 3x3, 1,1, 1,1)
  (15): nn.ReLU
  (16): nn.StyleLoss
  (17): nn.Conv2d(256 -> 256, 3x3, 1,1, 1,1)
  (18): nn.ReLU
  (19): nn.Conv2d(256 -> 256, 3x3, 1,1, 1,1)
  (20): nn.ReLU
  (21): nn.Conv2d(256 -> 256, 3x3, 1,1, 1,1)
  (22): nn.ReLU
  (23): nn.MaxPool2d(2x2, 2,2)
  (24): nn.Conv2d(256 -> 512, 3x3, 1,1, 1,1)
  (25): nn.ReLU
  (26): nn.StyleLoss
  (27): nn.Conv2d(512 -> 512, 3x3, 1,1, 1,1)
  (28): nn.ReLU
  (29): nn.ContentLoss
  (30): nn.Conv2d(512 -> 512, 3x3, 1,1, 1,1)
  (31): nn.ReLU
  (32): nn.Conv2d(512 -> 512, 3x3, 1,1, 1,1)
  (33): nn.ReLU
  (34): nn.MaxPool2d(2x2, 2,2)
  (35): nn.Conv2d(512 -> 512, 3x3, 1,1, 1,1)
  (36): nn.ReLU
  (37): nn.StyleLoss
)
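For illustration, a minimal sketch of the splitting idea behind -multidevice_strategy (not the repo's actual ModelParallel class): the sequential model is cut at the given layer indices, each chunk is moved to its own device, and forward() hands the activations across devices.

import torch
import torch.nn as nn

class SimpleModelParallel(nn.Module):
    # Illustrative only; names and details differ from neural_style.py.
    def __init__(self, net, devices, split_points):
        super().__init__()
        layers = list(net.children())
        bounds = [0] + list(split_points) + [len(layers)]
        self.devices = [torch.device(d) for d in devices]  # one device per chunk
        self.chunks = nn.ModuleList(
            nn.Sequential(*layers[bounds[i]:bounds[i + 1]]).to(dev)
            for i, dev in enumerate(self.devices)
        )

    def forward(self, x):
        for chunk, dev in zip(self.chunks, self.devices):
            x = chunk(x.to(dev))  # move the activation to the chunk's device first
        return x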
ProGamerGov commented 4 years ago

So whatever device backward() is run on uses 334MiB / 11441MiB of GPU memory, regardless of the parameters used.
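That fixed amount looks like the cost of initializing a CUDA context (plus cuDNN handles) on the backward device rather than anything the model allocates; creating even a one-element tensor on an otherwise idle device shows a similar per-device overhead in nvidia-smi (a quick check, not from the repo):

import torch

# A tiny allocation is enough to force a CUDA context on the device;
# nvidia-smi will then report a few hundred MiB of fixed overhead there,
# even though PyTorch's own counter shows almost nothing allocated.
x = torch.ones(1, device="cuda:3")
print(torch.cuda.memory_allocated("cuda:3"))  # a few hundred bytes at most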

python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3,4 -multidevice_strategy 4,10,16,28 -image_size 1536
Sun Sep 22 20:15:56 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   64C    P0    67W / 149W |  10309MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   50C    P0   148W / 149W |   2959MiB / 11441MiB |     44%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   65C    P0    90W / 149W |   1725MiB / 11441MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   53C    P0    81W / 149W |   1965MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   53C    P0    60W / 149W |   1115MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   32C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   39C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   33C    P8    29W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4226      C   python3                                    10296MiB |
|    1      4226      C   python3                                     2946MiB |
|    2      4226      C   python3                                     1712MiB |
|    3      4226      C   python3                                     1952MiB |
|    4      4226      C   python3                                     1102MiB |
+-----------------------------------------------------------------------------+
python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3,4,5 -multidevice_strategy 1,4,10,16,28 -image_size 1536
Sun Sep 22 20:27:22 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   61C    P0    70W / 149W |   5549MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   48C    P0    79W / 149W |   4015MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   65C    P0    71W / 149W |   2959MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   52C    P0   115W / 149W |   1725MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   65C    P0   134W / 149W |   1991MiB / 11441MiB |     58%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   50C    P0    79W / 149W |   1121MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   38C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   32C    P8    28W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4463      C   python3                                     5536MiB |
|    1      4463      C   python3                                     4002MiB |
|    2      4463      C   python3                                     2946MiB |
|    3      4463      C   python3                                     1712MiB |
|    4      4463      C   python3                                     1978MiB |
|    5      4463      C   python3                                     1108MiB |
+-----------------------------------------------------------------------------+
Backward device 'cuda:7': python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 1,2,3,4,5,6 -multidevice_strategy 1,4,10,16,28 -image_size 1536
Sun Sep 22 20:43:00 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   55C    P0    67W / 149W |   4973MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   42C    P0    69W / 149W |    883MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   64C    P0    71W / 149W |   4015MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   51C    P0    83W / 149W |   2959MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   66C    P0    71W / 149W |   1725MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   54C    P0   145W / 149W |   1965MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   63C    P0    68W / 149W |   1115MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   46C    P0    68W / 149W |    336MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      5033      C   python3                                     4960MiB |
|    1      5033      C   python3                                      870MiB |
|    2      5033      C   python3                                     4002MiB |
|    3      5033      C   python3                                     2946MiB |
|    4      5033      C   python3                                     1712MiB |
|    5      5033      C   python3                                     1952MiB |
|    6      5033      C   python3                                     1102MiB |
|    7      5033      C   python3                                      323MiB |
+-----------------------------------------------------------------------------+
Removed a .to(device) from the ModelParallel class, backward device 'cuda:7': python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3,4,5 -multidevice_strategy 1,4,10,16,28 -image_size 1536
Sun Sep 22 20:51:01 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   56C    P0    68W / 149W |   4973MiB / 11441MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   42C    P0    80W / 149W |    883MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   64C    P0    98W / 149W |   4015MiB / 11441MiB |     48%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   52C    P0    85W / 149W |   2959MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   66C    P0    68W / 149W |   1725MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   53C    P0    77W / 149W |   1965MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   62C    P0    63W / 149W |   1115MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   46C    P0    68W / 149W |    336MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      5310      C   python3                                     4960MiB |
|    1      5310      C   python3                                      870MiB |
|    2      5310      C   python3                                     4002MiB |
|    3      5310      C   python3                                     2946MiB |
|    4      5310      C   python3                                     1712MiB |
|    5      5310      C   python3                                     1952MiB |
|    6      5310      C   python3                                     1102MiB |
|    7      5310      C   python3                                      323MiB |
+-----------------------------------------------------------------------------+

If I use more than one GPU, then GPU:0 starts being used.


Backward device 'cuda:7', removed a .to(device) from the ModelParallel class: python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 1,2 -image_size 256 -multidevice_strategy 12
Sun Sep 22 21:16:08 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   51C    P0    60W / 149W |    539MiB / 11441MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   43C    P0    79W / 149W |    527MiB / 11441MiB |     29%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   53C    P0    75W / 149W |    563MiB / 11441MiB |     51%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   33C    P8    32W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   40C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   34C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   46C    P8    28W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   44C    P0    67W / 149W |    334MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      6741      C   python3                                      528MiB |
|    1      6741      C   python3                                      516MiB |
|    2      6741      C   python3                                      552MiB |
|    7      6741      C   python3                                      323MiB |
+-----------------------------------------------------------------------------+
Backward device 'cuda:7', removed a .to(device) from the ModelParallel class: python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 1,2 -image_size 512 -multidevice_strategy 12
Sun Sep 22 21:19:17 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   54C    P0    62W / 149W |    899MiB / 11441MiB |     17%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   47C    P0    85W / 149W |   1115MiB / 11441MiB |     39%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   63C    P0   109W / 149W |    709MiB / 11441MiB |     34%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   33C    P8    31W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   38C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   33C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   45C    P8    28W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   44C    P0    67W / 149W |    334MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      6889      C   python3                                      888MiB |
|    1      6889      C   python3                                     1104MiB |
|    2      6889      C   python3                                      698MiB |
|    7      6889      C   python3                                      323MiB |
+-----------------------------------------------------------------------------+
ProGamerGov commented 4 years ago

I think it's possible that these lines may be responsible for part of the issue:

content_image = preprocess(params.content_image, params.image_size).type(dtype)

img_caffe = preprocess(image, style_size).type(dtype)

init_image = preprocess(params.init_image, image_size).type(dtype)

tv_mod = TVLoss(params.tv_weight).type(dtype)

img = torch.randn(C, H, W).mul(0.001).unsqueeze(0).type(dtype)

img = nn.Parameter(img.type(dtype))

This is because the dtype variable is a CUDA tensor type, which lands on what I presume is GPU:0:

dtype = torch.cuda.FloatTensor

I think the model is also placed on GPU:0 initially:

cnn = cnn.cuda()

But I believe that version of the model is then replaced by the multi-GPU version, while the leftover model copies/chunks are cleaned up by Python's garbage collector:

net = setup_multi_device(net)
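The difference matters because .type(torch.cuda.FloatTensor) always materializes a tensor on the current CUDA device (GPU:0 unless it has been changed), while .to(device) targets a specific GPU. A self-contained illustration, assuming a machine with at least two GPUs:

import torch

dtype = torch.cuda.FloatTensor             # implicitly tied to the current device (cuda:0)
a = torch.randn(3, 64, 64).type(dtype)     # lands on cuda:0

device = torch.device("cuda:1")
b = torch.randn(3, 64, 64).to(device)      # lands exactly where requested
print(a.device, b.device)                  # cuda:0 cuda:1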
ProGamerGov commented 4 years ago

I tried adding this right before the feval() function:

    del content_image, img_caffe, tv_mod

And this was the result:

python3 neural_style.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -gpu 0,1,2,3,4 -multidevice_strategy 4,10,16,28 -image_size 1536

Thu Sep 26 01:17:58 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   60C    P0    71W / 149W |  10309MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   47C    P0    82W / 149W |   2959MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   64C    P0    86W / 149W |   1725MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   52C    P0   152W / 149W |   1965MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   48C    P0    65W / 149W |   1109MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   29C    P8    32W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   41C    P8    27W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0    70W / 149W |    336MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2407      C   python3                                    10296MiB |
|    1      2407      C   python3                                     2946MiB |
|    2      2407      C   python3                                     1712MiB |
|    3      2407      C   python3                                     1952MiB |
|    4      2407      C   python3                                     1096MiB |
|    7      2407      C   python3                                      323MiB |
+-----------------------------------------------------------------------------+
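Worth noting: del only drops the Python references, and the memory nvidia-smi shows stays in PyTorch's caching allocator until it is released, so a fairer comparison would also clear the cache afterwards (empty_cache is a standard torch.cuda call; whether it actually changes these numbers is an assumption):

del content_image, img_caffe, tv_mod
torch.cuda.empty_cache()  # return cached blocks to the driver so nvidia-smi reflects the frees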

Putting the model on cuda:6 instead of just calling .cuda() on it somehow increased GPU usage:

Thu Sep 26 01:29:25 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   64C    P0   134W / 149W |  10351MiB / 11441MiB |     84%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   49C    P0    97W / 149W |   2959MiB / 11441MiB |     48%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   66C    P0    71W / 149W |   1725MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   53C    P0    78W / 149W |   1965MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   49C    P0    61W / 149W |   1109MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   30C    P8    32W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   53C    P0    60W / 149W |    400MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   43C    P0    71W / 149W |    336MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2803      C   python3                                    10338MiB |
|    1      2803      C   python3                                     2946MiB |
|    2      2803      C   python3                                     1712MiB |
|    3      2803      C   python3                                     1952MiB |
|    4      2803      C   python3                                     1096MiB |
|    6      2803      C   python3                                      387MiB |
|    7      2803      C   python3                                      323MiB |
+-----------------------------------------------------------------------------+

Decreasing the number of GPUs used doesn't change the total usage:

Thu Sep 26 01:40:11 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   62C    P0    77W / 149W |  10351MiB / 11441MiB |     23%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   50C    P0    82W / 149W |   4269MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   58C    P0   160W / 149W |   2179MiB / 11441MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   30C    P8    30W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   32C    P8    26W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   29C    P8    31W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   51C    P0    59W / 149W |    400MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0    71W / 149W |    336MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3140      C   python3                                    10338MiB |
|    1      3140      C   python3                                     4256MiB |
|    2      3140      C   python3                                     2166MiB |
|    6      3140      C   python3                                      387MiB |
|    7      3140      C   python3                                      323MiB |
+-----------------------------------------------------------------------------+
ProGamerGov commented 4 years ago

Some related issues:

https://github.com/pytorch/pytorch/issues/8480

https://github.com/pytorch/pytorch/pull/7392

https://github.com/pytorch/pytorch/issues/7071

https://github.com/pytorch/pytorch/issues/9871

ProGamerGov commented 4 years ago

I've got CPU support working now, so GPUs and the CPU can be used together as devices. An interesting thing you can do with the code is put a single layer on the CPU while the rest of the model stays on a single GPU:

python3 neural_style.py -gpu c,0,c -image_size 256 -multidevice_strategy 1,34

The CPU is a lot slower than the GPU, but there are use cases where you need to offload some work from your GPU(s).
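For anyone curious how a mixed device string like c,0,c can map onto PyTorch devices, here is a minimal parsing sketch (the actual argument handling in neural_style.py may differ):

import torch

def parse_devices(gpu_str):
    # 'c' selects the CPU, anything else is treated as a CUDA index,
    # so "c,0,c" becomes [cpu, cuda:0, cpu].
    return [torch.device("cpu" if tok.lower() == "c" else "cuda:" + tok)
            for tok in gpu_str.split(",")]

print(parse_devices("c,0,c"))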

ProGamerGov commented 4 years ago

I have resolved the memory issue with GPU:0.