ProGamerGov / neural-style-pt

PyTorch implementation of neural style transfer algorithm
MIT License
832 stars 178 forks

Understanding Multidevice Strategy #81

Open gateway opened 3 years ago

gateway commented 3 years ago

I have been trying to figure out how to max out both of the GPUs in my system.

Tue Oct 13 15:15:00 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:01:00.0 Off |                  N/A |
| 41%   41C    P8    15W / 280W |    292MiB / 24220MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:02:00.0 Off |                  N/A |
| 21%   50C    P8     6W / 180W |      2MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2061      G   /usr/lib/xorg/Xorg                191MiB |
|    0   N/A  N/A      2745      G   ...mviewer/tv_bin/TeamViewer       13MiB |
|    0   N/A  N/A      2949      G   /usr/bin/gnome-shell               83MiB |
+-----------------------------------------------------------------------------+

GPU 0 has the most memory.

I'm trying to understand -multidevice_strategy: how many layers are there? It's not very clear to me what would be best for 2 GPUs, one with more memory than the other, or at least what a good starting point would be.

I just tried a value of 20, and this was the result:

[screenshot of the result]

ProGamerGov commented 3 years ago

@gateway The multidevice strategy simply splits the model's layers across the different devices. The layer order in the model dictates which layers go on which device, based on your specified -multidevice_strategy values. There's no special formula for what values to use, so you'll probably have to experiment a bit to find the best possible settings.
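Conceptually, the split works like this (a hypothetical pure-Python sketch for illustration only, not the project's actual code; `split_layers` is an invented name):

```python
# Hypothetical sketch: how -multidevice_strategy indices might partition an
# ordered list of model layers into one chunk per device.
def split_layers(layers, strategy):
    """Cut `layers` at each index in `strategy`; returns one chunk per device."""
    chunks, start = [], 0
    for cut in strategy:
        chunks.append(layers[start:cut])
        start = cut
    chunks.append(layers[start:])  # remaining layers go on the last device
    return chunks

# e.g. 10 layers with strategy "4": layers 0-3 on GPU 0, layers 4-9 on GPU 1
print(split_layers(list(range(10)), [4]))  # [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
```

With a larger cut index, more of the early layers land on the first GPU, which is one way to push more of the load onto the device with more memory.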

Edit:

This may help: https://www.reddit.com/r/deepdream/comments/dnsg65/multigpu_strategy/

IridiumMaster commented 3 years ago

Was trying this on two Google A100s in their cloud (devices listed below). I used the following parameters:

neural-style -multidevice_strategy 3,7,12 -gpu 0,1 -style_image myPainting63.jpg -content_image Headcrop4.jpg -model_file vgg19-d01eb7cb.pth -image_size 3000 -backend cudnn -optimizer lbfgs -num_iterations 2500 -output_image g63.png -original_colors 1

I get the error: "The number of -multidevice_strategy layer indices minus 1, must be equal to the number of -gpu devices."

It's not clear to me what I am doing wrong here. Could you please help? I ran some other code to validate that these CUDA devices could be detected by PyTorch.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    56W / 350W |      0MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  A100-SXM4-40GB      Off  | 00000000:00:05.0 Off |                    0 |
| N/A   32C    P0    53W / 350W |      0MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

ProGamerGov commented 3 years ago

@IridiumMaster The -multidevice_strategy parameter tells the code where to slice / cut the model, and in your case 2 GPUs means you want to have the model cut into 2 pieces (one for each GPU). So, for two GPUs you should only specify one value for -multidevice_strategy.
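In other words, N GPUs means the model is cut into N pieces, which takes N - 1 cut points. A hypothetical check illustrating that rule (`check_strategy` is an invented name, not the project's actual validation code):

```python
# Hypothetical sketch of the rule behind the error message: with N GPUs the
# model is cut into N pieces, so -multidevice_strategy needs exactly N - 1
# layer indices (cut points).
def check_strategy(strategy, gpus):
    if len(strategy) != len(gpus) - 1:
        raise ValueError(
            "need %d cut point(s) for %d GPUs, got %d"
            % (len(gpus) - 1, len(gpus), len(strategy))
        )

check_strategy([12], [0, 1])          # OK: one cut splits the model for 2 GPUs
# check_strategy([3, 7, 12], [0, 1])  # raises: 3 cuts would make 4 pieces
```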

IridiumMaster commented 3 years ago

> @IridiumMaster The -multidevice_strategy parameter tells the code where to slice / cut the model, and in your case 2 GPUs means you want to have the model cut into 2 pieces (one for each GPU). So, for two GPUs you should only specify one value for -multidevice_strategy.

Thanks kindly, that worked very well for me.

robertgoacher commented 3 years ago

@ProGamerGov In neural-style-pt/examples/scripts/starry_stanford_bigger.sh you aren't using the multiple-GPU setting for the lower-resolution images. Is that because there is no benefit (in speed or memory) from splitting the layers over multiple GPUs at those lower resolutions? I'm just trying to get an understanding of when multiple GPUs would be best used.

ProGamerGov commented 3 years ago

@RobertGoacher Using multiple GPUs can be a bit slower than using a single GPU. I think there's also a small increase in memory usage from splitting the model across multiple GPUs.

robertgoacher commented 3 years ago

@ProGamerGov Thank you so much for your reply; I really appreciate it.

I think I understand this now...but please correct me if I'm wrong.

So you need to use the multiple-GPU strategy for high-resolution style transfers because a single GPU doesn't normally have enough memory for the inference? If you have a GPU with lots of memory (for example an NVIDIA A100 with 40GB), you might be able to complete a high-resolution render on that one GPU without the multiple-GPU strategy? But if you do need multiple GPUs, you can split the processing (and therefore the memory usage) across them, at the cost of some speed and a small increase in total memory usage?