Open gateway opened 4 years ago
First you should check if PyTorch sees your devices correctly and that CUDA works. Try running this in the Python interpreter and seeing what it shows:
import torch
torch.__version__ # Get PyTorch and CUDA version
torch.cuda.is_available() # Check that CUDA works
torch.cuda.device_count() # Check how many CUDA capable devices you have
# Print device human readable names
torch.cuda.get_device_name(0)
torch.cuda.get_device_name(1)
# Add more lines with +1 like get_device_name(2), get_device_name(3) if you have more devices.
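If you have more than a couple of devices, a loop saves adding lines by hand. This is only a minimal sketch: `describe_devices` is a hypothetical helper name, and it assumes you pass it `torch.cuda` (it only relies on `is_available()`, `device_count()` and `get_device_name()`, which are the same calls used above).

```python
def describe_devices(cuda):
    """Return a list of (id, name) pairs for every device `cuda` reports.

    `cuda` is expected to be `torch.cuda` (or any object with the same three
    functions). It is passed in so this helper does not import torch itself.
    """
    if not cuda.is_available():
        return []
    return [(i, cuda.get_device_name(i)) for i in range(cuda.device_count())]


# Usage, assuming PyTorch is installed:
#   import torch
#   for gpu_id, name in describe_devices(torch.cuda):
#       print(gpu_id, name)
```

The IDs in the first column are the ones PyTorch uses, which may not match the IDs nvidia-smi shows.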
If the devices exist and CUDA works, then it's probably just an issue with the ID you are using. CUDA can sometimes be a bit weird with how it sets GPU IDs: https://stackoverflow.com/questions/13781738/how-does-cuda-assign-device-ids-to-gpus
You can fix the GPU device order by adding CUDA_DEVICE_ORDER=PCI_BUS_ID before the command:
CUDA_DEVICE_ORDER=PCI_BUS_ID python3 neural_style.py
You can also use CUDA_VISIBLE_DEVICES before the command to make sure that PyTorch can only see the specified device:
# Only make GPU ID 1 visible to PyTorch
CUDA_VISIBLE_DEVICES=1 python3 neural_style.py
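If you start the script from another Python program rather than from the shell, the same two variables can be set on the child process. This is a rough sketch, not part of neural-style-pt: `build_cuda_env` and `run_script` are hypothetical helper names. The variables need to be in place before the process initializes CUDA, which is why they are set on a fresh child process here.

```python
import os
import subprocess
import sys


def build_cuda_env(visible_devices=None, pci_bus_order=True, base_env=None):
    """Return a copy of the environment with the CUDA variables set."""
    env = dict(os.environ if base_env is None else base_env)
    if pci_bus_order:
        env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    if visible_devices is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(visible_devices)
    return env


def run_script(script_args, **kwargs):
    """Run a Python script in a child process with the CUDA variables applied."""
    env = build_cuda_env(**kwargs)
    return subprocess.run([sys.executable, *script_args], env=env).returncode


# Example (not executed here), same effect as:
#   CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 python3 neural_style.py
# run_script(["neural_style.py"], visible_devices=1)
```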
ahh.. never knew that about PyTorch. It seems that the device IDs are swapped compared to what nvidia-smi shows.
>>> torch.cuda.get_device_name(0)
'GeForce GTX 1080'
>>> torch.cuda.get_device_name(1)
'GeForce GTX 1060 6GB'
hmm, so in my case maybe adding this: CUDA_DEVICE_ORDER=0 python3 neural_style.py would be the 1060, and CUDA_DEVICE_ORDER=1 python3 neural_style.py should be the 1080?
Should I make any changes to the GPU value in the script? Thanks for your timely response.. btw has anyone used your version of style transfer for video?
The invalid device ordinal error is normally given when you specify a nonexistent GPU ID.
The GPU value in the script should be set to the PyTorch GPU ID that you want to use, and PyTorch shows the device you want to use as having an ID of 0. The order and GPU values available to PyTorch will change based on the CUDA environment variables you specify.
CUDA_DEVICE_ORDER=PCI_BUS_ID will swap the GPU order if the existing order is not based on the PCI bus order.
CUDA_VISIBLE_DEVICES=1 will make GPU 0 in PyTorch be your second GPU.
CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 will only give PyTorch the second GPU device based on the PCI bus order, but that second GPU will be listed as GPU 0, so you'll need to use -gpu 0.
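To make the renumbering concrete, here is a small pure-Python model of it. It is only an illustration: `visible_devices` is a hypothetical function, it only handles integer indices (not GPU UUIDs), and the PCI bus order in the example is assumed from the nvidia-smi order described in this thread.

```python
def visible_devices(physical, cuda_visible_devices=None):
    """Model which devices a CUDA program sees, in the order it sees them.

    `physical` is the list of device names in the current enumeration order
    (that order itself depends on CUDA_DEVICE_ORDER). The index in the
    returned list is the ID PyTorch would use, e.g. result[0] is 'cuda:0'.
    """
    if cuda_visible_devices is None:
        return list(physical)
    seen = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if not token.isdecimal() or int(token) >= len(physical):
            break  # an invalid index hides itself and everything after it
        seen.append(physical[int(token)])
    return seen


# Assumed PCI bus order for the two cards in this thread.
pci_order = ["GeForce GTX 1060 6GB", "GeForce GTX 1080"]
print(visible_devices(pci_order, "1"))  # ['GeForce GTX 1080']
```

The only visible device is the GTX 1080, and because it is first in the list, it becomes GPU 0 for PyTorch.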
btw has anyone used your version of style transfer for video?
Yes, but those individuals tend to use techniques like rotoscoping to create video to avoid the flickering effect. I'm not knowledgeable enough yet to translate artistic-videos to PyTorch. But it should be easier for someone who better understands the video aspect of the code, as both artistic-videos and neural-style-pt are based on the same original code (neural-style).
Basically this is what neural-style-pt does with GPU IDs (example with the Python Interpreter):
import torch
a = torch.randn(3)
a.to('cpu') # Puts tensor 'a' on the CPU if it wasn't already
a.to('cuda:0') # Puts tensor 'a' on device 0
a.to('cuda:1') # Puts tensor 'a' on device 1
When I specify a valid GPU, I get something like this:
>>> a.to('cuda:0')
tensor([ 0.8459, -0.2027, 0.6153], device='cuda:0')
And when I specify a GPU that doesn't exist on my computer, I get this:
>>> a.to('cuda:1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: invalid device ordinal
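If you want a clearer message than the CUDA error, you could check the ID against the device count before moving anything to the GPU. This is just a sketch of that idea, not something neural-style-pt does: `device_string` is a hypothetical helper, and the device count is assumed to come from `torch.cuda.device_count()`.

```python
def device_string(gpu_id, device_count):
    """Turn a GPU ID into a torch device string, failing early if it is out of range."""
    if 0 <= gpu_id < device_count:
        return f"cuda:{gpu_id}"
    valid = ", ".join(str(i) for i in range(device_count)) or "none"
    raise ValueError(f"GPU ID {gpu_id} does not exist; valid IDs: {valid}")


# Usage, assuming PyTorch is installed:
#   import torch
#   a = torch.randn(3)
#   a = a.to(device_string(0, torch.cuda.device_count()))
```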
Hi, I'm trying to run the script above to see if my system can handle and create larger images based upon your script.
I added -optimizer adam and am using the NIN model for lower-memory GPUs.
Here is my output that fails eventually...
https://github.com/ProGamerGov/neural-style-pt/blob/master/examples/scripts/starry_stanford.sh
Nvidia info
btw I'm using GPU 1 since it has the most memory and not using the primary display..
thoughts?