Sentdex / GANTheftAuto


Did you forget to mention the requirement of an Nvidia GPU? #4

Closed LtqxWYEG closed 3 years ago

LtqxWYEG commented 3 years ago

Because I'm stuck with Could not find module 'caffe2_nvrtc.dll' (or one of its dependencies), and after some googling, that sounds to me like I'd need a CUDA-capable GPU.

daniel-kukiela commented 3 years ago

Yes, some Nvidia GPU is required to run this. We did not even try to run these big models on a CPU, but even if it's possible, it would not be a great experience.
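
If you want to quickly check what your PyTorch install can actually see, this small diagnostic (nothing repo-specific, just standard PyTorch calls) will tell you:

    import torch

    # True only if a CUDA build of PyTorch is installed and a CUDA-capable GPU is visible
    print(torch.cuda.is_available())

    # Number of CUDA devices PyTorch detected (0 on CPU-only machines)
    print(torch.cuda.device_count())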

davidgfb commented 3 years ago

@LtqxWYEG Just comment out that workaround section in inference.py (lines 22-25) with triple quotes so it looks like this:

    '''
    # Workaround for PyTorch issue on Windows
    if os.name == 'nt':
        import ctypes
        ctypes.cdll.LoadLibrary('caffe2_nvrtc.dll')
    '''

daniel-kukiela commented 3 years ago

This is not only about these lines of code. It has never been tested on a CPU, so currently an Nvidia GPU is required. We might check whether it can be run on a CPU, and if the results are good enough, we'll update the code accordingly.

Sentdex commented 3 years ago

Neither of us has access to an AMD GPU to test, but I think Torch has AMD support now, and it might work there too.

On CPU only, you'd likely struggle with FPS and it'd be unpleasant, I'd imagine, but as Daniel said above, we can give it a shot and see if there's a nice way to let people run this CPU-only.
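
If we do try that, the usual device-agnostic pattern in PyTorch would look something like the sketch below; to be clear, this is untested with these models and is not what inference.py currently does:

    import torch

    # Pick the GPU when one is available, otherwise fall back to the CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Models and inputs would then be moved with .to(device) instead of .cuda()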

This is a project of 2 people, so testing things across a wide range of devices and OSes is not really possible for us.

daniel-kukiela commented 3 years ago

In addition to what Harrison just said: if you have an AMD GPU set up for ML and you want to test it out (no modification to the code should be required for this), you're more than welcome to do so! :)

LtqxWYEG commented 3 years ago

In addition to what Harrison just said: if you have an AMD GPU set up for ML and you want to test it out (no modification to the code should be required for this), you're more than welcome to do so! :)

I'm not sure what you mean by "set up for ML", but yes, I'd like to try things. I commented out the section and now I get this error:

  File "C:\Users\redacted\AppData\Local\Programs\Python\Python39\lib\site-packages\torch-1.8.0-py3.9-win-amd64.egg\torch\cuda\__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

Of course, because there is no CUDA device.

davidgfb commented 3 years ago

The next step would be to comment out line 91 of the same file with a number sign, like this: #torch.cuda.set_device(gpu)
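
A slightly safer variant than deleting the call outright would be to guard it, roughly like this (untested here, and as noted below, this alone won't make the models run without CUDA):

    # inference.py already imports torch and defines the 'gpu' device index
    # Only select a CUDA device when one actually exists
    if torch.cuda.is_available():
        torch.cuda.set_device(gpu)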

LtqxWYEG commented 3 years ago

The next step would be to comment out line 91 of the same file with a number sign, like this: #torch.cuda.set_device(gpu)

Oh. I didn't think it would be this easy :D

Ok, but this one is not that simple, right?

File "C:\Users\Distelzombie\AppData\Local\Programs\Python\Python39\lib\site-packages\torch-1.8.0-py3.9-win-amd64.egg\torch\cuda\__init__.py", line 164, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

AFAIR, I downloaded some CPU version of either Torch or something similar-sounding. Maybe it DOES have to be the CUDA version, even if it doesn't sound like it would work?

Sentdex commented 3 years ago

Again, as a slight warning: neither Daniel nor I am familiar with AMD GPUs for deep learning. All I know is that TF and Torch recently added official AMD support. This is fairly new, so when you search the web you might still see people saying AMD GPUs aren't supported; those articles are outdated.

AMD's "CUDA" equivalent is ROCm

So, when you go to download Torch, for example on this page: https://pytorch.org/get-started/locally/

You would select ROCm (IF your GPU is AMD).

BEYOND that, however, I am not sure what else is required to actually set up DL on an AMD GPU, or whether all AMD GPUs are supported (just as with CUDA you need a CUDA-capable device, which not all of the NVIDIA GPUs are).

Also note the (beta) stipulation they're making. Who knows what further struggles await, but I encourage you to try it and report your findings back to us, should you be so brave :P
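
If you do get a ROCm build installed, one way to see what your Torch was actually built against (as far as I know, ROCm builds still expose the torch.cuda API, so these standard attributes should work there too):

    import torch

    print(torch.__version__)   # ROCm wheels tag the version string, e.g. with a +rocm suffix
    print(torch.version.cuda)  # CUDA toolkit version on CUDA builds, None otherwise
    print(torch.version.hip)   # HIP/ROCm version on ROCm builds, None otherwise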

LtqxWYEG commented 3 years ago

So, when you go to download Torch, for example on this page: https://pytorch.org/get-started/locally/

"NOTE: ROCm is not available on Windows"

:( Maybe I should try it via Jupyter? But I've never used that.

rgkimball commented 3 years ago

I have an Nvidia GPU and was running into the same error message about the missing caffe2 DLL when following the provided installation instructions. It worked for me once I installed the correct version of PyTorch:

pip3 install torch==1.9.0+cu102 torchvision==0.10.0+cu102 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
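
After that, a quick one-liner to confirm the CUDA build actually got picked up:

    python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"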

The PR above addresses this installation discrepancy to help prevent future confusion.

daniel-kukiela commented 3 years ago

@davidgfb Can I ask you to stop giving wrong advice? I already explained this to you, thank you.

daniel-kukiela commented 3 years ago

The next step would be to comment out line 91 of the same file with a number sign, like this: #torch.cuda.set_device(gpu)

Oh. I didn't think it would be this easy :D

Because it's not. You cannot just comment this out; it would not magically start using your AMD GPU if you did.

Ok, but this one is not that simple, right?

File "C:\Users\Distelzombie\AppData\Local\Programs\Python\Python39\lib\site-packages\torch-1.8.0-py3.9-win-amd64.egg\torch\cuda\__init__.py", line 164, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

It looks, then, like GameGAN is written specifically to use CUDA features, and to make it ROCm-compatible we'd need someone with an AMD GPU and the knowledge. We do not own AMD GPUs to make it work ourselves.

AFAIR, I downloaded some CPU version of either Torch or something similar-sounding. Maybe it DOES have to be the CUDA version, even if it doesn't sound like it would work?

The error message suggests that the original GameGAN devs put this check and error message there on purpose. Maybe it could be changed, but we don't know for now.
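
For anyone who wants to experiment anyway: the usual first step for CPU-only loading in PyTorch is remapping the checkpoint at load time, roughly as below. This is completely untested with GameGAN, and 'checkpoint.pt' is just a placeholder name:

    import torch

    # map_location remaps tensors that were saved on a CUDA device onto the CPU
    state = torch.load('checkpoint.pt', map_location=torch.device('cpu'))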

LtqxWYEG commented 3 years ago

Oh. Well, it does make sense that commenting it out wouldn't make it start using my AMD GPU, but I expected it to use the CPU. Anyway, I don't have the knowledge to make it work. I remember trying to get some other GAN to work on my PC (they also said it should work on most or almost all PCs), and that would only have worked via Jupyter, maybe. I just used a Google Colab notebook with an Nvidia GPU for that GAN, but that wouldn't work here, I guess, since this isn't just a simple script-like program.

Then I give up :) Please consider rephrasing "It should work on almost every PC" in the next project/video you do. You deep learning guys always seem to forget about AMD users. Why is that? :)

daniel-kukiela commented 3 years ago

We've heard of broken CUDA installs that probably fell back to the CPU and ran at something like a frame every few seconds.

AMD has never been a big thing in machine learning; that's why. There have been unofficial attempts, but even now PyTorch only has beta AMD GPU support (go to https://pytorch.org/ and see how it says beta next to ROCm). This is a very recent thing. And it's not on us ML guys; it's on AMD to join the party :)

LtqxWYEG commented 3 years ago

Well, it would probably be easier for AMD if Nvidia didn't make everything closed-source, proprietary software. Also, CUDA-capable GPUs support programming frameworks such as OpenMP, OpenACC, and OpenCL, and AMD's GPUs do support OpenCL. So it's not just AMD's fault that nobody wants to use their hardware.

AMD makes their software open source, but oh, fancy Nvidia is too earnings-oriented to be fair :(

daniel-kukiela commented 3 years ago

First, this is not the place to talk about closed-source and open-source drivers, nor about Nvidia vs. AMD. Not everything from AMD is open-sourced. What you are referring to is probably drivers and how that affects Linux (or the lack of ability to modify them). But like I said, not a topic to be discussed here.

Second, Nvidia has nothing to do with this situation. ML frameworks are open-sourced; what's been missing is an API in AMD's drivers and/or hardware support in their cards that these ML frameworks can use. That has only changed recently, and there's still a lot of work to be done. AMD won't use Nvidia's drivers, will it? As for this project: it requires CUDA, but that might mostly be a check that a GPU and drivers are present. Since there's no alternative, this is how it works. It's also open-sourced. PyTorch has beta support for ROCm, so if anyone with a capable AMD GPU and the knowledge wants to modify this project to work on AMD GPUs, they can. Nothing here is closed-source.

LtqxWYEG commented 3 years ago

Nvidia has nothing to do with this situation

eeh

Nothing here is closed-source.

Of course. I meant the CUDA technology that AMD could adopt if Nvidia would let them. (Much like PhysX, HairWorks, G-Sync, etc., CUDA is proprietary Nvidia tech that only works with their GPUs; in contrast, AMD's FreeSync, FidelityFX, and so on are all open source and work with Nvidia GPUs as well.)

Anyway, yes, this isn't the place to discuss this. I just wanted to vent my frustration with Nvidia :)

Thank you! Kind regards :)

daniel-kukiela commented 3 years ago

Just to make things clear, speaking of CUDA: it's closely tied to CUDA cores, which AMD does not have and would not have, and to drivers, which AMD also cannot use. Nvidia and AMD cards do not share an architecture, unlike, for example, AMD and Intel CPUs, which are both based on x86_64. Also, I'm not trying to be in opposition to you, or just "on the other side"; I'm just sharing facts. Cheers. :)

LtqxWYEG commented 3 years ago

Well, you know, if Nvidia would let them, other GPU manufacturers could include CUDA cores, much like every card has SPUs, FPUs, ROPs, TMUs, Emus, Mops und Blops. That's why, currently (shrug), AMD is more consumer-friendly and less of a capitalist hell-lord. Also, have you seen AMD's driver GUI? It's like Nvidia is still in the '90s when it comes to UI/UX/mindset :P

Also, let me tell you about Microsoft! ... xD