ProGamerGov / neural-style-pt

PyTorch implementation of neural style transfer algorithm
MIT License

FileNotFound error, as well as a few other errors. #74

Open Subash-Chandra opened 4 years ago

Subash-Chandra commented 4 years ago

image

I'm getting these errors no matter the picture combo I used.

I tried lowering the resolutions of the pictures in the script because I thought it was failing to compute, and therefore failing to save, which caused the next function to error. Lowering the resolution didn't fix it.

I'm running on CUDA with cuDNN, on an i7-7700K + RTX 2080 Super. I've run higher-res non-script style transfers that haven't failed, though, so I'm not too sure what the problem may be.

I thought it may be because of edits I made to the starry_stanford.sh script, but I redownloaded and ran with default parameters, and it still failed with the exact same errors.

ProGamerGov commented 4 years ago

The FileNotFound errors look like errors caused by another error that occurs earlier. The first step in the starry_stanford.sh script takes your input images and produces an output image. The resulting output image is then used as an input image for the next step, and the output of that step is used as an input for the step after that. If an earlier step fails, then no output image is produced, so every later step fails because its input file does not exist.
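The chaining described above can be sketched in Python. This is only an illustration of why a failure in one step surfaces later as a missing-file error, not the contents of starry_stanford.sh; the function name `run_chain` and its arguments are hypothetical.

```python
import os
import subprocess
import sys


def run_chain(commands, outputs, run=subprocess.call):
    """Run each command in order and stop at the first step that fails.

    commands: list of argument lists, one per step (hypothetical here).
    outputs:  list of file paths, the file each step is expected to write.
    run:      callable that executes one command and returns its exit code.
    Returns the index of the first failed step, or None if all succeeded.
    """
    for i, (cmd, out) in enumerate(zip(commands, outputs)):
        code = run(cmd)
        # A non-zero exit code or a missing output file means the next
        # step would only fail with a misleading "file not found" error.
        if code != 0 or not os.path.isfile(out):
            print(f"step {i + 1} failed (exit code {code}); stopping",
                  file=sys.stderr)
            return i
    return None
```

Stopping at the first failed step keeps the real error (here, the CUDA one) as the last thing printed instead of burying it under follow-on FileNotFound messages.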

What was the first error?

Subash-Chandra commented 4 years ago

The first error is the first line of the output. It's as follows:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED (createCuDNNHandle at ..\aten\src\ATen\cudnn\Handle.cpp:9)

Here is the Traceback just before that error.

Traceback (most recent call last):
  File "neural_style.py", line 468, in <module>
    main()
  File "neural_style.py", line 262, in main
    optimizer.step(feval)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\optim\lbfgs.py", line 311, in step
    orig_loss = closure()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "neural_style.py", line 253, in feval
    loss.backward()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED (createCuDNNHandle at ..\aten\src\ATen\cudnn\Handle.cpp:9)
(no backtrace available)

Subash-Chandra commented 4 years ago

I thought it was because I had the wrong version of cuDNN, so I reinstalled it, and it was still erroring.

For reference, I am using CUDA v11.

ProGamerGov commented 4 years ago

@Subash-Chandra The PyTorch site shows that the PyTorch Conda install only supports CUDA 9.2, CUDA 10.1, and CUDA 10.2: https://pytorch.org/get-started/locally/

Also, unless you are installing from source, I think cuDNN is prepackaged (comes with the pip and Conda packages).
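To see which CUDA and cuDNN versions the installed PyTorch build was actually compiled against (as opposed to the toolkit installed system-wide), something like the following may help. It is a minimal sketch; the helper name `cuda_report` is made up for illustration, and it only uses attributes that PyTorch exposes publicly.

```python
def cuda_report():
    """Collect the CUDA/cuDNN versions that PyTorch itself reports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None}
    return {
        "torch": torch.__version__,
        # CUDA version the binary was built with; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # Bundled cuDNN version as an integer; None if cuDNN is unavailable.
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` differs from the system-wide CUDA toolkit version, that is usually fine for the pip and Conda packages, since they ship their own CUDA runtime and cuDNN libraries; what matters is that the GPU driver is recent enough for the bundled runtime.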

Subash-Chandra commented 4 years ago

I reinstalled CUDA 10.2, cuDNN 7.6.5 for CUDA 10.2, and the correct PyTorch version as well. Still getting an error, though now it is a different one:

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

I'm not sure why it's able to compute the first 15 or so images, and only fail after that.

ProGamerGov commented 4 years ago

> I'm not sure why it's able to compute the first 15 or so images, and only fail after that.

Can you elaborate on that?

Subash-Chandra commented 4 years ago

image

It makes 15 images no matter the input image combination.

I have 8 GB of VRAM on my 2080 Super and cuDNN is installed, so it should be able to handle resolutions well above the 2350 from Step 5.

It finishes doing the 1000 iterations of Step 1, and 500 iterations of Step 2, and then just stops creating any more files.

edit - Also, there is no difference in the script itself between Step 2 and Step 3 except for the resolution, so there shouldn't be any reason that the script itself fails at that spot every time.

ProGamerGov commented 4 years ago

@Subash-Chandra I can't seem to figure out whether the error is caused by a lack of memory or by something else. I originally created starry_stanford.sh on a GPU with 12 GB of VRAM, and that's all I tested it with.

DataCrusade1999 commented 3 years ago

> I reinstalled CUDA 10.2, cuDNN 7.6.5 for CUDA 10.2, and the correct PyTorch version as well. Still getting the same error.
>
> RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
>
> I'm not sure why it's able to compute the first 15 or so images, and only fail after that.

I was able to resolve this issue by not including the -cudnn_autotune flag. Maybe this can help you out.
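As a rough illustration of that workaround, the snippet below drops the flag from a command line before it is run. It assumes -cudnn_autotune is a plain switch that takes no value; the image file names and the helper `without_flag` are hypothetical, and the other flags are only there to make the example look like a typical neural_style.py call.

```python
def without_flag(args, flag="-cudnn_autotune"):
    """Return a copy of args with every occurrence of a valueless flag removed."""
    return [a for a in args if a != flag]


# Hypothetical command line; content.jpg and style.jpg are placeholder names.
cmd = [
    "python", "neural_style.py",
    "-content_image", "content.jpg",
    "-style_image", "style.jpg",
    "-backend", "cudnn",
    "-cudnn_autotune",
    "-image_size", "512",
]

# Print the command as it would be run without the autotune switch.
print(" ".join(without_flag(cmd)))
```

In a shell script such as starry_stanford.sh, the equivalent change would be to delete -cudnn_autotune from each line that calls neural_style.py. My assumption is that autotuning makes cuDNN try several convolution algorithms, some of which need extra workspace memory, which could explain an allocation failure on a card with less VRAM.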