Open Subash-Chandra opened 4 years ago
The FileNotFound errors look like they were caused by another error that occurred earlier. The first step in the starry_stanford.sh script takes your input images and produces an output image. That output image is then used as an input image for the next step, and the output of that step is used as an input for the step after that. If an earlier step fails, no output image is produced, so every later step fails to find its input file.
What was the first error?
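To illustrate the chaining described above, here is a minimal sketch of running a sequence of steps and stopping at the first failure, so that the real error is not buried under later FileNotFound messages. The `run_steps` helper and the demo commands are hypothetical stand-ins, not part of starry_stanford.sh or neural_style.py.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order and stop at the first one that fails.

    Returns the zero-based index of the failing step, or None if every
    step exited with status 0.
    """
    for index, command in enumerate(steps):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            # The first failure is the one worth debugging; later steps
            # would only complain about the missing output image.
            print(f"step {index + 1} failed:\n{result.stderr}", file=sys.stderr)
            return index
    return None


if __name__ == "__main__":
    # Hypothetical stand-ins for the real neural_style.py invocations.
    demo = [
        [sys.executable, "-c", "print('step 1 ok')"],
        [sys.executable, "-c", "raise SystemExit(1)"],
        [sys.executable, "-c", "print('never reached')"],
    ]
    print(run_steps(demo))
```

In a shell script, placing `set -e` near the top has a similar effect: the script exits as soon as one command returns a non-zero status.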
The error is the first line. It's as follows
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED (createCuDNNHandle at ..\aten\src\ATen\cudnn\Handle.cpp:9)
Here is the Traceback just before that error.
Traceback (most recent call last):
  File "neural_style.py", line 468, in <module>
    main()
  File "neural_style.py", line 262, in main
    optimizer.step(feval)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\optim\lbfgs.py", line 311, in step
    orig_loss = closure()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "neural_style.py", line 253, in feval
    loss.backward()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\__init__.py", line 100, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED (createCuDNNHandle at ..\aten\src\ATen\cudnn\Handle.cpp:9)
(no backtrace available)
I thought it was because I had the wrong version of cuDNN, so I reinstalled it, and it was still erroring.
For reference, I am using CUDA v11.
@Subash-Chandra The PyTorch site shows that the PyTorch Conda install only supports CUDA 9.2, CUDA 10.1, and CUDA 10.2: https://pytorch.org/get-started/locally/
Also, unless you are installing from source, I think cuDNN is prepackaged (comes with the pip and Conda packages).
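One way to check which CUDA and cuDNN versions the installed PyTorch package was actually built with (as opposed to what is installed system-wide) is to ask PyTorch directly. This is only a sketch; `describe_torch_build` is a hypothetical helper name, and it returns a stub if PyTorch is not importable.

```python
def describe_torch_build():
    """Collect the CUDA/cuDNN versions reported by the installed PyTorch."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        # CUDA version the binary was built against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        # cuDNN version bundled with the package (None if unavailable).
        "cudnn_version": torch.backends.cudnn.version(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` does not match one of the versions listed on the PyTorch install page, reinstalling the system-wide CUDA toolkit is unlikely to change what the pip or Conda package uses.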
I reinstalled CUDA 10.2, cuDNN 7.6.5 for CUDA 10.2, and the correct PyTorch version as well. I'm still getting an error, but now it is this one:
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
I'm not sure why it's able to compute the first 15 or so images, and only fail after that.
> I'm not sure why it's able to compute the first 15 or so images, and only fail after that.
Can you elaborate on that?
It makes 15 images no matter the input image combination.
I have 8 GB of VRAM on my 2080 Super and cuDNN is installed, so it should be able to handle resolutions well above the 2350 from Step 5.
It finishes doing the 1000 iterations of Step 1, and 500 iterations of Step 2, and then just stops creating any more files.
edit - Also, there is no difference in the script itself between Step 2 and Step 3 except for the resolution, so there shouldn't be any reason that the script itself fails at that spot every time.
@Subash-Chandra I can't tell whether the error is caused by a lack of memory or by something else. I originally created starry_stanford.sh on a GPU with 12 GB of VRAM, and that's the only configuration I tested it with.
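To help tell a memory problem apart from something else, it may be useful to log GPU memory from inside the Python process at each step. The sketch below uses PyTorch's own counters; `gpu_memory_report` is a hypothetical helper, and it would have to be called from within the process doing the style transfer, since these counters only cover memory allocated by PyTorch in the current process.

```python
def gpu_memory_report(device=0):
    """Return total, current, and peak GPU memory in MiB, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    mib = 1024 ** 2
    total = torch.cuda.get_device_properties(device).total_memory
    return {
        "total_mib": total // mib,
        # Memory currently held by tensors in this process.
        "allocated_mib": torch.cuda.memory_allocated(device) // mib,
        # High-water mark since the process started (or the last reset).
        "peak_allocated_mib": torch.cuda.max_memory_allocated(device) // mib,
    }


if __name__ == "__main__":
    print(gpu_memory_report())
```

A peak value close to the total would point toward memory exhaustion; a low peak at the moment of failure would suggest looking elsewhere.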
> I reinstalled CUDA 10.2, cuDNN 7.6.5 for CUDA 10.2, and the correct pyTorch version as well. Still getting the same error.
> RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
> I'm not sure why it's able to compute the first 15 or so images, and only fail after that.
I was able to resolve this issue by not including the -cudnn_autotune flag. Maybe this can help you out.
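As a sketch of that workaround, the snippet below strips the flag from an argument list before launching a step. The `without_flag` helper and the file names are hypothetical. My assumption, not confirmed in this thread, is that the flag turns on cuDNN's autotuner (exposed in PyTorch as `torch.backends.cudnn.benchmark`), which tries several convolution algorithms and can need extra workspace memory while doing so.

```python
def without_flag(args, flag="-cudnn_autotune"):
    """Return a copy of args with every occurrence of a boolean flag removed."""
    return [arg for arg in args if arg != flag]


if __name__ == "__main__":
    # Illustrative command line; the image names are placeholders.
    command = [
        "python", "neural_style.py",
        "-content_image", "content.jpg",
        "-style_image", "style.jpg",
        "-backend", "cudnn",
        "-cudnn_autotune",
    ]
    print(" ".join(without_flag(command)))
```

In the shell script itself, the equivalent change is simply deleting `-cudnn_autotune` from each `neural_style.py` line.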
I'm getting these errors no matter the picture combo I used.
I tried lowering the resolutions of the pictures in the script because I thought it was failing to compute, and therefore failing to save which caused the next function to error, but lowering the resolution didn't fix it.
I'm running on CUDA with cuDNN, on an i7-7700k + RTX 2080 Super. I've run higher-resolution style transfers outside the script that haven't failed, though, so I'm not sure what the problem may be.
I thought it may be because of edits I made to the starry_stanford.sh script, but I redownloaded and ran with default parameters, and it still failed with the exact same errors.