Open egolep opened 2 years ago
What version of CUDA.jl are you on and what CUDA toolkit? This looks like https://github.com/FluxML/Flux.jl/issues/2018.
Can we produce a MWE using only CUDA.jl?
Sorry, I wanted to leave CUDA and Flux versions but I forgot. Here they are:
CUDA v3.11.0
Flux v0.13.4
I also saw #2018, but the stack trace looked different, so I thought I had stumbled on a different error or at least another corner case. If this is not the case, forgive me!
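For anyone hitting the same error, the version details asked for above can be collected from the Julia REPL. This is a minimal sketch, assuming CUDA.jl and Flux are installed in the active environment; the output depends on the local machine and is not reproduced here.

```julia
using Pkg

# Print the installed versions of the two packages discussed in this thread.
Pkg.status(["CUDA", "Flux"])

using CUDA

# Report the CUDA toolkit, driver, and visible devices as seen by CUDA.jl.
CUDA.versioninfo()
```

Pasting both outputs into the issue answers the CUDA.jl version and toolkit question in one go.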
Ah, this looks like https://github.com/JuliaGPU/CUDA.jl/issues/1508. I believe Flux + CUDA should play well together, but if you're loading other packages, they may trigger the heuristic mentioned in that issue.
Whenever I try to train a model on GPU with a `Dropout` layer, the training fails and I get the error message pasted below. At first I thought the problem was related to explicitly setting a seed for random procedures (layer initialization, dataset splitting, etc.), but I have now removed every seed specification and the problem is still present. Then I noticed that the problem emerges when using a `DataLoader` during the training phase: I get stuck with the same error even when using something like `rand(3, 10)` and `rand(1, 10)` as the data and the labels of the `DataLoader`, respectively, and with a simple model like `m = Chain(Dense(3, 1), Dropout(0.2))`.
Is this a general problem with `Dropout` + `DataLoader` on GPU?
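The setup described above can be sketched as a short script. This is an illustrative reconstruction, not the reporter's actual code: the batch size, the loss (`mse`), the optimiser (`Descent`), and the use of `Float32` data are all assumptions, and the implicit-parameter training style matches the Flux v0.13 series mentioned in this thread. On an affected setup the failure would be expected inside the training loop.

```julia
using Flux   # the thread reports Flux v0.13.4
using CUDA   # the thread reports CUDA v3.11.0

# Random data and labels with the shapes given in the report.
# Float32 is an assumption; the report uses plain rand(3, 10) and rand(1, 10).
X = rand(Float32, 3, 10)
Y = rand(Float32, 1, 10)

# The simple model from the report, moved to the GPU.
m = Chain(Dense(3, 1), Dropout(0.2)) |> gpu

# Batch size and shuffling are assumptions; the report does not state them.
loader = Flux.DataLoader((X, Y); batchsize = 2, shuffle = true)

# Loss and optimiser are assumptions chosen only to make the loop complete.
loss(x, y) = Flux.Losses.mse(m(x), y)
ps = Flux.params(m)
opt = Descent(0.01)

for (x, y) in loader
    x, y = gpu(x), gpu(y)                  # move each batch to the GPU
    gs = gradient(() -> loss(x, y), ps)    # Dropout is active while taking gradients
    Flux.Optimise.update!(opt, ps, gs)
end
```

If this script runs cleanly in a fresh environment containing only Flux and CUDA, that would be consistent with the suggestion above that another loaded package is triggering the problem.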