FluxML / model-zoo

Please do not feed the models
https://fluxml.ai/

re-establishing GPU support for char-rnn.jl #331

Open mchristianl opened 3 years ago

mchristianl commented 3 years ago

Hi,

I have just stumbled upon the char-rnn.jl example in my first attempt at checking out Julia machine learning with Flux.jl.

The example works, but produces rather garbled output as a text generator. Since it seems to be on the TODO list anyway in issue 266, I was wondering how difficult it'd be to re-establish the CUDA support that was removed in a previous commit. This would allow (me) to play around with more intensive training of the model to see whether the generated text gains some quality.

I guess the error that @AdarshKumar712 got was

ERROR: LoadError:
Need an adjoint for constructor Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}.
Gradient is of type CuArray{Float32,2}

in line

function loss(xs, ys)
  l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))
  return l
end
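For reference, this loss sums per-timestep cross-entropies over the sequence, each averaged over the batch. A minimal dependency-free sketch of that computation in plain Julia (the `logitcrossentropy` below is an illustrative stand-in, not Flux's actual implementation):

```julia
# Numerically stable logit cross-entropy for one (logits, targets) pair,
# averaged over the batch -- mirroring Flux's default mean aggregation.
function logitcrossentropy(ŷ, y)
    m  = maximum(ŷ; dims = 1)
    ls = ŷ .- m .- log.(sum(exp.(ŷ .- m); dims = 1))   # log-softmax per column
    return -sum(y .* ls) / size(y, 2)
end

# Toy "sequence": 3 timesteps, each a 4-class problem with a batch of 2.
xs = [randn(4, 2) for _ in 1:3]                  # stand-ins for m.(xs)
onehot(i, n) = (v = zeros(n); v[i] = 1.0; v)
ys = [hcat(onehot(1, 4), onehot(3, 4)) for _ in 1:3]

# The loss from the example: one scalar per timestep, summed over the sequence.
l = sum(logitcrossentropy.(xs, ys))
```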

Any thoughts? Would this be fixable in an easy way?

regards, Christian

DhairyaLGandhi commented 3 years ago

Can we re-run with the latest Flux? Let's prioritise this.

mchristianl commented 3 years ago

Yes: this particular error seems to be fixed in 0.12.0-dev, and I now get another error

FATAL ERROR: Symbol "__nv_sqrt" not found

which might indicate that my attempt to restore the file to its previous version failed:

--- char-rnn.jl 2021-01-28 12:27:16.238144639 +0100
+++ char-rnn-new.jl 2021-01-28 12:25:24.361953681 +0100
@@ -3,6 +3,7 @@
 using StatsBase: wsample
 using Base.Iterators: partition
 using Parameters: @with_kw
+using CUDA

 # Hyperparameter arguments
 @with_kw mutable struct Args
@@ -51,15 +52,16 @@

     # Constructing Model
     m = build_model(N)
+    m = gpu(m)

     function loss(xs, ys)
-      l = sum(logitcrossentropy.(m.(xs), ys))
+      l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))
       return l
     end

     ## Training
     opt = ADAM(args.lr)
-    tx, ty = (Xs[5], Ys[5])
+    tx, ty = (gpu.(Xs[5]), gpu.(Ys[5]))
     evalcb = () -> @show loss(tx, ty)

     Flux.train!(loss, params(m), zip(Xs, Ys), opt, cb = throttle(evalcb, args.throttle))
@@ -84,5 +86,6 @@
 end

 cd(@__DIR__)
-m, alphabet = train()
+m, alphabet = train() # FATAL ERROR: Symbol "__nv_sqrt" not found
 sample(m, alphabet, 1000) |> println
DhairyaLGandhi commented 3 years ago

Which version of CUDA are you using?

mchristianl commented 3 years ago

the CUDA.jl Julia package is at 2.4.1 and CUDA.versioninfo() gives me

CUDA toolkit 11.1.1, artifact installation
CUDA driver 11.1.0
NVIDIA driver 456.71.0

Libraries:
- CUBLAS: 11.3.0
- CURAND: 10.2.2
- CUFFT: 10.3.0
- CUSOLVER: 11.0.1
- CUSPARSE: 11.3.0
- CUPTI: 14.0.0
- NVML: 10.0.0+430.86
- CUDNN: 8.0.4 (for CUDA 11.1.0)
- CUTENSOR: 1.2.1 (for CUDA 11.1.0)

Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device:
  0: Quadro GP100 (sm_60, 14.738 GiB / 16.000 GiB available)
CarloLucibello commented 3 years ago

Probably not going to fix the problem, but

l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))

should be changed to

using Flux.Losses
l = logitcrossentropy(m.(gpu.(xs)), gpu.(ys), agg=sum)
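For context, the Flux.Losses functions compute a per-sample loss and then apply an aggregation function (mean by default; agg = sum totals over the batch). A minimal dependency-free sketch of that semantics in plain Julia; the function below is an illustration of the aggregation keyword, not Flux's actual implementation:

```julia
using Statistics: mean

# Per-column (per-sample) logit cross-entropy, then an aggregation step --
# mirroring the `agg` keyword of the Flux.Losses API.
function logitcrossentropy(ŷ::AbstractMatrix, y::AbstractMatrix; agg = mean)
    m  = maximum(ŷ; dims = 1)
    ls = ŷ .- m .- log.(sum(exp.(ŷ .- m); dims = 1))   # log-softmax per column
    return agg(.-sum(y .* ls; dims = 1))               # one loss per sample
end

ŷ = [2.0 0.0; 0.0 1.0; 1.0 3.0]   # 3 classes, batch of 2
y = [1.0 0.0; 0.0 0.0; 0.0 1.0]   # one-hot targets

l_sum  = logitcrossentropy(ŷ, y; agg = sum)   # total loss over the batch
l_mean = logitcrossentropy(ŷ, y)              # default: mean over the batch
```

The only difference between the two calls is the aggregation applied to the vector of per-sample losses; with a batch of 2, the sum-aggregated loss is exactly twice the mean-aggregated one.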
DhairyaLGandhi commented 3 years ago

We've discussed this before, and the first approach is a lot more intuitive and clear about its intention.

CarloLucibello commented 3 years ago

> We've discussed this before, and the first approach is a lot more intuitive and clear about its intention.

Yes, we have discussed this before, and as a result we have the whole Losses module based on the second approach; the first approach is deprecated.