mchristianl opened this issue 3 years ago
Can we re-run with the latest Flux? Let's prioritise this.
Yes: this particular error seems to be fixed in 0.12.0-dev
and I get another error:
FATAL ERROR: Symbol "__nv_sqrt" not found
which might indicate that my attempt to restore the file to its previous version failed:
--- char-rnn.jl 2021-01-28 12:27:16.238144639 +0100
+++ char-rnn-new.jl 2021-01-28 12:25:24.361953681 +0100
@@ -3,6 +3,7 @@
using StatsBase: wsample
using Base.Iterators: partition
using Parameters: @with_kw
+using CUDA
# Hyperparameter arguments
@with_kw mutable struct Args
@@ -51,15 +52,16 @@
# Constructing Model
m = build_model(N)
+ m = gpu(m)
function loss(xs, ys)
- l = sum(logitcrossentropy.(m.(xs), ys))
+ l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))
return l
end
## Training
opt = ADAM(args.lr)
- tx, ty = (Xs[5], Ys[5])
+ tx, ty = (gpu.(Xs[5]), gpu.(Ys[5]))
evalcb = () -> @show loss(tx, ty)
Flux.train!(loss, params(m), zip(Xs, Ys), opt, cb = throttle(evalcb, args.throttle))
@@ -84,5 +86,6 @@
end
cd(@__DIR__)
-m, alphabet = train()
+m, alphabet = train() # FATAL ERROR: Symbol "__nv_sqrt"not found
sample(m, alphabet, 1000) |> println
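For context, the restored GPU support is just Flux's gpu helper applied to the model and to each batch; a minimal standalone sketch of that pattern (the Chain and data below are illustrative, not the char-rnn model itself):

using Flux, CUDA

m = Chain(Dense(10, 5, relu), Dense(5, 2)) |> gpu   # move the model parameters to the GPU
x = rand(Float32, 10, 32) |> gpu                    # move a batch to the GPU
y = m(x)                                            # forward pass runs on the GPU when CUDA is functional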
Which version of CUDA are you using?
The CUDA.jl package is at 2.4.1, and CUDA.versioninfo() gives me:
CUDA toolkit 11.1.1, artifact installation
CUDA driver 11.1.0
NVIDIA driver 456.71.0
Libraries:
- CUBLAS: 11.3.0
- CURAND: 10.2.2
- CUFFT: 10.3.0
- CUSOLVER: 11.0.1
- CUSPARSE: 11.3.0
- CUPTI: 14.0.0
- NVML: 10.0.0+430.86
- CUDNN: 8.0.4 (for CUDA 11.1.0)
- CUTENSOR: 1.2.1 (for CUDA 11.1.0)
Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75
1 device:
0: Quadro GP100 (sm_60, 14.738 GiB / 16.000 GiB available)
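If it helps narrow things down, the same information can also be queried programmatically; a small sketch, assuming CUDA.jl 2.x as reported above:

using CUDA

CUDA.functional()      # true if the driver/toolkit setup is usable from Julia
name(device())         # device name, e.g. "Quadro GP100"
capability(device())   # compute capability, e.g. v"6.0"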
Probably not going to fix the problem, but
l = sum(logitcrossentropy.(m.(gpu.(xs)), gpu.(ys)))
should be changed to
using Flux.Losses
l = logitcrossentropy(m.(gpu.(xs)), gpu.(ys), agg=sum)
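For what it's worth, here is a small sketch of the agg keyword in isolation (the logits and targets are made up, just to show the two aggregations):

using Flux
using Flux.Losses: logitcrossentropy

ŷ = randn(Float32, 26, 10)                        # logits for 10 samples over a 26-symbol alphabet
y = Flux.onehotbatch(rand('a':'z', 10), 'a':'z')  # one-hot targets
logitcrossentropy(ŷ, y)                           # default agg = mean
logitcrossentropy(ŷ, y; agg = sum)                # summed, matching the broadcast-and-sum version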
We've discussed this before, and the first approach is a lot more intuitive and clear about its intention.
Yes, we have discussed this before, and as a result we have the whole Losses module based on the second approach; the first approach is deprecated.
Hi,
I have just stumbled upon the char-rnn.jl example in my first attempt at checking out Julia machine learning with Flux.jl.
The example works, but produces something close to garbage as a text generator. Since it seems to be on the TODO list anyway in issue 266, I was wondering how difficult it'd be to re-establish the CUDA support that was removed in an earlier commit. That would allow me to play around with more intensive training of the model to see whether the generated text gains some quality.
I guess the error that @AdarshKumar712 got was
in line
Any thoughts? Would this be fixable in an easy way?
regards, Christian