If I add `using CuArrays` to the top of the script, I get the following error:
$ julia vision/cifar10/cifar10.jl
ERROR: LoadError: GPU compilation of #23(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Bool,1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Int64,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}) failed
KernelError: passing and using non-bitstype argument
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Int64,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}.
That type is not isbits, and such arguments are only allowed when they are unused by the kernel.
Stacktrace:
[1] check_invocation(::CUDAnative.CompilerContext, ::LLVM.Function) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/validation.jl:35
[2] compile(::CUDAnative.CompilerContext) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:94
[3] #compile#109(::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::VersionNumber, ::Any, ::Any) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:45
[4] compile at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:43 [inlined]
[5] #compile#108(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CUDAdrv.CuDevice, ::Function, ::Any) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:18
[6] compile at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:16 [inlined]
[7] macro expansion at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/execution.jl:269 [inlined]
[8] #cufunction#123(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::getfield(GPUArrays, Symbol("##23#24")), ::Type{Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{Bool,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Int64,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}}}) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/execution.jl:240
[9] cufunction(::Function, ::Type) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/execution.jl:240
[10] macro expansion at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/execution.jl:208 [inlined]
[11] macro expansion at ./gcutils.jl:87 [inlined]
[12] macro expansion at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/execution.jl:205 [inlined]
[13] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Bool,1}, ::Tuple{CuArray{Bool,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{CuArray{Int64,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/tyler/.julia/packages/CuArrays/qZCAt/src/gpuarray_interface.jl:59
[14] gpu_call(::Function, ::CuArray{Bool,1}, ::Tuple{CuArray{Bool,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{CuArray{Int64,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}}, ::Int64) at /home/tyler/.julia/packages/GPUArrays/t8tJB/src/abstract_gpu_interface.jl:151
[15] gpu_call at /home/tyler/.julia/packages/GPUArrays/t8tJB/src/abstract_gpu_interface.jl:128 [inlined]
[16] copyto! at /home/tyler/.julia/packages/GPUArrays/t8tJB/src/broadcast.jl:48 [inlined]
[17] copyto! at ./broadcast.jl:797 [inlined]
[18] copy(::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Tuple{Base.OneTo{Int64}},typeof(==),Tuple{CuArray{Int64,1},Array{Int64,1}}}) at ./broadcast.jl:773
[19] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(==),Tuple{CuArray{Int64,1},Array{Int64,1}}}) at ./broadcast.jl:753
[20] accuracy(::CuArray{Float32,4}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/tyler/code/model-zoo/vision/cifar10/cifar10.jl:114
[21] macro expansion at ./show.jl:555 [inlined]
[22] (::getfield(Main, Symbol("##33#34")))() at /home/tyler/code/model-zoo/vision/cifar10/cifar10.jl:118
[23] (::getfield(Flux, Symbol("##throttled#10#14")){Bool,Bool,getfield(Main, Symbol("##33#34")),Int64})(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function) at /home/tyler/.julia/packages/Flux/lz7S9/src/utils.jl:120
[24] throttled at /home/tyler/.julia/packages/Flux/lz7S9/src/utils.jl:116 [inlined]
[25] macro expansion at /home/tyler/.julia/packages/Flux/lz7S9/src/optimise/train.jl:75 [inlined]
[26] macro expansion at /home/tyler/.julia/packages/Juno/TfNYn/src/progress.jl:133 [inlined]
[27] #train!#12(::getfield(Flux, Symbol("#throttled#18")){getfield(Flux, Symbol("##throttled#10#14")){Bool,Bool,getfield(Main, Symbol("##33#34")),Int64}}, ::Function, ::Function, ::Tracker.Params, ::Array{Tuple{CuArray{Float32,4},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}},1}, ::ADAM) at /home/tyler/.julia/packages/Flux/lz7S9/src/optimise/train.jl:69
[28] (::getfield(Flux.Optimise, Symbol("#kw##train!")))(::NamedTuple{(:cb,),Tuple{getfield(Flux, Symbol("#throttled#18")){getfield(Flux, Symbol("##throttled#10#14")){Bool,Bool,getfield(Main, Symbol("##33#34")),Int64}}}}, ::typeof(Flux.Optimise.train!), ::Function, ::Tracker.Params, ::Array{Tuple{CuArray{Float32,4},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}},1}, ::ADAM) at ./none:0
[29] top-level scope at none:0
[30] include at ./boot.jl:326 [inlined]
[31] include_relative(::Module, ::String) at ./loading.jl:1038
[32] include(::Module, ::String) at ./sysimg.jl:29
[33] exec_options(::Base.JLOptions) at ./client.jl:267
[34] _start() at ./client.jl:436
in expression starting at /home/tyler/code/model-zoo/vision/cifar10/cifar10.jl:124
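If I'm reading frames [18]–[20] right, the failure comes from `accuracy` broadcasting `==` between a `CuArray{Int64,1}` and a plain `Array{Int64,1}`: the CPU array gets captured by the GPU kernel and isn't isbits. A minimal sketch of what I think is going on (the variable names are mine, not the model-zoo code):

```julia
using CuArrays

# Hypothetical minimal reproduction, assuming the argument types shown in
# frames [18]–[19] of the trace above:
a = CuArray([1, 2, 3])  # e.g. onecold of predictions living on the GPU
b = [1, 2, 3]           # e.g. onecold of labels living on the CPU
a .== b                 # KernelError: the plain Array argument is not isbits

# Keeping both sides on one device sidesteps the mixed broadcast:
collect(a) .== b        # compare on the host instead
```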
Edit: hm, same problem with vision/mnist/conv.jl. The code does hit the GPU for a couple of seconds, though, according to `watch nvidia-smi`:
$ julia vision/mnist/conv.jl
[ Info: activating new environment at ~/code/model-zoo/cuda.
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
[ Info: Loading data set
[ Info: Constructing model...
[ Info: Beginning training loop...
ERROR: LoadError: GPU compilation of #23(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Bool,1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Int64,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}) failed
KernelError: passing and using non-bitstype argument
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Int64,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}.
That type is not isbits, and such arguments are only allowed when they are unused by the kernel.
Stacktrace:
[1] check_invocation(::CUDAnative.CompilerContext, ::LLVM.Function) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/validation.jl:35
[2] compile(::CUDAnative.CompilerContext) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:94
[3] #compile#109(::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::VersionNumber, ::Any, ::Any) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:45
[4] compile at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:43 [inlined]
[5] #compile#108(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CUDAdrv.CuDevice, ::Function, ::Any) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:18
[6] compile at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:16 [inlined]
[7] macro expansion at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/execution.jl:269 [inlined]
[8] #cufunction#123(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::getfield(GPUArrays, Symbol("##23#24")), ::Type{Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{Bool,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Int64,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}}}) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/execution.jl:240
[9] cufunction(::Function, ::Type) at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/execution.jl:240
[10] macro expansion at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/execution.jl:208 [inlined]
[11] macro expansion at ./gcutils.jl:87 [inlined]
[12] macro expansion at /home/tyler/.julia/packages/CUDAnative/PFgO3/src/execution.jl:205 [inlined]
[13] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Bool,1}, ::Tuple{CuArray{Bool,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{CuArray{Int64,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/tyler/.julia/packages/CuArrays/qZCAt/src/gpuarray_interface.jl:59
[14] gpu_call(::Function, ::CuArray{Bool,1}, ::Tuple{CuArray{Bool,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{CuArray{Int64,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}}, ::Int64) at /home/tyler/.julia/packages/GPUArrays/t8tJB/src/abstract_gpu_interface.jl:151
[15] gpu_call at /home/tyler/.julia/packages/GPUArrays/t8tJB/src/abstract_gpu_interface.jl:128 [inlined]
[16] copyto! at /home/tyler/.julia/packages/GPUArrays/t8tJB/src/broadcast.jl:48 [inlined]
[17] copyto! at ./broadcast.jl:797 [inlined]
[18] copy(::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Tuple{Base.OneTo{Int64}},typeof(==),Tuple{CuArray{Int64,1},Array{Int64,1}}}) at ./broadcast.jl:773
[19] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(==),Tuple{CuArray{Int64,1},Array{Int64,1}}}) at ./broadcast.jl:753
[20] accuracy(::CuArray{Float32,4}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/tyler/code/model-zoo/vision/mnist/conv.jl:85
[21] top-level scope at /home/tyler/code/model-zoo/vision/mnist/conv.jl:100 [inlined]
[22] top-level scope at ./none:0
[23] include at ./boot.jl:326 [inlined]
[24] include_relative(::Module, ::String) at ./loading.jl:1038
[25] include(::Module, ::String) at ./sysimg.jl:29
[26] exec_options(::Base.JLOptions) at ./client.jl:267
[27] _start() at ./client.jl:436
in expression starting at /home/tyler/code/model-zoo/vision/mnist/conv.jl:94
I'm not sure if this is a problem in ModelZoo or elsewhere. I'm on CUDA 10.0 and CUDNN 7.3.1. Here are the test results from CuArrays:
And here are the test results for Flux:
This appears to be a known issue: https://github.com/FluxML/Flux.jl/issues/267
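Until that's resolved upstream, an untested sketch of a workaround, assuming the scripts define `accuracy(x, y) = mean(onecold(model(x)) .== onecold(y))` as frame [20] suggests, is to collect the GPU side before comparing:

```julia
using Flux: onecold
using Statistics: mean

# Untested sketch, grounded in the trace: onecold of the model output is a
# CuArray{Int64,1} while onecold of the labels is a plain Array{Int64,1}.
# Collecting the GPU side makes the `.==` broadcast run entirely on the host.
# `model` is whatever the script already defines.
accuracy(x, y) = mean(collect(onecold(model(x))) .== onecold(y))
```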