JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

ERROR_ILLEGAL_ADDRESS when broadcasting modular arithmetic #94

Closed aterenin closed 6 months ago

aterenin commented 4 years ago

Describe the bug

Taking gradients through a broadcasted mod triggers an illegal address error. Other kinds of broadcasting currently work without issues.

To Reproduce

using Flux, CuArrays
Flux.gradient(x -> mod.(x, Float32(2*pi))|>sum, CuArrays.randn(10,10))

Expected behavior

No error.

Build log

I ran ]build CuArrays but this only produced output for NNlib.

Environment details

Details on Julia:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-9800X CPU @ 3.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Julia packages:

The Zygote branch fixes broadcasting and is from this PR: https://github.com/FluxML/Zygote.jl/pull/565.

CUDA: toolkit and driver version:

$ nvidia-smi
Wed Apr  1 14:25:32 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:17:00.0 Off |                  N/A |
|  9%   54C    P8    31W / 215W |      1MiB /  7982MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2080    Off  | 00000000:65:00.0 Off |                  N/A |
|  0%   48C    P8     9W / 215W |     31MiB /  7981MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      4620      G   /usr/lib/xorg/Xorg                            29MiB |
+-----------------------------------------------------------------------------+

Additional context

error in running finalizer: CUDAdrv.CuError(code=CUDAdrv.cudaError_enum(0x000002bc), meta=nothing)

ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(::CUDAdrv.cudaError_enum) at /vol/bitbucket/at6617/dot_julia/packages/CUDAdrv/YK1gX/src/error.jl:110
 [2] CUDAdrv.CuModule(::String, ::Dict{CUDAdrv.CUjit_option_enum,Any}) at /vol/bitbucket/at6617/dot_julia/packages/CUDAdrv/YK1gX/src/module.jl:42
 [3] #cufunction_slow#221(::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction_slow), ::Function, ::Type, ::Int64) at /vol/bitbucket/at6617/dot_julia/packages/CUDAnative/cnQli/src/execution.jl:356
 [4] #cufunction_slow at ./none:0 [inlined]
 [5] #223 at /vol/bitbucket/at6617/dot_julia/packages/CUDAnative/cnQli/src/execution.jl:393 [inlined]
 [6] get!(::CUDAnative.var"#223#224"{String,Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},GPUArrays.var"#28#29",DataType,Int64}, ::Dict{UInt64,CUDAnative.HostKernel}, ::UInt64) at ./dict.jl:452
 [7] macro expansion at /vol/bitbucket/at6617/dot_julia/packages/CUDAnative/cnQli/src/execution.jl:392 [inlined]
 [8] macro expansion at ./lock.jl:173 [inlined]
 [9] #cufunction_fast#222(::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction_fast), ::Function, ::Type, ::Int64) at /vol/bitbucket/at6617/dot_julia/packages/CUDAnative/cnQli/src/execution.jl:391
 [10] (::CUDAnative.var"#kw##cufunction_fast")(::NamedTuple{(:name,),Tuple{String}}, ::typeof(CUDAnative.cufunction_fast), ::Function, ::Type, ::Int64) at ./none:0
 [11] getproperty at ./Base.jl:20 [inlined]
 [12] merge at ./namedtuple.jl:247 [inlined]
 [13] #cufunction#225(::Base.Iterators.Pairs{Symbol,String,Tuple{Symbol},NamedTuple{(:name,),Tuple{String}}}, ::typeof(CUDAnative.cufunction), ::GPUArrays.var"#28#29", ::Type{Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float64,2,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1699#1703",Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float64,2},2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}}) at /vol/bitbucket/at6617/dot_julia/packages/CUDAnative/cnQli/src/execution.jl:0
 [14] (::CUDAnative.var"#kw##cufunction")(::NamedTuple{(:name,),Tuple{String}}, ::typeof(CUDAnative.cufunction), ::Function, ::Type) at ./none:0
 [15] macro expansion at /vol/bitbucket/at6617/dot_julia/packages/CUDAnative/cnQli/src/execution.jl:157 [inlined]
 [16] macro expansion at ./gcutils.jl:91 [inlined]
 [17] macro expansion at /vol/bitbucket/at6617/dot_julia/packages/CUDAnative/cnQli/src/execution.jl:154 [inlined]
 [18] #gpu_call#49(::String, ::typeof(GPUArrays.gpu_call), ::CuArrays.CuArrayBackend, ::Function, ::Tuple{CuArray{Float64,2,Nothing},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1699#1703",Tuple{Base.Broadcast.Extruded{CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}, ::Int64) at /vol/bitbucket/at6617/dot_julia/packages/CuArrays/mfRZ9/src/gpuarrays.jl:32
 [19] #gpu_call#1 at ./none:0 [inlined]
 [20] #gpu_call at ./none:0 [inlined]
 [21] copyto! at /vol/bitbucket/at6617/dot_julia/packages/GPUArrays/K1wPu/src/host/broadcast.jl:63 [inlined]
 [22] copyto! at ./broadcast.jl:863 [inlined]
 [23] copy(::Base.Broadcast.Broadcasted{CuArrays.CuArrayStyle{2},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1699#1703",Tuple{CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}}) at ./broadcast.jl:839
 [24] materialize at ./broadcast.jl:819 [inlined]
 [25] map at /vol/bitbucket/at6617/dot_julia/packages/GPUArrays/K1wPu/src/host/broadcast.jl:91 [inlined]
 [26] broadcast_forward at /vol/bitbucket/at6617/dot_julia/packages/Zygote/xyPOr/src/lib/broadcast.jl:175 [inlined]
 [27] adjoint at /vol/bitbucket/at6617/dot_julia/packages/Zygote/xyPOr/src/lib/broadcast.jl:189 [inlined]
 [28] _pullback at /vol/bitbucket/at6617/dot_julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
 [29] adjoint at /vol/bitbucket/at6617/dot_julia/packages/Zygote/xyPOr/src/lib/lib.jl:156 [inlined]
 [30] _pullback at /vol/bitbucket/at6617/dot_julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
 [31] broadcasted at ./broadcast.jl:1237 [inlined]
 [32] #7 at ./REPL[12]:1 [inlined]
 [33] _pullback(::Zygote.Context, ::var"#7#8", ::CuArray{Float32,2,CuArray{Float32,1,Nothing}}) at /vol/bitbucket/at6617/dot_julia/packages/Zygote/xyPOr/src/compiler/interface2.jl:0
 [34] _pullback(::Function, ::CuArray{Float32,2,CuArray{Float32,1,Nothing}}) at /vol/bitbucket/at6617/dot_julia/packages/Zygote/xyPOr/src/compiler/interface.jl:29
 [35] pullback(::Function, ::CuArray{Float32,2,CuArray{Float32,1,Nothing}}) at /vol/bitbucket/at6617/dot_julia/packages/Zygote/xyPOr/src/compiler/interface.jl:35
 [36] gradient(::Function, ::CuArray{Float32,2,CuArray{Float32,1,Nothing}}) at /vol/bitbucket/at6617/dot_julia/packages/Zygote/xyPOr/src/compiler/interface.jl:44
 [37] top-level scope at REPL[12]:1
aterenin commented 4 years ago

Some more info.

Adjoint code output below.


julia> Zygote.@code_adjoint (x -> mod.(x, 1.0f0)|>sum)(CuArrays.randn(10,10))
Zygote.Adjoint(1: (%3, %4 :: Zygote.Context, %1, %2)
  %5 = Zygote._pullback(%4, Base.broadcasted, Main.mod, %2, 1.0f0)
  %6 = Base.getindex(%5, 1)
  %7 = Base.getindex(%5, 2)
  %8 = Zygote._pullback(%4, Base.materialize, %6)
  %9 = Base.getindex(%8, 1)
  %10 = Base.getindex(%8, 2)
  %11 = Zygote._pullback(%4, Main.:|>, %9, Main.sum)
  %12 = Base.getindex(%11, 1)
  %13 = Base.getindex(%11, 2)
  return %12, 1: (%1)
  %2 = (@13)(%1)
  %3 = Zygote.gradindex(%2, 2)
  %4 = (@10)(%3)
  %5 = Zygote.gradindex(%4, 2)
  %6 = (@7)(%5)
  %7 = Zygote.gradindex(%6, 3)
  %8 = Zygote.tuple(nothing, %7)
  return %8)
aterenin commented 4 years ago

Here's a much smaller MWE based on the above adjoint code.

v2 = CuArrays.randn(10,10)
v4 = Zygote.Context()
v5 = Zygote._pullback(v4, Base.broadcasted, Main.mod, v2, 1.0f0)
aterenin commented 4 years ago

Here's the code Zygote generates. It's not very easy for me to see what is actually going on here.

julia> @code_typed Zygote._pullback(v4, Base.broadcasted, Main.mod, v2, 1.0f0)
CodeInfo(
1 ─ %1  = Base.getfield(args, 2)::CuArray{Float32,2,CuArray{Float32,1,Nothing}}
│   %2  = Base.getfield(args, 3)::Float32
└──       goto #3
2 ─       $(Expr(:meta, :inline))
3 ┄       goto #5
4 ─       $(Expr(:meta, :inline))
5 ┄ %7  = Core.tuple(%1, %2)::Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32}
│   %8  = Core.tuple(%1, %2)::Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32}
│   %9  = Base.getfield(%1, :dims)::Tuple{Int64,Int64}
│   %10 = Base.getfield(%9, 1, true)::Int64
│   %11 = Base.slt_int(%10, 0)::Bool
│   %12 = Base.ifelse(%11, 0, %10)::Int64
│   %13 = %new(Base.OneTo{Int64}, %12)::Base.OneTo{Int64}
│   %14 = Base.getfield(%9, 2, true)::Int64
│   %15 = Base.slt_int(%14, 0)::Bool
│   %16 = Base.ifelse(%15, 0, %14)::Int64
│   %17 = %new(Base.OneTo{Int64}, %16)::Base.OneTo{Int64}
│   %18 = Core.tuple(%13, %17)::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}
│   %19 = %new(Base.Broadcast.Broadcasted{CuArrays.CuArrayStyle{2},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1692#1695"{typeof(mod)},Tuple{CuArray{Float32,2,CuArray{Float32,1,
Nothing}},Float32}}, Zygote.var"#1692#1695"{typeof(mod)}(mod), %8, %18)::Base.Broadcast.Broadcasted{CuArrays.CuArrayStyle{2},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1692#169
5"{typeof(mod)},Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32}}
│   %20 = invoke Base.Broadcast.copy(%19::Base.Broadcast.Broadcasted{CuArrays.CuArrayStyle{2},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1692#1695"{typeof(mod)},Tuple{CuArray{F
loat32,2,CuArray{Float32,1,Nothing}},Float32}})::CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}
│   %21 = Core.tuple(%20)::Tuple{CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}
│   %22 = Base.getfield(%20, :dims)::Tuple{Int64,Int64}
│   %23 = Base.getfield(%22, 1, true)::Int64
│   %24 = Base.slt_int(%23, 0)::Bool
│   %25 = Base.ifelse(%24, 0, %23)::Int64
│   %26 = %new(Base.OneTo{Int64}, %25)::Base.OneTo{Int64}
│   %27 = Base.getfield(%22, 2, true)::Int64
│   %28 = Base.slt_int(%27, 0)::Bool
│   %29 = Base.ifelse(%28, 0, %27)::Int64
│   %30 = %new(Base.OneTo{Int64}, %29)::Base.OneTo{Int64}
│   %31 = Core.tuple(%26, %30)::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}
│   %32 = %new(Base.Broadcast.Broadcasted{CuArrays.CuArrayStyle{2},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1699#1703",Tuple{CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Not
hing}}}, Zygote.var"#1699#1703"(), %21, %31)::Base.Broadcast.Broadcasted{CuArrays.CuArrayStyle{2},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1699#1703",Tuple{CuArray{ForwardDif
f.Dual{Nothing,Float64,2},2,Nothing}}}
│   %33 = invoke Base.Broadcast.copy(%32::Base.Broadcast.Broadcasted{CuArrays.CuArrayStyle{2},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1699#1703",Tuple{CuArray{ForwardDiff.Du
al{Nothing,Float64,2},2,Nothing}}})::CuArray{Float64,2,Nothing}
│   %34 = %new(Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}, %7, %20)::Zygote.var"#_back#170
4"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}
│   %35 = %new(Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}}, %34):
:Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}}
│   %36 = %new(Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64
,2},2,Nothing}}}}, %35)::Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothi
ng,Float64,2},2,Nothing}}}}
│   %37 = %new(Zygote.var"#72#back#1874"{Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{Forwar
dDiff.Dual{Nothing,Float64,2},2,Nothing}}}}}, %36)::Zygote.var"#72#back#1874"{Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float3
2,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}}}}
│   %38 = %new(Zygote.var"#173#174"{Zygote.var"#72#back#1874"{Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Fl
oat32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}}}},Tuple{NTuple{4,Nothing},Tuple{}}}, %37, ((nothing, nothing, nothing, nothing), ()))::Zygote.var"#173#174"{Zygote.var"#72#b
ack#1874"{Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2
,Nothing}}}}},Tuple{NTuple{4,Nothing},Tuple{}}}
│   %39 = %new(Zygote.var"#334#back#175"{Zygote.var"#173#174"{Zygote.var"#72#back#1874"{Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuAr
ray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}}}},Tuple{NTuple{4,Nothing},Tuple{}}}}, %38)::Zygote.var"#334#back#175"{Zygote.var"#173#174"{Zygote.
var"#72#back#1874"{Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Flo
at64,2},2,Nothing}}}}},Tuple{NTuple{4,Nothing},Tuple{}}}}
│   %40 = Base.tuple($(QuoteNode(∂(broadcastable))), Zygote.var"#334#back#175"{Zygote.var"#173#174"{Zygote.var"#1611#1613",Tuple{Tuple{Nothing,Nothing},Tuple{}}}}(Zygote.var"#173#174"{Zygo
te.var"#1611#1613",Tuple{Tuple{Nothing,Nothing},Tuple{}}}(Zygote.var"#1611#1613"(), ((nothing, nothing), ()))), Zygote.var"#237#back#127"{typeof(identity)}(identity), %39, Zygote.var"#237#
back#127"{typeof(identity)}(identity), $(QuoteNode(∂(broadcastable))), Zygote.var"#3039#back#1181"{Zygote.var"#1174#1178"}(Zygote.var"#1174#1178"()))::Core.Compiler.PartialStruct(Tuple{typ
eof(∂(broadcastable)),Zygote.var"#334#back#175"{Zygote.var"#173#174"{Zygote.var"#1611#1613",Tuple{Tuple{Nothing,Nothing},Tuple{}}}},Zygote.var"#237#back#127"{typeof(identity)},Zygote.var"#
334#back#175"{Zygote.var"#173#174"{Zygote.var"#72#back#1874"{Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Flo
at32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}}}},Tuple{NTuple{4,Nothing},Tuple{}}}},Zygote.var"#237#back#127"{typeof(identity)},typeof(∂(broadcastable)),Zygote.var"#3039#ba
ck#1181"{Zygote.var"#1174#1178"}}, Any[Core.Compiler.Const(∂(broadcastable), false), Core.Compiler.Const(Zygote.var"#334#back#175"{Zygote.var"#173#174"{Zygote.var"#1611#1613",Tuple{Tuple{N
othing,Nothing},Tuple{}}}}(Zygote.var"#173#174"{Zygote.var"#1611#1613",Tuple{Tuple{Nothing,Nothing},Tuple{}}}(Zygote.var"#1611#1613"(), ((nothing, nothing), ()))), false), Core.Compiler.Co
nst(Zygote.var"#237#back#127"{typeof(identity)}(identity), false), Zygote.var"#334#back#175"{Zygote.var"#173#174"{Zygote.var"#72#back#1874"{Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,
Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}}}},Tuple{NTuple{4,Nothing},Tuple{}}}}, Core.Com
piler.Const(Zygote.var"#237#back#127"{typeof(identity)}(identity), false), Core.Compiler.Const(∂(broadcastable), false), Core.Compiler.Const(Zygote.var"#3039#back#1181"{Zygote.var"#1174#11
78"}(Zygote.var"#1174#1178"()), false)])
│   %41 = %new(typeof(∂(broadcasted)), %40)::typeof(∂(broadcasted))
│   %42 = Base.tuple(%33, %41)::Core.Compiler.PartialStruct(Tuple{CuArray{Float64,2,Nothing},typeof(∂(broadcasted))}, Any[CuArray{Float64,2,Nothing}, Core.Compiler.PartialStruct(typeof(∂(b
roadcasted)), Any[Core.Compiler.PartialStruct(Tuple{typeof(∂(broadcastable)),Zygote.var"#334#back#175"{Zygote.var"#173#174"{Zygote.var"#1611#1613",Tuple{Tuple{Nothing,Nothing},Tuple{}}}},Z
ygote.var"#237#back#127"{typeof(identity)},Zygote.var"#334#back#175"{Zygote.var"#173#174"{Zygote.var"#72#back#1874"{Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{
Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,Nothing}}}}},Tuple{NTuple{4,Nothing},Tuple{}}}},Zygote.var"#237#back#127"{typeof(
identity)},typeof(∂(broadcastable)),Zygote.var"#3039#back#1181"{Zygote.var"#1174#1178"}}, Any[Core.Compiler.Const(∂(broadcastable), false), Core.Compiler.Const(Zygote.var"#334#back#175"{Zy
gote.var"#173#174"{Zygote.var"#1611#1613",Tuple{Tuple{Nothing,Nothing},Tuple{}}}}(Zygote.var"#173#174"{Zygote.var"#1611#1613",Tuple{Tuple{Nothing,Nothing},Tuple{}}}(Zygote.var"#1611#1613"(
), ((nothing, nothing), ()))), false), Core.Compiler.Const(Zygote.var"#237#back#127"{typeof(identity)}(identity), false), Zygote.var"#334#back#175"{Zygote.var"#173#174"{Zygote.var"#72#back
#1874"{Zygote.var"#1868#1873"{Zygote.var"#back#1706"{2,Zygote.var"#_back#1704"{Tuple{CuArray{Float32,2,CuArray{Float32,1,Nothing}},Float32},CuArray{ForwardDiff.Dual{Nothing,Float64,2},2,No
thing}}}}},Tuple{NTuple{4,Nothing},Tuple{}}}}, Core.Compiler.Const(Zygote.var"#237#back#127"{typeof(identity)}(identity), false), Core.Compiler.Const(∂(broadcastable), false), Core.Compile
r.Const(Zygote.var"#3039#back#1181"{Zygote.var"#1174#1178"}(Zygote.var"#1174#1178"()), false)])])])
└──       return %42
6 ─       $(Expr(:meta, :inline))
) => Tuple{CuArray{Float64,2,Nothing},typeof(∂(broadcasted))}
ararslan commented 4 years ago

Does this only happen with CuArrays 2, or does it also happen with CuArrays 1.7?

aterenin commented 4 years ago

I'm not sure. But defining the following adjoint works around the crash.

@adjoint broadcasted(::typeof(mod), x::Numeric, y::Numeric) = mod.(x,y), Δ -> (nothing, Δ, .-floor.(x./y).*Δ)
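
As a rough sanity check of the derivatives used in this workaround, here is a minimal CPU-only sketch in plain Julia. It assumes mod(x, y) = x - y*floor(x/y), so away from the jump points the partial derivatives are 1 with respect to x and -floor(x/y) with respect to y. The helper names dmod_dx, dmod_dy and fd are made up for illustration and are not part of Zygote or CuArrays.

```julia
# Analytic partial derivatives matching the adjoint above (valid away from jumps):
#   d/dx mod(x, y) = 1
#   d/dy mod(x, y) = -floor(x / y)
dmod_dx(x, y) = one(x)
dmod_dy(x, y) = -floor(x / y)

# Central finite difference of a scalar function f at the point t.
fd(f, t; h = 1e-6) = (f(t + h) - f(t - h)) / (2h)

# Test point chosen away from any multiple of y, where mod is discontinuous.
x, y = 7.3, 2.0

fd_x = fd(t -> mod(t, y), x)   # numerical d/dx
fd_y = fd(t -> mod(x, t), y)   # numerical d/dy

@assert isapprox(fd_x, dmod_dx(x, y); atol = 1e-4)
@assert isapprox(fd_y, dmod_dy(x, y); atol = 1e-4)
```

This only checks the formulas on the CPU at one point; it says nothing about whether the GPU kernel compiles.
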
maleadt commented 4 years ago

I get the following:

julia> v5 = Zygote._pullback(v4, Base.broadcasted, Main.mod, v2, 1.0f0)
ERROR: GPU compilation of broadcast(CuArrays.CuKernelContext, CUDAnative.CuDeviceArray{Tuple{Float32,Zygote.var"#1611#back#614"{Zygote.var"#612#613"{Float32,Float32}}},2,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1666#1673"{Zygote.Context,typeof(mod)},Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Float32}}) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Zygote.var"#1666#1673"{Zygote.Context,typeof(mod)},Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Float32}}.
That type is not isbits, and such arguments are only allowed when they are unused by the kernel.  .f is of type Zygote.var"#1666#1673"{Zygote.Context,typeof(mod)} which is not isbits.
    .__context__ is of type Zygote.Context which is not isbits.
      .cache is of type Union{Nothing, IdDict{Any,Any}} which is not isbits.

I'm not sure how this gets past the validator in your case, but accessing non-isbits data like that (which is passed by pointer) will result in CPU pointers getting used on the GPU, resulting in illegal memory accesses.
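
To illustrate the isbits point, here is a small sketch in plain Julia that needs no GPU. The helpers make_plain and make_cached are hypothetical stand-ins, not Zygote's actual closures: one captures only a Float32, the other captures an IdDict, similar in spirit to the cache field reported in the error above.

```julia
# A closure that captures only a scalar: its single field is a Float32.
make_plain(y) = x -> mod(x, y)

# A closure that captures a mutable cache: its field is a heap-allocated IdDict.
make_cached(cache) = x -> get!(cache, x, mod(x, 1.0f0))

plain  = make_plain(1.0f0)
cached = make_cached(IdDict{Any,Any}())

# isbits values are plain data and can be copied to the device by value.
@assert isbits(plain)

# A closure holding an IdDict contains a pointer to CPU memory, so it is not isbits.
@assert !isbits(cached)

# The function mod itself is a singleton and is isbits.
@assert isbitstype(typeof(mod))
```

Under this reading, what makes the broadcast argument non-isbits is the captured context, not the scalar 1.0f0 or mod.
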

aterenin commented 4 years ago

I don't quite follow - I've generally seen that error when accidentally passing an Array instead of a CuArray, for instance CuArrays.randn(5,5) .+ randn(1,1). But this isn't what we're doing here: the value 1.0f0 is a scalar value, not a pointer to an array. Maybe the broadcasting machinery is somehow erroneously turning it into one?

maleadt commented 4 years ago

No, this is the mod function being closed over by Zygote, and the resulting closure includes a non-isbits cache: Zygote.var"#1666#1673"{Zygote.Context,typeof(mod)}. The isbits restriction doesn't only apply to arrays.

CarloLucibello commented 4 years ago

I can reproduce the issue on CuArrays 1.7

aterenin commented 4 years ago

> No, this is the mod function that gets closed over by Zygote to include a non-isbits cache: Zygote.var"#1666#1673"{Zygote.Context,typeof(mod)} Doesn't only apply to arrays.

Ah I see, thanks. This also explains why I was getting different types than Zygote in my output when I was trying to reproduce this issue. On the other hand, why does Zygote need a cache in this case?

maleadt commented 4 years ago

cc @MikeInnes

maleadt commented 6 months ago

Going to close this as stale. Feel free to open a new issue if the problem still exists.