FluxML / Flux.jl

Relax! Flux is the ML library that doesn't make you tensor
https://fluxml.ai/

Hard error using dice loss #2383

Open cirobr opened 8 months ago

cirobr commented 8 months ago

Cheers,

Regardless of the model, data, or any other condition, I have never been able to use the built-in Flux.dice_coeff_loss() function. A very long error dump shows up, apparently tied to CUDA and memory usage.

The issue has been confirmed and reproduced on the Discourse forum. For details, please check this link.

mcabbott commented 8 months ago

My MWE from the discourse thread is this:

julia> using Flux, CUDA

julia> let x = randn(3,5) |> cu
           y = Flux.onehotbatch("abcab", 'a':'c') |> cu
           Flux.dice_coeff_loss(x, y)  # works forward
       end
1.1841338f0

julia> let x = randn(3,5) |> cu
           y = Flux.onehotbatch("abcab", 'a':'c') |> cu
           gradient(Flux.mse, x, y)  # some gradients work
       end
(Float32[-0.16939788 -0.19461282 … -0.30000073 -0.017194644; 0.07464689 -0.15628384 … -0.17090265 -0.007114268; -0.22359066 -0.06903434 … 0.1566836 -0.022250716], nothing)

julia> let x = randn(3,5) |> cu
           y = Flux.onehotbatch("abcab", 'a':'c') |> cu
           gradient(Flux.dice_coeff_loss, x, y)
       end
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
...
ERROR: KernelException: exception thrown during kernel execution on device Tesla V100-PCIE-16GB
Stacktrace:
  [1] check_exceptions()
    @ CUDA ~/.julia/packages/CUDA/htRwP/src/compiler/exceptions.jl:34
  [2] device_synchronize(; blocking::Bool, spin::Bool)
    @ CUDA ~/.julia/packages/CUDA/htRwP/lib/cudadrv/synchronization.jl:180

(@v1.10) pkg> st Flux CUDA
Status `~/.julia/environments/v1.10/Project.toml`
  [052768ef] CUDA v5.2.0
  [587475ba] Flux v0.14.11

I don't know if this is the same error as yours, but it's surprising, and is a bug.

What "Run Julia on debug level 2 for device stack traces" means is that starting the REPL with julia -g2 will capture more information, which may help narrow this down. Can you try this, and paste here as much information as possible?
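Before reproducing the failure, it can help to confirm that the session really is running at debug level 2. A minimal sketch, assuming only Base Julia; `debug_level` is a hypothetical helper name, not part of Flux or CUDA.jl:

```julia
# Hypothetical helper: report the debug level the current Julia process was started with.
# Base.JLOptions() exposes the command-line options of the running process.
debug_level() = Int(Base.JLOptions().debug_level)

println("debug level: ", debug_level())
```

A session started with julia -g2 should report 2 here; a lower value means the flag did not reach the process (for example, when an editor launches the REPL with its own arguments).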

ToucheSir commented 8 months ago

Can you try pulling y .^ 2 and ŷ .^ 2 in https://github.com/FluxML/Flux.jl/blob/20d516bc29a98adeb3e831c382ff0e805f6a0b33/src/losses/functions.jl#L519 out on their own lines and seeing which one fails?
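A minimal sketch of what that split could look like, assuming the loss has the form 1 - (2*sum(y .* ŷ) + smooth) / (sum(y .^ 2) + sum(ŷ .^ 2) + smooth); check this against the linked source line before relying on it. The name `dice_split` and the intermediate variable names are hypothetical:

```julia
# Hypothetical rewrite of the dice loss with each broadcast on its own line,
# so that a stack trace points at the specific operation that fails.
# The formula is an assumption; compare it with the linked Flux source.
function dice_split(ŷ, y; smooth = 1)
    prod_sum = sum(y .* ŷ)   # elementwise product, summed
    y_sq = y .^ 2            # squared targets, on their own line
    ŷ_sq = ŷ .^ 2            # squared predictions, on their own line
    num = 2 * prod_sum + smooth
    den = sum(y_sq) + sum(ŷ_sq) + smooth
    return 1 - num / den
end
```

Calling gradient(dice_split, x, y) with the x and y from the MWE above should then report a line number inside dice_split for whichever broadcast throws.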

cirobr commented 6 months ago

Cheers, and sorry for the long delay. To make the root cause easier to find, I wrote my own dice_loss as follows:

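function dice_loss(yhat::AbstractArray, y::AbstractArray)
    num = 2 * sum(yhat .* y)
    den = sum(yhat .^ 2) + sum(y .^ 2)
    # Uncomment exactly one of the four den2 definitions below before running.
    # den2 = den + eps(Float32)            # option 1: triggers the error
    # den2 = copy(den) + eps(Float32)      # option 2: triggers the error
    # den2 = rand(Float32) + eps(Float32)  # option 3: no error
    # den2 = num + eps(Float32)            # option 4: no error
    dc = num / den2
    return 1 - dc |> Float32
end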

If the den(ominator) above is replaced by a random constant, or if the num(erator) stands in for it as a computed variable (the last two den2 options), no problem is observed.

However, if either of the first two den2 options is used instead, the extremely long error dump shows up during training, regardless of which convolutional model is used.

cirobr commented 6 months ago

And this is my attempt at capturing the error dump:

ERROR: a exception was thrown during kernel execution. For stacktrace reporting, run Julia on debug level 2 (by passing -g2 to the executable).

WARNING: Error while freeing DeviceBuffer(8.000 MiB at 0x00007f3f74a00000): CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), details=CUDA.Optional{String}(data=nothing))

Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/libcuda.jl:30
  [2] check
    @ ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/libcuda.jl:37 [inlined]
  [3] cuMemFree_v2
    @ ~/.julia/packages/CUDA/jdJ7Z/lib/utils/call.jl:30 [inlined]
  [4] free(buf::CUDA.Mem.DeviceBuffer; stream::Nothing)
    @ CUDA.Mem ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/memory.jl:99
  [5] free
    @ ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/memory.jl:92 [inlined]
  [6] #actual_free#1042
    @ ~/.julia/packages/CUDA/jdJ7Z/src/pool.jl:78 [inlined]
  [7] actual_free
    @ ~/.julia/packages/CUDA/jdJ7Z/src/pool.jl:75 [inlined]
  [8] #_free#1067
    @ ~/.julia/packages/CUDA/jdJ7Z/src/pool.jl:538 [inlined]
  [9] _free
    @ ~/.julia/packages/CUDA/jdJ7Z/src/pool.jl:523 [inlined]
 [10] macro expansion
    @ ~/.julia/packages/CUDA/jdJ7Z/src/pool.jl:508 [inlined]
 [11] macro expansion
    @ ./timing.jl:395 [inlined]
 [12] #free#1066
    @ ~/.julia/packages/CUDA/jdJ7Z/src/pool.jl:507 [inlined]
 [13] free
    @ ~/.julia/packages/CUDA/jdJ7Z/src/pool.jl:496 [inlined]
 [14] (::CUDA.var"#1073#1074"{CUDA.Mem.DeviceBuffer, Bool})()
    @ CUDA ~/.julia/packages/CUDA/jdJ7Z/src/array.jl:101
 [15] #context!#954
    @ ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/state.jl:170 [inlined]
 [16] context!
    @ ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/state.jl:165 [inlined]
 [17] _free_buffer(buf::CUDA.Mem.DeviceBuffer, early::Bool)
    @ CUDA ~/.julia/packages/CUDA/jdJ7Z/src/array.jl:89
 [18] release(rc::GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer}, args::Bool)
    @ GPUArrays ~/.julia/packages/GPUArrays/OKkAu/src/host/abstractarray.jl:42
 [19] unsafe_free!
    @ ~/.julia/packages/GPUArrays/OKkAu/src/host/abstractarray.jl:91 [inlined]
 [20] unsafe_finalize!(xs::CuArray{Int64, 4, CUDA.Mem.DeviceBuffer})
    @ CUDA ~/.julia/packages/CUDA/jdJ7Z/src/array.jl:113
 [21] getvariables(show_modules::Bool)
    @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/trees.jl:295
 [22] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [23] invokelatest
    @ ./essentials.jl:889 [inlined]
 [24] repl_getvariables_request(conn::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint, Base.PipeEndpoint}, params::@NamedTuple{modules::Bool})
    @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/trees.jl:269
 [25] dispatch_msg(x::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint, Base.PipeEndpoint}, dispatcher::VSCodeServer.JSONRPC.MsgDispatcher, msg::Dict{String, Any})
    @ VSCodeServer.JSONRPC ~/.vscode-server/extensions/julialang.language-julia-1.79.2/scripts/packages/JSONRPC/src/typed.jl:67
 [26] dispatch_msg(conn_endpoint::Base.RefValue{Union{Nothing, VSCodeServer.JSONRPC.JSONRPCEndpoint}}, msg_dispatcher::VSCodeServer.JSONRPC.MsgDispatcher, msg::Dict{String, Any}, is_dev::Bool)
    @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/VSCodeServer.jl:103
 [27] macro expansion
    @ ~/.vscode-server/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/VSCodeServer.jl:153 [inlined]
 [28] macro expansion
    @ ./task.jl:479 [inlined]
 [29] (::VSCodeServer.var"#235#238"{Bool, String, Base.PipeEndpoint})()
    @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/VSCodeServer.jl:147

ERROR: CuError(CUDA.cudaError_enum(0x000002bc), CUDA.Optional{String}(nothing))
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/libcuda.jl:30
  [2] nonblocking_synchronize(val::CuContext)
    @ CUDA ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/synchronization.jl:174
  [3] device_synchronize(; blocking::Bool, spin::Bool)
    @ CUDA ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/synchronization.jl:185
  [4] device_synchronize
    @ ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/synchronization.jl:180 [inlined]
  [5] checked_cuModuleLoadDataEx
    @ ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/module.jl:17 [inlined]
  [6] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
    @ CUDA ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/module.jl:58
  [7] CuModule
    @ ~/.julia/packages/CUDA/jdJ7Z/lib/cudadrv/module.jl:47 [inlined]
  [8] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{…}, entry::String, external_gvars::Vector{…}})
    @ CUDA ~/.julia/packages/CUDA/jdJ7Z/src/compiler/compilation.jl:409
  [9] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:134
 [10] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:103
 [11] macro expansion
    @ ~/.julia/packages/CUDA/jdJ7Z/src/compiler/execution.jl:367 [inlined]
 [12] macro expansion
    @ ./lock.jl:267 [inlined]
 [13] cufunction(f::GPUArrays.var"#35#37", tt::Type{Tuple{…}}; kwargs::@Kwargs{})
    @ CUDA ~/.julia/packages/CUDA/jdJ7Z/src/compiler/execution.jl:362
 [14] cufunction
    @ ~/.julia/packages/CUDA/jdJ7Z/src/compiler/execution.jl:359 [inlined]
 [15] macro expansion
    @ ~/.julia/packages/CUDA/jdJ7Z/src/compiler/execution.jl:112 [inlined]
 [16] #launch_heuristic#1173
    @ ~/.julia/packages/CUDA/jdJ7Z/src/gpuarrays.jl:17 [inlined]
 [17] launch_heuristic
    @ ~/.julia/packages/CUDA/jdJ7Z/src/gpuarrays.jl:15 [inlined]
 [18] _copyto!
    @ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:78 [inlined]
 [19] copyto!
    @ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:44 [inlined]
 [20] copy
    @ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:29 [inlined]
 [21] materialize
    @ ./broadcast.jl:903 [inlined]
 [22] (::Zygote.var"#1237#1240"{2, CuArray{Bool, 4, CUDA.Mem.DeviceBuffer}})(ȳ::CuArray{Bool, 4, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/lib/broadcast.jl:108
 [23] #3876#back
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:72 [inlined]
 [24] dice_loss
    @ ~/projects/pascalvoc-segmentation/dice-loss-error.jl:164 [inlined]
 [25] (::Zygote.Pullback{Tuple{…}, Tuple{…}})(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [26] lossFunction
    @ ~/projects/pascalvoc-segmentation/dice-loss-error.jl:173 [inlined]
 [27] #5
    @ ~/projects/pascalvoc-segmentation/dice-loss-error.jl:189 [inlined]
 [28] #291
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:206 [inlined]
 [29] #2169#back
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:72 [inlined]
 [30] #4
    @ ~/.julia/packages/Flux/Wz6D4/src/train.jl:107 [inlined]
 [31] (::Zygote.Pullback{Tuple{…}, Tuple{…}})(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [32] (::Zygote.var"#75#76"{Zygote.Pullback{Tuple{…}, Tuple{…}}})(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface.jl:91
 [33] withgradient(f::Function, args::UNet2)
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface.jl:213
 [34] macro expansion
    @ ~/.julia/packages/Flux/Wz6D4/src/train.jl:107 [inlined]
 [35] macro expansion
    @ ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:328 [inlined]
 [36] train!(loss::Function, model::UNet2, data::MLUtils.DataLoader{…}, opt::@NamedTuple{…}; cb::Nothing)
    @ Flux.Train ~/.julia/packages/Flux/Wz6D4/src/train.jl:105
 [37] train!(loss::Function, model::UNet2, data::MLUtils.DataLoader{…}, opt::@NamedTuple{…})
    @ Flux.Train ~/.julia/packages/Flux/Wz6D4/src/train.jl:102
 [38] top-level scope
    @ ~/projects/pascalvoc-segmentation/dice-loss-error.jl:188
Some type information was truncated. Use show(err) to see complete types.
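
Frame [22] of the ERROR trace is a Zygote broadcast pullback over a CuArray{Bool, 4}, called from dice_loss, which suggests the target mask y reaches the squaring step as a Bool array. One experiment that follows from this is to convert y to the element type of yhat before any arithmetic and see whether the gradient still fails. The sketch below is only that experiment, under the unverified assumption that the Bool element type is involved; `dice_loss_float` is a hypothetical name, not a Flux function:

```julia
# Hypothetical variant of dice_loss for testing whether the Bool mask matters.
# Assumption (unverified): the failure is tied to differentiating y .^ 2 when y is Bool.
function dice_loss_float(yhat::AbstractArray, y::AbstractArray)
    yf = eltype(yhat).(y)                 # e.g. Bool -> Float32, elementwise
    num = 2 * sum(yhat .* yf)
    den = sum(yhat .^ 2) + sum(yf .^ 2) + eps(Float32)
    return 1 - num / den
end
```

If the gradient of this variant runs on the GPU while the original does not, that narrows the problem to the handling of the Bool array; if it still fails, the element type is probably not the cause.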