FixedEffects / FixedEffectModels.jl

Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables

Error with single precision gpu model #170

Closed dwinkler1 closed 3 years ago

dwinkler1 commented 3 years ago

Unfortunately, I do not understand CUDA kernels at all, but I am happy to help if you have any suggestions. This is the error I get when fitting a single-precision model on the GPU:

julia> femodgpusingle = reg(df, @formula(Sales ~ NDI + fe(State) + fe(Year)),
        method=:gpu, double_precision = false)

 ERROR: InvalidIRError: compiling kernel gather_kernel!(CuDeviceVector{Float32, 1}, CuDeviceVector{UInt32, 1}, Float64, CuDeviceVector{Float32, 1}, CuDeviceVector{Float32, 1})        
 resulted in invalid LLVM IR                                                                                                                                                           
 Reason: unsupported dynamic function invocation (call to atomic_add!)                                                                                                                 
 Stacktrace:                                                                                                                                                                           
  [1] gather_kernel!                                                                                                                                                                   
    @ ~/.julia/packages/FixedEffects/uGM5q/src/FixedEffectSolvers/FixedEffectSolverGPU.jl:76                                                                                           
 Stacktrace:                                                                                                                                                                           
   [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(FixedEffects.gather_kernel!),                     
 Tuple{CuDeviceVector{Float32, 1}, CuDeviceVector{UInt32, 1}, Float64, CuDeviceVector{Float32, 1}, CuDeviceVector{Float32, 1}}}}, args::LLVM.Module)                                   
     @ GPUCompiler ~/.julia/packages/GPUCompiler/eJOtJ/src/validation.jl:111                                                                                                           
   [2] macro expansion                                                                                                                                                                 
     @ ~/.julia/packages/GPUCompiler/eJOtJ/src/driver.jl:310 [inlined]                                                                                                                 
   [3] macro expansion                                                                                                                                                                 
     @ ~/.julia/packages/TimerOutputs/PZq45/src/TimerOutput.jl:226 [inlined]                                                                                                           
   [4] macro expansion                                                                                                                                                                 
     @ ~/.julia/packages/GPUCompiler/eJOtJ/src/driver.jl:308 [inlined]                                                                                                                 
   [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module, kernel::LLVM.Function; stri::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)                                
     @ GPUCompiler ~/.julia/packages/GPUCompiler/eJOtJ/src/utils.jl:62                                                                                                                 
   [6] cufunction_compile(job::GPUCompiler.CompilerJob)                                                                                                                                
     @ CUDA ~/.julia/packages/CUDA/3VnCC/src/compiler/execution.jl:301                                                                                                                 
   [7] check_cache                                                                                                                                                                     
     @ ~/.julia/packages/GPUCompiler/eJOtJ/src/cache.jl:47 [inlined]                                                                                                                   
   [8] cached_compilation                                                                                                                                                              
     @ ~/.julia/packages/FixedEffects/uGM5q/src/FixedEffectSolvers/FixedEffectSolverGPU.jl:73 [inlined]                                                                                
   [9] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams,                                               
 GPUCompiler.FunctionSpec{typeof(FixedEffects.gather_kernel!), Tuple{CuDeviceVector{Float32, 1}, CuDeviceVector{UInt32, 1}, Float64, CuDeviceVector{Float32, 1},                       
 CuDeviceVector{Float32, 1}}}}, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))                                                                       
     @ GPUCompiler ~/.julia/packages/GPUCompiler/eJOtJ/src/cache.jl:0                                                                                                                  
  [10] cufunction(f::typeof(FixedEffects.gather_kernel!), tt::Type{Tuple{CuDeviceVector{Float32, 1}, CuDeviceVector{UInt32, 1}, Float64, CuDeviceVector{Float32, 1},                   
 CuDeviceVector{Float32, 1}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})                                                        
     @ CUDA ~/.julia/packages/CUDA/3VnCC/src/compiler/execution.jl:289                                                                                                                 
  [11] cufunction                                                                                                                                                                      
     @ ~/.julia/packages/CUDA/3VnCC/src/compiler/execution.jl:283 [inlined]                                                                                                            
  [12] macro expansion                                                                                                                                                                 
     @ ~/.julia/packages/CUDA/3VnCC/src/compiler/execution.jl:102 [inlined]                                                                                                            
  [13] gather!(fecoef::CuArray{Float32, 1}, refs::CuArray{UInt32, 1}, α::Float64, y::CuArray{Float32, 1}, cache::CuArray{Float32, 1}, nthreads::Int64)                                 
     @ FixedEffects ~/.julia/packages/FixedEffects/uGM5q/src/FixedEffectSolvers/FixedEffectSolverGPU.jl:69            
  [14] mul!(fecoefs::FixedEffects.FixedEffectCoefficients{CuArray{Float32, 1}}, Cfem::LinearAlgebra.Adjoint{Float32, FixedEffects.FixedEffectLinearMapGPU{Float32}},                   
 y::CuArray{Float32, 1}, α::Float64, β::Float64)                                                                                                                                       
     @ FixedEffects ~/.julia/packages/FixedEffects/uGM5q/src/FixedEffectSolvers/FixedEffectSolverGPU.jl:61                                                                             
  [15] solve_residuals!(r::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, feM::FixedEffects.FixedEffectSolverGPU{Float32}; tol::Float64,    
 maxiter::Int64)                                                                                                                                                                       
     @ FixedEffects ~/.julia/packages/FixedEffects/uGM5q/src/FixedEffectSolvers/FixedEffectSolverGPU.jl:191                                                                            
  [16] solve_residuals!(X::FixedEffectModels.Combination{Float64}, feM::FixedEffects.FixedEffectSolverGPU{Float32}; progress_bar::Bool, kwargs::Base.Iterators.Pairs{Symbol, Real,     
 Tuple{Symbol, Symbol}, NamedTuple{(:maxiter, :tol), Tuple{Int64, Float64}}})                                                                                                          
     @ FixedEffects ~/.julia/packages/FixedEffects/uGM5q/src/FixedEffectSolvers/FixedEffectSolverGPU.jl:206                                                                            
  [17] reg(df::Any, formula::FormulaTerm, vcov::StatsBase.CovarianceEstimator; contrasts::Dict, weights::Union{Nothing, Symbol}, save::Union{Bool, Symbol}, method::Symbol,            
 nthreads::Integer, double_precision::Bool, tol::Real, maxiter::Integer, drop_singletons::Bool, progress_bar::Bool, dof_add::Integer, subset::Union{Nothing, AbstractVector{T} where   
 T}, first_stage::Bool)                                                                                                                                                                
     @ FixedEffectModels ~/.julia/packages/FixedEffectModels/KIQzR/src/fit.jl:229                                                                                                      
  [18] top-level scope                                                                                                                                                                 
     @ REPL[5]:1                                                                                                                                                                       
  [19] top-level scope                                                                                                                                                                 
     @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

 pkg> st
       Status `~/Documents/Tests/julianotes/fixedeffectscuda/Project.toml`                                                                                                             
   [052768ef] CUDA v3.2.1                                                                                                                                                              
   [9d5cd8c9] FixedEffectModels v1.6.1                                                                                                                                                 
   [c8885935] FixedEffects v2.0.4                                                                                                                                                      
   [ce6b1742] RDatasets v0.7.5

 julia> versioninfo()
 Julia Version 1.6.1                                                                                                                                                                   
 Commit 6aaedecc44 (2021-04-23 05:59 UTC)                                                                                                                                              
 Platform Info:                                                                                                                                                                        
   OS: Linux (x86_64-pc-linux-gnu)                                                                                                                                                     
   CPU: Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz                                                                                                                                          
   WORD_SIZE: 64                                                                                                                                                                       
   LIBM: libopenlibm                                                                                                                                                                   
   LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)                                                                                                                                       
 Environment:                                                                                                                                                                          
   JULIA_NUM_THREADS = 8

dwinkler1 commented 3 years ago

Downgrading to FixedEffects v2.0.3 fixes this.
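
For anyone who needs the same workaround, a minimal sketch of the downgrade using the standard Pkg API follows. It assumes the active project is the one shown in the `pkg> st` output above; pinning is optional and is included here only so that a later update does not pull in v2.0.4 again.

```julia
using Pkg

# Install the last version reported to work with single precision on the GPU.
Pkg.add(name="FixedEffects", version="2.0.3")

# Optional: hold the package at this version until a fixed release is available.
Pkg.pin("FixedEffects")

# Confirm which versions are now installed (same information as `pkg> st`).
Pkg.status()
```

The pin can be undone later with `Pkg.free("FixedEffects")`.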

matthieugomez commented 3 years ago

Thanks for the report. Could you check that it works with FixedEffects v2.0.5? https://github.com/JuliaRegistries/General/pull/39432 (Unfortunately, I don't have a GPU that works with CUDA, which is why new versions sometimes introduce mistakes.)
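
A sketch of how the requested check could be run on a machine with a CUDA-capable GPU. The thread does not say how `df` was built; the column names match the Cigar dataset from RDatasets, so loading it with `dataset("plm", "Cigar")` is an assumption here.

```julia
using Pkg

# If FixedEffects was pinned earlier, run Pkg.free("FixedEffects") first.
Pkg.add(name="FixedEffects", version="2.0.5")

using CUDA, FixedEffectModels, RDatasets

# Assumed data source: the Cigar panel (columns State, Year, Sales, NDI, ...).
df = dataset("plm", "Cigar")

# Same call as in the original report: GPU solver with single precision.
femodgpusingle = reg(df, @formula(Sales ~ NDI + fe(State) + fe(Year)),
                     method=:gpu, double_precision=false)
```

If the model prints a coefficient table instead of raising `InvalidIRError`, the single-precision GPU path compiles again.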

matthieugomez commented 3 years ago

Closing since https://github.com/FixedEffects/FixedEffectModels.jl/issues/172 reports it now works. Thanks again for the report.
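
A note for readers who hit a similar error: the kernel signature in the report mixes `Float32` arrays with a `Float64` scalar (`α::Float64`). One plausible reading, not confirmed anywhere in this thread, is that the `Float64` scalar promoted the value being accumulated to `Float64`, leaving no matching `atomic_add!` method for a `Float32` destination. The CPU-only sketch below shows just the promotion behaviour. `scaled_term` is a hypothetical stand-in for the expression inside the kernel, not the package's actual code.

```julia
# Hypothetical stand-in for the per-element product computed inside the kernel.
scaled_term(α, yi, ci) = α * yi * ci

y     = Float32[1.0, 2.0, 3.0]
cache = Float32[1.0, 1.0, 1.0]

# A Float64 scalar promotes the whole product to Float64 ...
α64 = 1.0
println(typeof(scaled_term(α64, y[1], cache[1])))

# ... whereas converting the scalar to the array's element type keeps Float32.
α32 = convert(eltype(y), α64)
println(typeof(scaled_term(α32, y[1], cache[1])))
```

Under this reading, converting `α` to the element type of the arrays before launching the kernel would avoid the mismatch. Whether that is what v2.0.5 changed is not stated here.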