JuliaSIMD / LoopVectorization.jl

Macro(s) for vectorizing loops.
MIT License

`vtrunc(::Float64)` issue #495

Closed: MilesCranmer closed this issue 1 year ago.

MilesCranmer commented 1 year ago

I'm seeing a new issue in the DynamicExpressions.jl (and thus SymbolicRegression.jl/PySR) unit tests that seems to come from a vectorized Zygote kernel. My unit tests haven't changed, but the failure started appearing within the last one to two weeks.

Here is the traceback:

ERROR: LoadError: MethodError: no method matching vtrunc(::Float64)

Closest candidates are:
  vtrunc(::VectorizationBase.Vec{W, T}) where {W, T<:Union{Float32, Float64}}
   @ VectorizationBase ~/.julia/packages/VectorizationBase/0dXyA/src/llvm_intrin/intrin_funcs.jl:156
  vtrunc(::Type{I}, ::VectorizationBase.VecUnroll{N, 1, T, T}) where {N, I<:Union{Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8}, T<:Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, SIMDTypes.Bit}}
   @ VectorizationBase ~/.julia/packages/VectorizationBase/0dXyA/src/llvm_intrin/intrin_funcs.jl:173
  vtrunc(::Type{I}, ::VectorizationBase.AbstractSIMD{W, T}) where {W, I<:Union{Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8}, T<:Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, SIMDTypes.Bit}}
   @ VectorizationBase ~/.julia/packages/VectorizationBase/0dXyA/src/llvm_intrin/intrin_funcs.jl:178
  ...

Stacktrace:
  [1] fmap
    @ ~/.julia/packages/VectorizationBase/0dXyA/src/vecunroll/fmap.jl:3 [inlined]
  [2] vtrunc
    @ ~/.julia/packages/VectorizationBase/0dXyA/src/vecunroll/fmap.jl:75 [inlined]
  [3] trunc
    @ ~/.julia/packages/VectorizationBase/0dXyA/src/base_defs.jl:46 [inlined]
  [4] sincos_fast
    @ ~/.julia/packages/SLEEFPirates/Io8eB/src/trig.jl:380 [inlined]
  [5] sincos
    @ ~/.julia/packages/SLEEFPirates/Io8eB/src/SLEEFPirates.jl:202 [inlined]
  [6] rrule(#unused#::typeof(cos), x::VectorizationBase.VecUnroll{1, 1, Float64, Float64})
    @ ChainRules ~/.julia/packages/ChainRules/aKxNz/src/rulesets/Base/fastmath_able.jl:25
  [7] rrule(::Zygote.ZygoteRuleConfig{Zygote.Context{false}}, ::Function, ::VectorizationBase.VecUnroll{1, 1, Float64, Float64})
    @ ChainRulesCore ~/.julia/packages/ChainRulesCore/0t04l/src/rules.jl:134
  [8] chain_rrule
    @ ~/.julia/packages/Zygote/HTsWj/src/compiler/chainrules.jl:223 [inlined]
  [9] macro expansion
    @ ~/.julia/packages/Zygote/HTsWj/src/compiler/interface2.jl:101 [inlined]
 [10] _pullback(ctx::Zygote.Context{false}, f::typeof(cos), args::VectorizationBase.VecUnroll{1, 1, Float64, Float64})
    @ Zygote ~/.julia/packages/Zygote/HTsWj/src/compiler/interface2.jl:101
 [11] _pullback
    @ ~/Documents/DynamicExpressions.jl/test/test_derivatives.jl:11 [inlined]
 [12] _pullback(ctx::Zygote.Context{false}, f::typeof(custom_cos), args::VectorizationBase.VecUnroll{1, 1, Float64, Float64})
    @ Zygote ~/.julia/packages/Zygote/HTsWj/src/compiler/interface2.jl:0
 [13] pullback(f::Function, cx::Zygote.Context{false}, args::VectorizationBase.VecUnroll{1, 1, Float64, Float64})
    @ Zygote ~/.julia/packages/Zygote/HTsWj/src/compiler/interface.jl:44
 [14] pullback
    @ ~/.julia/packages/Zygote/HTsWj/src/compiler/interface.jl:42 [inlined]
 [15] gradient(f::Function, args::VectorizationBase.VecUnroll{1, 1, Float64, Float64})
    @ Zygote ~/.julia/packages/Zygote/HTsWj/src/compiler/interface.jl:96
 [16] (::DynamicExpressions.OperatorEnumConstructionModule.var"#641#diff_op#3"{typeof(custom_cos)})(x::VectorizationBase.VecUnroll{1, 1, Float64, Float64})
    @ DynamicExpressions.OperatorEnumConstructionModule ~/Documents/DynamicExpressions.jl/src/OperatorEnumConstruction.jl:230
 [17] macro expansion
    @ ~/.julia/packages/LoopVectorization/IkdFM/src/reconstruct_loopset.jl:1107 [inlined]
 [18] _turbo_!
    @ ~/.julia/packages/LoopVectorization/IkdFM/src/reconstruct_loopset.jl:1107 [inlined]
 [19] macro expansion
    @ ~/Documents/DynamicExpressions.jl/src/Utils.jl:52 [inlined]
 [20] grad_deg1_eval(tree::Node{Float64}, #unused#::Val{3}, index_tree::NodeIndex, cX::Matrix{Float64}, op::typeof(custom_cos), diff_op::DynamicExpressions.OperatorEnumConstructionModule.var"#641#diff_op#3"{typeof(custom_cos)}, operators::OperatorEnum, #unused#::Val{true}, #unused#::Val{true})
    @ DynamicExpressions.EvaluateEquationDerivativeModule ~/Documents/DynamicExpressions.jl/src/EvaluateEquationDerivative.jl:331
 [21] _eval_grad_tree_array(tree::Node{Float64}, #unused#::Val{3}, index_tree::NodeIndex, cX::Matrix{Float64}, operators::OperatorEnum, #unused#::Val{true}, #unused#::Val{true})
    @ DynamicExpressions.EvaluateEquationDerivativeModule ~/Documents/DynamicExpressions.jl/src/EvaluateEquationDerivative.jl:262
 [22] eval_grad_tree_array
    @ ~/Documents/DynamicExpressions.jl/src/EvaluateEquationDerivative.jl:224 [inlined]
 [23] grad_deg2_eval(tree::Node{Float64}, #unused#::Val{3}, index_tree::NodeIndex, cX::Matrix{Float64}, op::typeof(+), diff_op::DynamicExpressions.OperatorEnumConstructionModule.var"#diff_op#2"{typeof(+)}, operators::OperatorEnum, #unused#::Val{true}, #unused#::Val{true})
    @ DynamicExpressions.EvaluateEquationDerivativeModule ~/Documents/DynamicExpressions.jl/src/EvaluateEquationDerivative.jl:360
 [24] _eval_grad_tree_array(tree::Node{Float64}, #unused#::Val{3}, index_tree::NodeIndex, cX::Matrix{Float64}, operators::OperatorEnum, #unused#::Val{true}, #unused#::Val{true})
    @ DynamicExpressions.EvaluateEquationDerivativeModule ~/Documents/DynamicExpressions.jl/src/EvaluateEquationDerivative.jl:274
 [25] eval_grad_tree_array
    @ ~/Documents/DynamicExpressions.jl/src/EvaluateEquationDerivative.jl:224 [inlined]
 [26] grad_deg2_eval(tree::Node{Float64}, #unused#::Val{3}, index_tree::NodeIndex, cX::Matrix{Float64}, op::typeof(+), diff_op::DynamicExpressions.OperatorEnumConstructionModule.var"#diff_op#2"{typeof(+)}, operators::OperatorEnum, #unused#::Val{true}, #unused#::Val{true})
    @ DynamicExpressions.EvaluateEquationDerivativeModule ~/Documents/DynamicExpressions.jl/src/EvaluateEquationDerivative.jl:356
 [27] _eval_grad_tree_array(tree::Node{Float64}, #unused#::Val{3}, index_tree::NodeIndex, cX::Matrix{Float64}, operators::OperatorEnum, #unused#::Val{true}, #unused#::Val{true})
    @ DynamicExpressions.EvaluateEquationDerivativeModule ~/Documents/DynamicExpressions.jl/src/EvaluateEquationDerivative.jl:274
 [28] eval_grad_tree_array(tree::Node{Float64}, #unused#::Val{3}, index_tree::NodeIndex, cX::Matrix{Float64}, operators::OperatorEnum, #unused#::Val{true}, #unused#::Val{true})
    @ DynamicExpressions.EvaluateEquationDerivativeModule ~/Documents/DynamicExpressions.jl/src/EvaluateEquationDerivative.jl:224
 [29] eval_grad_tree_array(tree::Node{Float64}, cX::Matrix{Float64}, operators::OperatorEnum; variable::Bool, turbo::Bool)
    @ DynamicExpressions.EvaluateEquationDerivativeModule ~/Documents/DynamicExpressions.jl/src/EvaluateEquationDerivative.jl:202
 [30] top-level scope
    @ ~/Documents/DynamicExpressions.jl/test/test_derivatives.jl:81
 [31] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [32] top-level scope
    @ REPL[1]:1
in expression starting at /Users/mcranmer/Documents/DynamicExpressions.jl/test/test_derivatives.jl:39

It seems to come from the gradient of the expression `tree = (((pow_abs2(x1, x2) + x3) + custom_cos(1.0 + x3)) + (3.0 / x1))` in my unit tests (test/test_derivatives.jl).

I think what is happening is this: I use Zygote to create a function for the derivative of cos and then call that function inside a @turbo loop (the loop sees the function at compile time). This pattern hasn't caused issues before with other functions, so I suspect something new is going on.
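A minimal sketch of the pattern described above, assuming current releases of Zygote and LoopVectorization. `custom_cos`, `diff_op`, and `apply_derivative!` are hypothetical stand-ins, not the DynamicExpressions.jl code itself. Inside `@turbo`, `diff_op` may be called on VectorizationBase SIMD types such as `VecUnroll` rather than on plain `Float64` values, which is how the `cos` rrule ended up receiving `VecUnroll{1, 1, Float64, Float64}` in the traceback.

```julia
using LoopVectorization  # provides the @turbo macro
using Zygote             # provides Zygote.gradient

# Hypothetical operator standing in for the test's custom_cos.
custom_cos(x) = cos(x)

# Derivative of the operator built with Zygote; gradient returns a tuple
# with one entry per argument, so take the first.
diff_op(x) = Zygote.gradient(custom_cos, x)[1]

# Apply the Zygote-derived function element-wise inside a @turbo loop.
# LoopVectorization sees diff_op at compile time and may call it on SIMD
# types (Vec/VecUnroll) instead of on scalar Float64 values.
function apply_derivative!(out::Vector{Float64}, x::Vector{Float64})
    @turbo for j in eachindex(x)
        out[j] = diff_op(x[j])
    end
    return out
end

x = collect(range(0.0, 1.0; length = 16))
out = apply_derivative!(similar(x), x)
```

With SLEEFPirates releases older than 0.6.39, a loop of this shape may hit the same `vtrunc(::Float64)` MethodError.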

The actual loop in question is here: https://github.com/SymbolicML/DynamicExpressions.jl/blob/71a7b581a9ec203232fb99359841765b4082343c/src/EvaluateEquationDerivative.jl#L331-L338

I can isolate this if needed. Let me know what other info I should include. Maybe it's related to a new Zygote update rather than an update in the SIMD libraries?

chriselrod commented 1 year ago

The diff in this commit is noisy because of formatting changes, but it should fix this issue: https://github.com/JuliaSIMD/SLEEFPirates.jl/commit/a64124aa8c1e3b4afde5183ea1937e1487243570#commitcomment-114228763

The issue may have been the new rrule for cos, which calls sincos:

  [5] sincos
    @ ~/.julia/packages/SLEEFPirates/Io8eB/src/SLEEFPirates.jl:202 [inlined]
  [6] rrule(#unused#::typeof(cos), x::VectorizationBase.VecUnroll{1, 1, Float64, Float64})

The type `VectorizationBase.VecUnroll{1, 1, Float64, Float64}` represents unrolling without vectorization (a SIMD width of 1, so each unrolled element is a plain Float64). This case is somewhat uncommon and apparently was never tested with sincos. The SLEEFPirates commit above adds support for it; the missing support was simply an oversight.
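The failure mode can be modeled with a small, self-contained toy. `ToyVec`, `ToyUnroll`, `toy_fmap`, and `toy_vtrunc` are hypothetical stand-ins, not VectorizationBase's actual types or functions: an unrolled container forwards truncation to each of its elements, which works when the elements are SIMD vectors but raises a MethodError when they are plain `Float64` values (the width-1 case in the traceback). Adding a scalar method is one way to close the gap in the toy; the real fix is in the linked SLEEFPirates commit and may work differently.

```julia
# Hypothetical toy types (not VectorizationBase's real Vec/VecUnroll).
struct ToyVec{W}
    data::NTuple{W,Float64}
end

struct ToyUnroll{N,T}
    data::NTuple{N,T}  # the unrolled elements
end

# Like the candidates in the error message, truncation is only defined
# for the vector type.
toy_vtrunc(v::ToyVec) = ToyVec(map(trunc, v.data))

# The unrolled container maps the operation over its elements (cf. fmap).
toy_fmap(f, u::ToyUnroll) = ToyUnroll(map(f, u.data))
toy_vtrunc(u::ToyUnroll) = toy_fmap(toy_vtrunc, u)

# Unrolled SIMD vectors work.
vec_result = toy_vtrunc(ToyUnroll((ToyVec((1.5, 2.5)), ToyVec((-0.5, 3.7)))))

# Unrolled scalars (width 1) hit the missing method.
err = try
    toy_vtrunc(ToyUnroll((1.5, 2.5)))
    nothing
catch e
    e
end
println(err isa MethodError)  # true: no method matching toy_vtrunc(::Float64)

# Supplying a scalar method closes the gap in the toy.
toy_vtrunc(x::Float64) = trunc(x)
scalar_result = toy_vtrunc(ToyUnroll((1.5, 2.5)))
```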

chriselrod commented 1 year ago

Please try with SLEEFPirates 0.6.39 and confirm this fixes your problem.

MilesCranmer commented 1 year ago

Amazing turnaround time! Thanks so much. Confirmed: it fixes the issue.