FluxML / IRTools.jl

Mike's Little Intermediate Representation
MIT License

Something in IRTools.jl is failing on Julia nightly #106

Closed: bvdmitri closed this issue 1 year ago.

bvdmitri commented 1 year ago

We get consistent failures on GitHub CI with Julia nightly in our code that uses Zygote, and they look like a problem in IRTools. The error happens in three separate, completely unrelated branches with different changes, so we assume our changes did not cause the issue:

https://github.com/biaslab/ReactiveMP.jl/actions/runs/4244267646/jobs/7378178164
https://github.com/biaslab/ReactiveMP.jl/actions/runs/4243540900/jobs/7376449987
https://github.com/biaslab/ReactiveMP.jl/actions/runs/4243889962/jobs/7377272496

The start of the error looks like:

Test threw exception
  Expression: ≈(prod(test[:method], n1, n2), n_analytical, atol = test[:tol])
  MethodError: no method matching length(::Nothing)

  Closest candidates are:
    length(::Union{Base.KeySet, Base.ValueIterator})
     @ Base abstractdict.jl:58
    length(::Union{SparseArrays.FixedSparseVector{Tv, Ti}, SparseArrays.SparseVector{Tv, Ti}} where {Tv, Ti})
     @ SparseArrays /opt/hostedtoolcache/julia/nightly/x64/share/julia/stdlib/v1.10/SparseArrays/src/sparsevector.jl:95
    length(::Union{LinearAlgebra.Adjoint{T, <:Union{StaticArraysCore.StaticArray{Tuple{var"#s2"}, T, 1} where var"#s2", StaticArraysCore.StaticArray{Tuple{var"#s3", var"#s4"}, T, 2} where {var"#s3", var"#s4"}}}, LinearAlgebra.Diagonal{T, <:StaticArraysCore.StaticArray{Tuple{var"#s14"}, T, 1} where var"#s14"}, LinearAlgebra.Hermitian{T, <:StaticArraysCore.StaticArray{Tuple{var"#s11", var"#s12"}, T, 2} where {var"#s11", var"#s12"}}, LinearAlgebra.LowerTriangular{T, <:StaticArraysCore.StaticArray{Tuple{var"#s19", var"#s20"}, T, 2} where {var"#s19", var"#s20"}}, LinearAlgebra.Symmetric{T, <:StaticArraysCore.StaticArray{Tuple{var"#s8", var"#s9"}, T, 2} where {var"#s8", var"#s9"}}, LinearAlgebra.Transpose{T, <:Union{StaticArraysCore.StaticArray{Tuple{var"#s2"}, T, 1} where var"#s2", StaticArraysCore.StaticArray{Tuple{var"#s3", var"#s4"}, T, 2} where {var"#s3", var"#s4"}}}, LinearAlgebra.UnitLowerTriangular{T, <:StaticArraysCore.StaticArray{Tuple{var"#s25", var"#s26"}, T, 2} where {var"#s25", var"#s26"}}, LinearAlgebra.UnitUpperTriangular{T, <:StaticArraysCore.StaticArray{Tuple{var"#s22", var"#s23"}, T, 2} where {var"#s22", var"#s23"}}, LinearAlgebra.UpperTriangular{T, <:StaticArraysCore.StaticArray{Tuple{var"#s16", var"#s17"}, T, 2} where {var"#s16", var"#s17"}}, StaticArraysCore.StaticArray{Tuple{var"#s26"}, T, 1} where var"#s26", StaticArraysCore.StaticArray{Tuple{var"#s1", var"#s4"}, T, 2} where {var"#s1", var"#s4"}, StaticArraysCore.StaticArray{<:Tuple, T}} where T)
     @ StaticArrays ~/.julia/packages/StaticArrays/pTgFe/src/abstractarray.jl:1
    ...

  Stacktrace:
    [1] meta(T::Any; types::Any, world::Any)
      @ IRTools.Inner ~/.julia/packages/IRTools/LbzBn/src/reflection/reflection.jl:50
    [2] meta
      @ ~/.julia/packages/IRTools/LbzBn/src/reflection/reflection.jl:43 [inlined]
    [3] has_chain_rrule(T::Type)
      @ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:20
    [4] #s78#1107
      @ ~/.julia/packages/Zygote/g2w9o/src/compiler/interface2.jl:20 [inlined]
    [5] var"#s78#1107"(::Any, ctx::Any, f::Any, args::Any)
      @ Zygote ./none:0
    [6] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
      @ Core ./boot.jl:600
    [7] pullback(f::Function, cx::Zygote.Context{false}, args::Float64)
      @ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:44
    [8] pullback
      @ ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:42 [inlined]
    [9] gradient(f::Function, args::Float64)
      @ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:96
   [10] compute_derivative(#unused#::ZygoteGrad, f::ReactiveMP.var"#138#139"{NormalMeanVariance{Float64}}, value::Float64)
      @ ReactiveMP ~/work/ReactiveMP.jl/ReactiveMP.jl/src/ReactiveMP.jl:198

Zygote and IRTools versions are:

[e88e6eb3] Zygote v0.6.55
[7869d1d1] IRTools v0.4.8

The tests fail only on Julia nightly (1.10) and pass on 1.6, 1.7, and 1.8.

ToucheSir commented 1 year ago

The relevant IRTools lines are https://github.com/FluxML/IRTools.jl/blob/v0.4.8/src/reflection/reflection.jl#L49-L50. It is slightly concerning that _methods_by_ftype can suddenly return nothing. I couldn't coax it into doing so on 1.8 or 1.9 by trying to look up non-existent methods.

bvdmitri commented 1 year ago

Could this be related? https://github.com/JuliaLang/julia/commit/f6e911aad7eaa0e703a20f0481265d339b2a3625

In particular, the commit has the following change:

function _methods_by_ftype(@nospecialize(t), mt::Union{Core.MethodTable, Nothing}, lim::Int, world::UInt, ambig::Bool, min::Ref{UInt}, max::Ref{UInt}, has_ambig::Ref{Int32})
    - return ccall(:jl_matching_methods, Any, (Any, Any, Cint, Cint, UInt, Ptr{UInt}, Ptr{UInt}, Ptr{Int32}), t, mt, lim, ambig, world, min, max, has_ambig)::Union{Array{Any,1}, Bool}
    + return ccall(:jl_matching_methods, Any, (Any, Any, Cint, Cint, UInt, Ptr{UInt}, Ptr{UInt}, Ptr{Int32}), t, mt, lim, ambig, world, min, max, has_ambig)::Union{Vector{Any},Nothing}
end

The commit also changes the returned value from jl_false to jl_nothing in some cases, but I haven't investigated further.

ToucheSir commented 1 year ago

Nice find! Going further up the trace, we get to https://github.com/biaslab/ReactiveMP.jl/blob/9d907e9a0449be815043eea3eab9946a54116576/src/approximations/cvi.jl#L125-L126. I wonder whether logq being a function defined in a loop has something to do with this. @aviatesk, as the author of the linked commit, do you know why _methods_by_ftype with limit=-1 would succeed up to 1.9 but fail on 1.10?

aviatesk commented 1 year ago

see #107

ToucheSir commented 1 year ago

Thanks. I think the bigger issue, though, is that we're hitting the nothing/false case at all. Do you know of any reason why pre-nightly versions would be able to look up the method while nightly fails?

bvdmitri commented 1 year ago

I have the same impression as @ToucheSir. As I commented on the PR, length(false) === 1, so the check length(_methods) == 0 was never receiving false; otherwise the check would not have returned early and the code would have failed two lines below anyway. The bigger issue here is that the code is all of a sudden actually getting nothing/false.
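A minimal sketch of the point above, using only Base. Rather than calling the internal Base._methods_by_ftype (its signature differs between Julia versions), it models the three kinds of value that function can return according to this thread: a vector of matches, false, or nothing. The helper no_matches is hypothetical and is not the IRTools fix from the PR.

```julia
# `false` is a Number, and Julia treats numbers as one-element collections,
# so `length(false) == 1` and a `length(_methods) == 0` guard lets it through.
@assert length(false) == 1

# `nothing` has no `length` method: this reproduces the reported MethodError.
err = try
    length(nothing)
    nothing          # not reached
catch e
    e
end
@assert err isa MethodError

# Hypothetical guard (illustrative name, not IRTools API) that treats every
# "lookup failed" sentinel the same way as an empty result vector.
# The `===` checks short-circuit before `isempty` ever sees `nothing`/`false`.
no_matches(ms) = ms === nothing || ms === false || isempty(ms)

@assert no_matches(nothing)
@assert no_matches(false)
@assert no_matches(Any[])
@assert !no_matches(Any[:some_match])
```

Checking identity with === first keeps the guard valid on both older Julia versions (which returned false) and 1.10 nightly (which can return nothing).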

bvdmitri commented 1 year ago

FYI: our tests pass on Julia Version 1.10.0-DEV.651, and the error no longer appears in our most recent CI run. The failure was on Julia Version 1.10.0-DEV.650.

aviatesk commented 1 year ago

I don't know whether we are hitting this case. It's possible we saw this error on a particular nightly version because of changes in method lookup.

ToucheSir commented 1 year ago

Well, at least #107 should make meta more robust, and we can consider the immediate issue resolved. I'm going to close this issue for now, but if the behaviour comes back, just comment or ping me and we can reopen it.