LuxDL / Lux.jl

Elegant and Performant Scientific Machine Learning in Julia
https://lux.csail.mit.edu/
MIT License

`Enzyme.Forward` hits Octavian dispatch in Dense #853

Closed: prbzrg closed this issue 2 months ago

prbzrg commented 2 months ago

Could not keep minus one
MethodInstance for Octavian.packaloopmul!(::LayoutPointers.StridedPointer{Float32, 2, 1, 0, (1, 2), Tuple{Static.StaticInt{4}, Int64}, Tuple{Static.StaticInt{0}, Static.StaticInt{0}}}, ::LayoutPointers.StridedPointer{Float32, 2, 1, 0, (1, 2), Tuple{Static.StaticInt{4}, Int64}, Tuple{Static.StaticInt{0}, Static.StaticInt{0}}}, ::LayoutPointers.StridedPointer{Float32, 2, 1, 0, (1, 2), Tuple{Static.StaticInt{4}, Int64}, Tuple{Static.StaticInt{0}, Static.StaticInt{0}}}, ::Bool, ::Static.StaticInt{0}, ::Int64, ::Int64, ::Int64)

Caused by:
Stacktrace:
 [1] macro expansion
   @ C:\Users\prbzr\.julia\packages\VectorizationBase\LqJbS\src\llvm_intrin\memory_addr.jl:407
 [2] _gep
   @ C:\Users\prbzr\.julia\packages\VectorizationBase\LqJbS\src\llvm_intrin\memory_addr.jl:407
 [3] increment_ptr
   @ C:\Users\prbzr\.julia\packages\VectorizationBase\LqJbS\src\llvm_intrin\memory_addr.jl:442
 [4] increment_ptr
   @ C:\Users\prbzr\.julia\packages\VectorizationBase\LqJbS\src\llvm_intrin\memory_addr.jl:456
 [5] macro expansion
   @ C:\Users\prbzr\.julia\packages\LoopVectorization\tIJUA\src\reconstruct_loopset.jl:1107
 [6] _turbo_!
   @ C:\Users\prbzr\.julia\packages\LoopVectorization\tIJUA\src\reconstruct_loopset.jl:1107
 [7] macro expansion
   @ C:\Users\prbzr\.julia\packages\LoopVectorization\tIJUA\src\condense_loopset.jl:1179
 [8] packamul!
   @ C:\Users\prbzr\.julia\packages\Octavian\LeRg7\src\macrokernels.jl:74
 [9] packaloopmul!
   @ C:\Users\prbzr\.julia\packages\Octavian\LeRg7\src\macrokernels.jl:209

Stacktrace:
  [1] julia_error(cstr::Cstring, val::Ptr{LLVM.API.LLVMOpaqueValue}, errtype::Enzyme.API.ErrorType, data::Ptr{Nothing}, data2::Ptr{LLVM.API.LLVMOpaqueValue}, B::Ptr{LLVM.API.LLVMOpaqueBuilder})
    @ Enzyme.Compiler C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\compiler.jl:2229
  [2] EnzymeCreateForwardDiff(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{…}, TA::Enzyme.TypeAnalysis, returnValue::Bool, mode::Enzyme.API.CDerivativeMode, width::Int64, additionalArg::Ptr{…}, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{…})
    @ Enzyme.API C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\api.jl:177
  [3] enzyme!(job::GPUCompiler.CompilerJob{…}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::Tuple{…}, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{…}, boxedArgs::Set{…})
    @ Enzyme.Compiler C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\compiler.jl:4064
  [4] codegen(output::Symbol, job::GPUCompiler.CompilerJob{…}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\compiler.jl:6302
  [5] codegen
    @ C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\compiler.jl:5493 [inlined]
  [6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\compiler.jl:7103
  [7] _thunk
    @ C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\compiler.jl:7103 [inlined]
  [8] cached_compilation
    @ C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\compiler.jl:7144 [inlined]
  [9] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{…}, ::Type{…}, ::Type{…}, tt::Type{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Type{…}, ::Val{…})
    @ Enzyme.Compiler C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\compiler.jl:7217
 [10] #s2048#18999
    @ C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\compiler.jl:7269 [inlined]
 [11]
    @ Enzyme.Compiler .\none:0
 [12] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core .\boot.jl:602
 [13] autodiff
    @ C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\Enzyme.jl:435 [inlined]
 [14] #108
    @ C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\Enzyme.jl:1273 [inlined]
 [15] ntuple
    @ .\ntuple.jl:19 [inlined]
 [16] #jacobian#107
    @ C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\Enzyme.jl:1271 [inlined]
 [17] jacobian
    @ C:\Users\prbzr\.julia\packages\Enzyme\XGb4o\src\Enzyme.jl:1267 [inlined]
 [18] jacobian(f::StatefulLuxLayer{…}, backend::AutoEnzyme{…}, x::Matrix{…}, extras::DifferentiationInterfaceEnzymeExt.EnzymeForwardOneArgJacobianExtras{…})
    @ DifferentiationInterfaceEnzymeExt C:\Users\prbzr\.julia\packages\DifferentiationInterface\pu1SS\ext\DifferentiationInterfaceEnzymeExt\forward_onearg.jl:143
 [19] jacobian(f::StatefulLuxLayer{true, Dense{…}, ComponentVector{…}, @NamedTuple{}}, backend::AutoEnzyme{Nothing, Const}, x::Matrix{Float32})
    @ DifferentiationInterface C:\Users\prbzr\.julia\packages\DifferentiationInterface\pu1SS\src\fallbacks\no_extras.jl:9
 [20] top-level scope
    @ REPL[8]:1
Some type information was truncated. Use `show(err)` to see complete types.

using ComponentArrays, DifferentiationInterface, Enzyme, Lux, Random, Zygote

Enzyme.API.runtimeActivity!(true)

n = 2
r = rand(Float32, n, n)
nn = Chain(Dense(n => n, tanh))
ps, st = Lux.setup(Random.default_rng(), nn)
ps = ComponentArray(ps)
snn = StatefulLuxLayer{true}(nn, ps, st)

# working ↓
DifferentiationInterface.jacobian(snn, AutoZygote(), r)
DifferentiationInterface.jacobian(x -> first(nn(x, ps, st)), AutoZygote(), r)
# working ↑

# not working ↓
DifferentiationInterface.jacobian(snn, AutoEnzyme(; function_annotation=Enzyme.Const), r)
DifferentiationInterface.jacobian(snn, AutoEnzyme(; mode=Enzyme.Forward, function_annotation=Enzyme.Const), r)
DifferentiationInterface.jacobian(x -> first(nn(x, ps, st)), AutoEnzyme(; function_annotation=Enzyme.Const), r)
DifferentiationInterface.jacobian(x -> first(nn(x, ps, st)), AutoEnzyme(; mode=Enzyme.Forward, function_annotation=Enzyme.Const), r)
# not working ↑

Status `D:\Codes\Mine\bug-report\br-8\Project.toml`
  [b0b7db55] ComponentArrays v0.15.16
  [a0c0ee7d] DifferentiationInterface v0.5.14
  [7da242da] Enzyme v0.12.32
  [b2108857] Lux v0.5.65
  [e88e6eb3] Zygote v0.6.70
  [9a3f8284] Random
Status `D:\Codes\Mine\bug-report\br-8\Manifest.toml`
  [47edcb42] ADTypes v1.7.1
  [621f4979] AbstractFFTs v1.5.0
  [79e6a3ab] Adapt v4.0.4
  [dce04be8] ArgCheck v2.3.0
  [4fba245c] ArrayInterface v7.15.0
  [a9b6321e] Atomix v0.1.0
  [62783981] BitTwiddlingConvenienceFunctions v0.1.6
  [fa961155] CEnum v0.5.0
  [2a0fbf3d] CPUSummary v0.2.6
  [082447d4] ChainRules v1.69.0
  [d360d2e6] ChainRulesCore v1.24.0
  [fb6a15b2] CloseOpenIntervals v0.1.13
  [bbf7d656] CommonSubexpressions v0.3.1
  [f70d9fcc] CommonWorldInvalidations v1.0.0
  [34da2185] Compat v4.16.0
  [b0b7db55] ComponentArrays v0.15.16
  [2569d6c7] ConcreteStructs v0.2.3
  [187b0558] ConstructionBase v1.5.7
  [adafc99b] CpuId v0.3.1
  [9a962f9c] DataAPI v1.16.0
  [864edb3b] DataStructures v0.18.20
  [e2d170a0] DataValueInterfaces v1.0.0
  [163ba53b] DiffResults v1.1.0
  [b552c78f] DiffRules v1.15.1
  [a0c0ee7d] DifferentiationInterface v0.5.14
  [8d63f2c5] DispatchDoctor v0.4.14
  [ffbed154] DocStringExtensions v0.9.3
  [7da242da] Enzyme v0.12.32
  [f151be2c] EnzymeCore v0.7.8
  [e2ba6199] ExprTools v0.1.10
  [9aa1b823] FastClosures v0.3.2
  [1a297f60] FillArrays v1.12.0
  [f6369f11] ForwardDiff v0.10.36
  [d9f16b24] Functors v0.4.12
  [0c68f7d7] GPUArrays v10.3.1
  [46192b85] GPUArraysCore v0.1.6
⌅ [61eb1bfa] GPUCompiler v0.26.7
  [3e5b6fbb] HostCPUFeatures v0.1.17
  [0e44f5e4] Hwloc v3.3.0
  [7869d1d1] IRTools v0.4.14
  [615f187c] IfElse v0.1.1
  [92d709cd] IrrationalConstants v0.2.2
  [82899510] IteratorInterfaceExtensions v1.0.0
  [692b3bcd] JLLWrappers v1.5.0
  [63c18a36] KernelAbstractions v0.9.24
⌅ [929cbde3] LLVM v8.1.0
  [10f19ff3] LayoutPointers v0.1.17
  [2ab3a3ac] LogExpFunctions v0.3.28
  [bdcacae8] LoopVectorization v0.12.171
  [30fc2ffe] LossFunctions v0.11.1
  [b2108857] Lux v0.5.65
  [bb33d45b] LuxCore v0.1.25
  [34f89e08] LuxDeviceUtils v0.1.27
  [82251201] LuxLib v0.3.48
  [7e8f7934] MLDataDevices v1.0.3
  [1914dd2f] MacroTools v0.5.13
  [d125e4d3] ManualMemory v0.1.8
  [872c559c] NNlib v0.9.22
  [77ba4419] NaNMath v1.0.2
  [d8793406] ObjectFile v0.4.2
  [6fd5a793] Octavian v0.3.28
  [6fe1bfb0] OffsetArrays v1.14.1
  [3bd65402] Optimisers v0.3.3
  [bac558e1] OrderedCollections v1.6.3
  [65ce6f38] PackageExtensionCompat v1.0.2
  [f517fe37] Polyester v0.7.16
  [1d0040c9] PolyesterWeave v0.2.2
  [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.4.3
  [c1ae055f] RealDot v0.1.0
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [94e857df] SIMDTypes v0.1.0
  [476501e8] SLEEFPirates v0.6.43
  [6c6a2e73] Scratch v1.2.1
  [efcf1570] Setfield v1.1.1
  [dc90abb0] SparseInverseSubset v0.1.2
  [0a514795] SparseMatrixColorings v0.4.0
  [276daf66] SpecialFunctions v2.4.0
  [aedffcd0] Static v1.1.1
  [0d7ed370] StaticArrayInterface v1.8.0
  [90137ffa] StaticArrays v1.9.7
  [1e83bf80] StaticArraysCore v1.4.3
  [7792a7ef] StrideArraysCore v0.5.7
  [09ab397b] StructArrays v0.6.18
  [53d494c1] StructIO v0.3.1
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.12.0
  [8290d209] ThreadingUtilities v0.5.2
  [a759f4b9] TimerOutputs v0.5.24
  [3a884ed6] UnPack v1.0.2
  [0fe1646c] UnrolledUtilities v0.1.2
  [013be700] UnsafeAtomics v0.2.1
  [d80eeb9a] UnsafeAtomicsLLVM v0.2.1
  [3d5dd08c] VectorizationBase v0.21.70
  [d49dbf32] WeightInitializers v1.0.3
  [e88e6eb3] Zygote v0.6.70
  [700de1a5] ZygoteRules v0.2.5
  [7cc45869] Enzyme_jll v0.0.145+0
  [e33a78d0] Hwloc_jll v2.11.1+0
⌅ [dad2f222] LLVMExtra_jll v0.0.31+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [9fa8497b] Future
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.10.0
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [6462fe0b] Sockets
  [2f01184e] SparseArrays v1.10.0
  [10745b16] Statistics v1.10.0
  [4607b0f0] SuiteSparse
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll v1.1.1+0
  [deac9b47] LibCURL_jll v8.4.0+0
  [e37daf67] LibGit2_jll v1.6.4+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.2+1
  [14a3606d] MozillaCACerts_jll v2023.1.10
  [4536629a] OpenBLAS_jll v0.3.23+4
  [05823500] OpenLibm_jll v0.8.1+2
  [bea87d4a] SuiteSparse_jll v7.2.1+1
  [83775a58] Zlib_jll v1.2.13+1
  [8e850b90] libblastrampoline_jll v5.8.0+1
  [8e850ede] nghttp2_jll v1.52.0+1
  [3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`

Julia Version 1.10.4
Commit 48d4fd4843 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 12 default, 2 interactive, 7 GC (on 12 virtual cores)

prbzrg commented 2 months ago

All of them work with Zygote, and the error is different when I don't use `StatefulLuxLayer`.

avik-pal commented 2 months ago

This is very likely a DifferentiationInterface (DI) issue caused by closures (I think there is an option to mark `f` as constant).

cc @gdalle @wsmoses

wsmoses commented 2 months ago

Yeah, I presume this succeeds if you just call `Enzyme.gradient`/`jacobian(Reverse, Const(f), ...)` directly, no?

gdalle commented 2 months ago

If the function is truly constant, can you try `AutoEnzyme(function_annotation=Enzyme.Const)`?

prbzrg commented 2 months ago

It didn't work, but I got a different error. I've updated the first post.

avik-pal commented 2 months ago

~OK, this is more of a Lux issue, but I am very surprised Enzyme is hitting Octavian 😅. I try my best to route around LoopVectorization/Octavian/Polyester for Enzyme and just call BLAS or use plain loops.~

The dispatches are defined only for `ReverseMode`; they need to be extended to `ForwardMode`.

avik-pal commented 2 months ago

The solution is probably to bite the bullet and write the Enzyme rules for the matmul implementation (https://github.com/LuxDL/LuxLib.jl/blob/c185f04183d760b84d0dcfa2b49511255cd1e7dc/src/impl/matmul.jl#L233-L238) instead of switching the implementations.
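
For reference, a forward-mode rule for a matrix product has to return the tangent given by the product rule, `dC = dA * B + A * dB`. The sketch below checks that identity in plain Julia against a finite difference. It is not LuxLib's or Enzyme's actual rule: `matmul_tangent` is a hypothetical helper name, and the matrices are arbitrary illustration values.

```julia
using LinearAlgebra

# Hypothetical helper (not part of LuxLib or Enzyme): the tangent of C = A * B
# under perturbations dA and dB, obtained from the product rule.
matmul_tangent(A, B, dA, dB) = dA * B + A * dB

# Arbitrary illustration values.
A  = [1.0 2.0; 3.0 4.0]
B  = [0.5 -1.0; 2.0 0.0]
dA = [0.1 0.0; 0.0 0.1]
dB = [0.0 1.0; 1.0 0.0]

dC = matmul_tangent(A, B, dA, dB)

# Central finite-difference estimate of the same directional derivative.
h = 1e-6
dC_fd = ((A + h * dA) * (B + h * dB) - (A - h * dA) * (B - h * dB)) / (2h)

@assert isapprox(dC, dC_fd; atol=1e-6)
```

A real rule would wrap this computation in Enzyme's custom-rule interface, whose exact signature depends on the EnzymeCore version in use.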

avik-pal commented 2 months ago

A smaller reproducer:

using Lux, Enzyme, Random

n = 2
r = rand(Float32, n, n)
nn = Chain(Dense(n => n, tanh))
ps, st = Lux.setup(Random.default_rng(), nn)

Enzyme.autodiff(Forward, Const(LuxCore.stateless_apply), Duplicated,
    Const(nn), Duplicated(r, one.(r)), Const(ps))
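
As a rough sanity check on what the forward-mode call above should return once it works, the tangent of a dense `tanh` layer can be written by hand with the chain rule. This is a plain-Julia sketch, not Lux code: `W`, `b`, `x`, and `dense` are hypothetical stand-ins (Lux initialises the real parameters randomly), and the seed `one.(x)` mirrors `Duplicated(r, one.(r))`.

```julia
using LinearAlgebra

# Hypothetical stand-ins for the Dense layer's weight and bias;
# the values are arbitrary and not what Lux.setup would produce.
W = [0.3 -0.2; 0.1 0.4]
b = [0.05, -0.1]

# Same functional form as Dense(n => n, tanh): columns of x are batch entries.
dense(x) = tanh.(W * x .+ b)

x  = [0.5 1.0; -1.0 2.0]
dx = one.(x)               # tangent seed of all ones

# Chain rule: d tanh(z) = (1 - tanh(z)^2) * dz, with dz = W * dx.
y  = dense(x)
dy = (1 .- y .^ 2) .* (W * dx)

# Central finite-difference estimate of the same directional derivative.
h = 1e-6
dy_fd = (dense(x .+ h .* dx) .- dense(x .- h .* dx)) ./ (2h)

@assert isapprox(dy, dy_fd; atol=1e-6)
```

The `W * dx` term is the matmul tangent, which is the part that currently reaches Octavian in forward mode.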

avik-pal commented 2 months ago

Reverse-mode Enzyme works fine:

DifferentiationInterface.jacobian(snn, AutoEnzyme(; function_annotation=Enzyme.Const, mode=Enzyme.Reverse), r)