Closed: maximilian-gelbrecht closed this issue 2 years ago.
This behavior is intended and is how mutation support works. In essence, in reverse mode the derivative outputs are propagated back to the derivative inputs. Specifically, when reverse-mode differentiating the store into R, Enzyme will propagate dR to the derivatives of the inputs and then zero dR.
This is required, for example, when something is computed inside a loop, like below:
```julia
for i = 1:10
    R = A * B
end
```
Only the last store matters, and the correct behavior is as follows:
```julia
dA = 0
dB = 0
for i = 10:-1:1      # the reverse pass runs the loop backwards
    dA += dR * B'
    dB += A' * dR
    dR = 0           # zero the shadow of the overwritten output
end
```
If the zeroing of dR were not there, then dA would come out as 10 times its correct derivative!
This zeroing behavior only applies to reverse mode; when using forward-mode AD, the original derivative inputs aren't modified.
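To make the zeroing concrete, here is a minimal sketch. It is not taken from the issue: the function name `store!` and all values are made up, and the `autodiff` call follows the same v0.10-style API as the scripts later in the thread.

```julia
using Enzyme

# Hypothetical minimal example of the zeroing semantics described above.
function store!(R, x)
    R[1] = 2.0 * x[1]   # store into the output buffer
    nothing
end

x = [3.0]; dx = [0.0]
R = [0.0]; dR = [1.0]   # seed the shadow of the output

Enzyme.autodiff(store!, Const, Duplicated(R, dR), Duplicated(x, dx))

@show dx  # [2.0]: dR was propagated into the input's shadow
@show dR  # [0.0]: the shadow of the overwritten output was zeroed
```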
Oh, I see what you mean by the issue: the square case has the intended behavior here, but the non-square case does not. My guess is that the non-square version makes an internal copy inside the linear algebra routine (meaning the true dR being used is different), but I will take a look nonetheless.
Okay, got it. It's probably not that important; I was just confused by the difference in behaviour. Maybe the documentation / example should be changed though, as it currently shows unintended behaviour.
Reducing:
```
wmoses@beast:~/git/Enzyme.jl ((HEAD detached at origin/main)) $ cat mm.jl
```

```julia
using Enzyme
using LinearAlgebra

A = rand(1, 1)
B = rand(1, 1)
R = zeros(size(A, 1), size(B, 2))

∂z_∂R = [0.8;;]
∂z_∂R_copy = deepcopy(∂z_∂R)
∂z_∂A = zero(A)
∂z_∂B = zero(B)

function mul(R, A, B)
    BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, R)
    nothing
end

Enzyme.API.printall!(true)  # print the generated code for debugging
Enzyme.autodiff(mul, Const, Duplicated(R, ∂z_∂R), Duplicated(A, ∂z_∂A), Duplicated(B, ∂z_∂B))

@show ∂z_∂R # should be 0, is [0.8;;]
```
```julia
using Enzyme
using LinearAlgebra
using LinearAlgebra.BLAS

@inline uptr(x) = Base.reinterpret(Ptr{Float64}, x)

#       SUBROUTINE DGEMM(TRANSA,TRANSB,M,N,K,ALPHA,A,LDA,B,LDB,BETA,C,LDC)
# *     .. Scalar Arguments ..
#       DOUBLE PRECISION ALPHA,BETA
#       INTEGER K,LDA,LDB,LDC,M,N
#       CHARACTER TRANSA,TRANSB
# *     .. Array Arguments ..
#       DOUBLE PRECISION A(LDA,*),B(LDB,*),C(LDC,*)
function imul(C, A, m)
    ka = 1
    kb = 1
    n = 1
    A = uptr(A) # Ref(2.0)
    # A = Ref(2.0)
    B = Ref(2.0)
    ccall((LinearAlgebra.BLAS.@blasfunc(dgemm_), LinearAlgebra.BLAS.libblastrampoline), Cvoid,
          (Ref{UInt8}, Ref{UInt8}, Ref{LinearAlgebra.BLAS.BlasInt}, Ref{LinearAlgebra.BLAS.BlasInt},
           Ref{LinearAlgebra.BLAS.BlasInt}, Ref{Float64}, Ptr{Float64}, Ref{LinearAlgebra.BLAS.BlasInt},
           Ptr{Float64}, Ref{LinearAlgebra.BLAS.BlasInt}, Ref{Float64}, Ptr{Float64},
           Ref{LinearAlgebra.BLAS.BlasInt}, Clong, Clong),
          'N', 'N', m, n,
          ka, 1.0, A, 1,
          B, 1, 0.0, uptr(C),
          1, 1, 1)
    nothing
end

function mul(C, A, B, m)
    imul(C, A, m)
end

A = rand(1, 1)
B = rand(1, 1)
R = zeros(size(A, 1), size(B, 2))

∂z_∂R = [0.8;;]
∂z_∂R_copy = deepcopy(∂z_∂R)
∂z_∂A = zero(A)
∂z_∂B = zero(B)

@inline ptr(x) = Base.reinterpret(Core.LLVMPtr{Float64, 0}, Base.unsafe_convert(Ptr{Float64}, x))

GC.@preserve R A B ∂z_∂R ∂z_∂A ∂z_∂B begin
    mul(ptr(R), ptr(A), ptr(B), 1)
    Enzyme.API.printall!(true)
    Enzyme.autodiff(mul, Const,
                    Duplicated(ptr(R), ptr(∂z_∂R)),
                    Duplicated(ptr(A), ptr(∂z_∂A)), Duplicated(ptr(B), ptr(∂z_∂B)), Const(1))
    @show ∂z_∂R
end
```
Looking at the example for a matrix multiply and playing with it, I noticed the following behaviour, which depends on the matrix size.

The example, but for the standard `LinearAlgebra.mul!`, is the following (a sketch is given below): this works as intended.

If I change the matrix size to square matrices, the gradient input `∂z_∂R` is mutated to zeros, while the computed gradients `∂z_∂A` and `∂z_∂B` are still correct. Is this behaviour intentional or a bug?

(Running Enzyme v0.10.4)
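The `mul!` snippet referenced above wasn't captured in this transcript. The following is a hedged sketch of what such a reproducer might look like, modeled on the `gemm!` script earlier in the thread; the wrapper name `mulwrap`, the shapes, and the seed values are assumptions, not the reporter's original code.

```julia
using Enzyme
using LinearAlgebra

# Hedged reconstruction; shapes and values are illustrative assumptions.
A = rand(2, 3)   # non-square case; use e.g. rand(2, 2) for the square case
B = rand(3, 2)
R = zeros(size(A, 1), size(B, 2))

∂z_∂R = ones(size(R))   # seed for the output's shadow
∂z_∂A = zero(A)
∂z_∂B = zero(B)

function mulwrap(R, A, B)
    mul!(R, A, B)
    nothing
end

Enzyme.autodiff(mulwrap, Const,
                Duplicated(R, ∂z_∂R), Duplicated(A, ∂z_∂A), Duplicated(B, ∂z_∂B))

@show ∂z_∂R  # per the report: zeroed for square sizes, left unchanged otherwise
@show ∂z_∂A
@show ∂z_∂B
```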