Closed sethaxen closed 5 years ago
It's not intentional, no. Not sure what would cause this, but it should be an easy fix if someone can dig into it.
I can give it a try. The broadcast case is the correct one, right?
It looks like there's an explicit gradient definition for real acting on numbers, or broadcasted, but not acting on arrays:
This looks like it might be a broadcast issue. On Julia 1.2, real(x) just calls broadcast(real, x):
julia> @code_lowered real(A)
CodeInfo(
1 ─ %1 = Base.broadcast(Base.real, A)
└── return %1
)
julia> @code_lowered Base.broadcast(Base.real, A)
CodeInfo(
1 ─ %1 = Core.tuple(f)
│ %2 = Core._apply(Base.Broadcast.broadcasted, %1, As)
│ %3 = Base.Broadcast.materialize(%2)
└── return %3
)
so in principle the custom adjoints for broadcasted should be used. On the other hand, broadcasting with real.() gives you:
julia> g(x) = real.(x)
g (generic function with 1 method)
julia> @code_lowered g(A)
CodeInfo(
1 ─ %1 = Base.broadcasted(Main.real, x)
│ %2 = Base.materialize(%1)
└── return %2
)
As far as I can tell, the only difference is that the first one uses _apply. I couldn't find much information on _apply, but it seems to be called because of the splatting in broadcast's default method:
julia> myfun(f, x) = f(x);
julia> myfun2(f, x...) = f(x...);
julia> @code_lowered myfun(identity, 3.0)
CodeInfo(
1 ─ %1 = (f)(x)
└── return %1
)
julia> @code_lowered myfun2(identity, 3.0)
CodeInfo(
1 ─ %1 = Core._apply(f, x)
└── return %1
)
Could it be that _apply is somehow circumventing the custom adjoints for broadcasted?
Thanks for digging into this. It's an interesting case. We end up calling
which means real is the identity (and gets the appropriate adjoint).
For now we should just fix with a custom adjoint. It's concerning that code like this can lead to bad gradients, though, since presumably there are other cases where people might specialise on the reals rather than writing code generically across all numbers.
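To make the discrepancy concrete, here is a minimal sketch with hand-written pullbacks. It does not use Zygote; the names real_broadcast_pullback and real_identity_pullback are hypothetical and exist only for illustration. It assumes the convention described in this thread, where the scalar and broadcasted adjoint of real keeps only the real part of the incoming cotangent.

```julia
# Hand-written pullbacks for illustration only; these are NOT Zygote's definitions.

# Elementwise real: the incoming cotangent is projected onto its real part.
function real_broadcast_pullback(x)
    y = real.(x)
    back(Δ) = real.(Δ)
    return y, back
end

# real treated as the identity (as for an array that is already real):
# the cotangent passes through unchanged, imaginary part included.
function real_identity_pullback(x)
    back(Δ) = Δ
    return x, back
end

A = [1.0 2.0; 3.0 4.0]
Δ = [1.0+2.0im 0.0+1.0im; 3.0-1.0im 2.0+0.0im]  # complex cotangent

_, back_bc = real_broadcast_pullback(A)
_, back_id = real_identity_pullback(A)

grad_bc = back_bc(Δ)  # real-valued matrix
grad_id = back_id(Δ)  # still complex, identical to Δ
```

The two results differ exactly by the imaginary part of Δ, which is the kind of mismatch reported here. A custom adjoint for real on arrays would presumably need to behave like the first pullback.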
Calling real on a matrix or broadcasting it over the matrix produces different adjoints when a complex adjoint is pulled back:
The behavior of the second (broadcasted) version is consistent with scalar functions. Is there a reason why the two real calls have different results, or is this a bug?