Closed giordano closed 1 month ago
The structured arrays seem to be a red herring in this one: the issue can be reproduced using only Matrices. Reduced further:
julia> α * Ac * Bc + β * C
1×1 Matrix{BigFloat}:
-0.0001400633263149390038358987549395176373104782747840190352000128617941687774417292
julia> mul!(copy(C), Ac, Bc, α, β)
1×1 Matrix{BigFloat}:
-0.0001400037216701636132108987549395176373104782747840190352000128617941687774417292
In the out-of-place case, the first multiplication α * Ac * Bc is carried out in Float32 precision, whereas in the in-place mul! case, all the numbers are promoted to BigFloat before the multiplication is carried out. This seems to explain the differences in the numbers:
julia> mul!(Float32.(zero(C)), Ac, Bc, α, false) + C * β
1×1 Matrix{BigFloat}:
-0.0001400633263149390038358987549395176373104782747840190352000128617941687774417292
julia> mul!(zero(C), Ac, Bc, α, false) + C * β
1×1 Matrix{BigFloat}:
-0.0001400037216701636132108987549395176373104782747840190352000128617941687774417292
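A scalar analogue of the effect, as a rough sketch only: the factors below are hypothetical stand-ins, since the actual α, Ac and Bc are not shown here. It compares a product rounded in Float32 with the same product computed after promoting every factor to BigFloat.

```julia
# Hypothetical Float32 factors; the thread does not show the real α, Ac, Bc.
a, b, c = 1.1f0, 1.3f0, 2.7f0

# Product rounded to Float32 after each multiplication.
low = a * b * c

# Same factors promoted to BigFloat first. At the default 256-bit precision
# the product of three 24-bit significands is represented exactly.
high = big(a) * big(b) * big(c)

# Relative gap between the two results, bounded by a small multiple of eps(Float32).
rel = abs(big(low) - high) / abs(high)

println(typeof(low), " vs ", typeof(high))
println("relative gap: ", Float64(rel))
```

The two results differ only around the Float32 rounding level, which is harmless on its own but matters once the product is added to a term of nearly equal magnitude and opposite sign.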
It is worth noting that the two terms being added are similar in magnitude and have opposite signs, so cancellation amplifies the rounding differences.
julia> (α * Ac * Bc)[1], (β * C)[1]
(3.5504863f0, -3.550626389543966306191335898754939517637310478274784019035200012861794168777442)
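To put a rough number on the cancellation, the sketch below reuses the two terms printed above and estimates how much a half-ulp Float32 rounding error in the first term is magnified in the sum. The variable names are illustrative, and the half-ulp error model is an assumption rather than a measurement of what mul! does internally.

```julia
# The two terms as printed in the REPL output above.
t1 = 3.5504863f0
t2 = big"-3.550626389543966306191335898754939517637310478274784019035200012861794168777442"

# Their sum is four orders of magnitude smaller than either term.
s = big(t1) + t2

# Assumed error model: t1 is off by at most half a unit in the last place.
err = eps(t1) / 2

rel_term = err / abs(t1)        # relative error in the term itself
rel_sum  = err / abs(s)         # the same absolute error relative to the sum
amplification = rel_sum / rel_term

println("sum           = ", Float64(s))
println("rel. err term = ", Float64(rel_term))
println("rel. err sum  = ", Float64(rel_sum))
println("amplification = ", Float64(amplification))
```

Under this model, an error that is negligible relative to each term becomes visible in the fourth significant digit of the sum, which matches where the two results above start to disagree.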
Since we end up comparing numbers close to zero, perhaps we need an atol here as well.
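As an illustration of what an atol would change, the sketch below compares the two printed results with isapprox. Both tolerances are assumptions chosen for illustration: the rtol of sqrt(eps(Float32)) and the atol of 1e-6 are not taken from the actual test.

```julia
# The two results printed earlier in this comment.
x = big"-0.0001400633263149390038358987549395176373104782747840190352000128617941687774417292"
y = big"-0.0001400037216701636132108987549395176373104782747840190352000128617941687774417292"

# Default tolerance for BigFloat is far too strict for a Float32-level discrepancy.
default_ok = isapprox(x, y)

# Hypothetical relative tolerance at the Float32 level: still fails, because
# the values being compared are themselves close to zero.
rtol_ok = isapprox(x, y; rtol = sqrt(eps(Float32)))

# Hypothetical absolute tolerance, sized against the ~3.55 magnitude of the
# cancelling terms rather than against the tiny result.
atol_ok = isapprox(x, y; atol = 1e-6)

println((default_ok, rtol_ok, atol_ok))
```

A relative tolerance scales with the near-zero result, so it cannot absorb an error that scales with the much larger cancelling terms; an absolute tolerance can.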
On 4633607ce9b9f077f32f89f09a136e04389bbac2 with Linux (either x86_64 or aarch64) I get
Reduced to
Somewhat distressfully, this doesn't reproduce on aarch64-darwin, in the sense that the above reproducer gives