Open oscardssmith opened 3 years ago
provide an easy escape hatch to people writing compensated arithmetic libraries
And for people being bitten by: https://github.com/vchuravy/GPUifyLoops.jl/issues/91
> provide an easy escape hatch to people writing compensated arithmetic libraries
> And for people being bitten by: vchuravy/GPUifyLoops.jl#91
And https://github.com/JuliaGPU/KernelAbstractions.jl/issues/20, which is the successor issue.
( I feel seen )
But yes, much more precise reasoning about fastmath optimization would be welcome. As that issue shows, something as seemingly harmless as `fpcontract` can have counter-intuitive results.
Is there any compiler for any language that does FMA contraction by default, without an explicit flag of some sort?
IIRC `nvcc` does this for CUDA device code, which is why KernelAbstractions mimics that.
> Is there any compiler for any language that does FMA contraction by default, without an explicit flag of some sort?

GCC does this with `-O2`.
I thought I'd commented here, but I guess I never hit the button. IMO, we should have separate float types with various optimization guarantees. That could be done in a package (with maybe some non-breaking changes in base to expose the flags in the intrinsic) and then we can decide in 2.0 if we want to change the default float type or not.
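For concreteness, a minimal sketch of what such an opt-in type could look like in a package. The name `FastFloat64` is hypothetical, and `@fastmath` is used here as a coarse stand-in for exposing the contraction flag on the intrinsics:

```julia
# Hypothetical sketch of the "separate float type" idea -- not an existing package.
struct FastFloat64 <: AbstractFloat
    x::Float64
end

# Arithmetic forwards through @fastmath, which applies LLVM's fast-math flags
# and therefore (among other things) permits fma contraction after inlining.
Base.:+(a::FastFloat64, b::FastFloat64) = FastFloat64(@fastmath a.x + b.x)
Base.:-(a::FastFloat64, b::FastFloat64) = FastFloat64(@fastmath a.x - b.x)
Base.:*(a::FastFloat64, b::FastFloat64) = FastFloat64(@fastmath a.x * b.x)

# `a*b + c` on FastFloat64 values may now lower to a single fma,
# while plain Float64 code elsewhere keeps its current semantics.
```

The caveat, and the reason for wanting the intrinsic flags exposed, is that `@fastmath` enables much more than contraction; a real package would want a contract-only flag.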
That's a really interesting way of solving this. The one thing I'd be worried about is that the biggest beneficiaries of this idea are probably people who don't even know what floating point is (or who don't deeply understand numerical analysis). I'm not sure that a package that requires users to opt in to a special numeric type will show the benefits of a system designed to reduce the amount of manual annotation.
The package is just a way to prototype and see if we like the tradeoffs. We can decide in 2.0 what the behavior of floats should be.
FastFloats.jl? We'll really have painted ourselves into a corner — what kind of monster doesn't want fast floats?
While maybe doing FastFloats.jl first to prototype, is this a case of "technically breaking" where we should just do it pre-2.0? We can always do SlowFloats.jl ("what kind of monster doesn't want" that?). Don't basically all supported platforms support FMA? If so, this would only be a documented change across versions, and people could stay on the non-FMA LTS version.
The issue for me isn't worrying about platforms that don't have FMA, it's that there isn't a single canonical way to apply FMA to operations. Currently Julia just does exactly what you ask it to do, unless you use a macro to tell it "actually, do something else besides what I asked you to do". If we start doing automatic FMAs, then what does an expression like `a*b + c*d` actually compute? For `Float64` values, to be specific. Right now it's completely well-defined. If we introduce automatic FMA optimizations, then it might compute `fma(a, b, c*d)` or `fma(c, d, a*b)`. Moreover, it affects the expression evaluation model in a pretty awful way: normally we know that `a*b + c*d` computes `a*b` and `c*d` first and then applies `+` to that; with automatic FMA there's no longer any certainty that `a*b` or `c*d` will be evaluated by themselves as expressions. Any expression that produces a `*` might get merged into a surrounding expression that does a `+` with that result.
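To make the ambiguity concrete, here is a small worked example (the values are chosen here for illustration, so that `a*b` is not exactly representable), showing that the two possible contractions of `a*b + c*d` do not even agree with each other:

```julia
a = b = 1 + 2.0^-27     # exact a*b = 1 + 2^-26 + 2^-54, which Float64 rounds to 1 + 2^-26
c, d = -1.0, 1.0

x = a*b + c*d           # == 2.0^-26            (both products rounded, then added)
y = fma(a, b, c*d)      # == 2.0^-26 + 2.0^-54  (a*b kept exact inside the fma)
z = fma(c, d, a*b)      # == 2.0^-26            (a*b rounded before entering the fma)

x == z != y             # true: the two "legal" contractions give different answers
```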
`fma` is great in 95% of situations. But it does break some important invariants, such as `a*b - a*b == 0`. This sort of expression arises in `x*x'` where `x::ComplexF64`, for example (one can argue for `abs2` in the scalar case, but not if we instead have `x::Matrix{ComplexF64}`). People would open issues for these, so we'd better be happy with our choice and justification.
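A tiny illustration (again with values picked here so that the product is inexact) of how contraction breaks that invariant: a contracted `a*b - a*b` returns the rounding error of the product instead of zero.

```julia
a = b = 1 + 2.0^-27      # a*b is not exactly representable in Float64

a*b - a*b                # == 0.0 under today's evaluation rules
fma(a, b, -(a*b))        # == 2.0^-54: what a contracted a*b - a*b could compute
```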
We already have `muladd` and `@fastmath` for enabling these transformations. I'd rather they remain opt-in rather than opt-out.
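For reference, the existing opt-in spellings, all in Base:

```julia
f(a, b, c) = muladd(a, b, c)      # may fuse to one fma where the hardware supports it
g(a, b, c) = fma(a, b, c)         # always a fused multiply-add, even if emulated in software
h(a, b, c) = @fastmath a*b + c    # grants the compiler permission to contract (and more)
```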
> FastFloats.jl? [..] what kind of monster doesn't want fast floats?
One way to maybe solve this is to do the opposite: excise the current Float types into a package, SlowFloats.jl (or is CompatibilityFloats.jl a better name?), as e.g. `SFloat64`, which would need to respect the old rules. We would default all types, and all literals e.g. `1.0f0`, to the new type while keeping the old names (or maybe let literals be the old types, to "poison" calculations?).
I was thinking only the old/current types would be IEEE, but actually both old and new are. A recent update of the standard requires FMA capability, I believe, but does it forbid its use by default? There could be a global option for the new types if that turns out to be really needed, but what's the best way to opt out: with types, and/or a global flag, and/or a macro?
I think posits default to FMA use, rather than just allowing it (and they demand it be supported): https://posithub.org/docs/posit_standard-2.pdf

> A fused expression is an expression with two or more operations that is evaluated exactly before rounding to a posit value. [..] Fused expressions (such as fused multiply-add and fused multiply-subtract) need not be performed using quire representations to be posit compliant.
Please don't make this the default for the base `Float` types. I think it will make it harder to predict what the code is going to do, and possibly more volatile depending on how / what the compiler decides to optimize. If a user wants `fma`, it's well within reach. If discoverability is the main reason for this issue, then I think a better solution is to add it as a performance tip.
In general, Julia is very conservative with floating point optimization. It doesn't allow replacing `x/y` with `x*(1/y)` or re-ordering operations like `x+y+z` into `y+z+x`. I think this is good, and it makes it much easier to write accurate code, as these re-orderings often lose precision. Replacing `a*b+c` with `muladd(a,b,c)`, however, is almost always a performance win, and is faster except in the case where a user is writing their own compensated arithmetic and depends on the specific rounding errors.

As such, I think it would be beneficial for Julia to allow LLVM to replace `a*b+c` with `muladd(a,b,c)` unless the user opts out with a `@no_fma` macro. This will result in a performance and accuracy benefit for most users (who don't do lots of numerical analysis), and provide an easy escape hatch to people writing compensated arithmetic libraries (one that will also make such code clearer, since there will be an annotation whenever tricky rounding math is being done).

The two biggest downsides of this proposal are that it's breaking, and that it would reduce consistency between machines with and without FMA hardware. The first downside is why I think this needs to wait until Julia 2.0. I don't think the second problem is that serious, since most computers have FMA hardware, and I don't think it's worth giving lower accuracy and speed to 90% of computers to keep consistency with the ones that can't easily provide it.
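As a concrete illustration of the kind of code the escape hatch is for, here is a sketch of Dekker's error-free two-product (a standard textbook algorithm, not code from this issue). Its correctness argument assumes every `*`, `+`, and `-` is rounded exactly as written, so automatic contraction could silently invalidate it; the proposed `@no_fma` annotation (which does not exist today) would mark such blocks as off-limits.

```julia
# Veltkamp splitting: hi + lo == a exactly, with hi holding the top ~26 bits.
function veltkamp_split(a::Float64)
    t  = 134217729.0 * a        # 2^27 + 1
    hi = t - (t - a)
    lo = a - hi
    return hi, lo
end

# Dekker's two-product: returns p = fl(a*b) and e with p + e == a*b exactly,
# *provided* none of the multiplies/adds below get contracted or reassociated
# (and barring overflow in the splitting).
function two_prod_dekker(a::Float64, b::Float64)
    p = a * b
    ahi, alo = veltkamp_split(a)
    bhi, blo = veltkamp_split(b)
    e = ((ahi*bhi - p) + ahi*blo + alo*bhi) + alo*blo
    return p, e
end
```

With a hardware fma one would just write `e = fma(a, b, -p)`, which is exactly why blanket contraction is tempting; the danger is only for code like the above that encodes its rounding-error analysis in the order of operations.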