JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Automatic FMA for Julia 2.0 #40139

Open oscardssmith opened 3 years ago

oscardssmith commented 3 years ago

In general, Julia is very conservative with floating-point optimization. It doesn't allow replacing x/y with x*(1/y) or re-ordering operations like x+y+z into y+z+x. I think this is good: it makes it much easier to write accurate code, since these re-orderings will often lose precision. Replacing a*b+c with muladd(a,b,c), however, is almost always a performance win, and it is safe except in the case where a user is writing their own compensated arithmetic and depends on the specific rounding errors.
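For illustration, a small example of the difference contraction makes (the values are just chosen so that the rounding of a*b is visible; the muladd result depends on whether the hardware has FMA):

```julia
a, b, c = 1.0 + 2.0^-52, 1.0 - 2.0^-52, -1.0   # a*b == 1 - 2^-104 exactly

a * b + c          # a*b rounds to 1.0, so this is exactly 0.0
muladd(a, b, c)    # may or may not contract, depending on the hardware
fma(a, b, c)       # always a single rounding: -2^-104 ≈ -4.93e-32
```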

As such, I think it would be beneficial for Julia to allow LLVM to replace a*b+c with muladd(a,b,c) unless the user opts out with a @no_fma macro. This would give a performance and accuracy benefit to most users (who don't do lots of numerical analysis) and provide an easy escape hatch for people writing compensated arithmetic libraries (which would also make that code clearer, since there would be an annotation wherever tricky rounding math is being done).

The two biggest downsides of this proposal are that it's breaking, and that it would reduce consistency between machines with and without FMA hardware. The first downside is why I think this needs to wait until Julia 2.0. I don't think the second problem is that serious: most computers have FMA hardware, and it's not worth giving lower accuracy and speed to the 90% of machines that have it just to keep consistency with the ones that can't easily provide it.

chriselrod commented 3 years ago

provide an easy escape hatch to people writing compensated arithmetic libraries

And for people being bitten by: https://github.com/vchuravy/GPUifyLoops.jl/issues/91

DilumAluthge commented 3 years ago

provide an easy escape hatch to people writing compensated arithmetic libraries

And for people being bitten by: vchuravy/GPUifyLoops.jl#91

And https://github.com/JuliaGPU/KernelAbstractions.jl/issues/20, which is the successor issue.

vchuravy commented 3 years ago

(I feel seen)

But yes, much more precise reasoning about fast-math optimizations would be welcome. As those issues show, something as seemingly harmless as fp-contract can have counter-intuitive results.

stevengj commented 3 years ago

Is there any compiler for any language that does FMA contraction by default, without an explicit flag of some sort?

vchuravy commented 3 years ago

IIRC nvcc does this for CUDA device code, which is why KernelAbstractions mimics that.


chriselrod commented 3 years ago

Is there any compiler for any language that does FMA contraction by default, without an explicit flag of some sort?

GCC does this with -O2.

Keno commented 3 years ago

I thought I'd commented here, but I guess I never hit the button. IMO, we should have separate float types with various optimization guarantees. That could be done in a package (with maybe some non-breaking changes in base to expose the flags in the intrinsic) and then we can decide in 2.0 if we want to change the default float type or not.
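For illustration, a minimal sketch of what such a package type might look like (the name FastFloat64 is hypothetical, and @fastmath is used only as a stand-in for the flags available today; it enables more than just contraction, which is where the non-breaking changes in Base would come in):

```julia
# Hypothetical sketch, not an existing package: a wrapper float whose
# arithmetic the compiler is allowed to contract.
struct FastFloat64 <: AbstractFloat
    x::Float64
end

# @fastmath currently turns on all fast-math flags; a real design would want
# Base to expose just the `contract` flag for these definitions.
Base.:+(a::FastFloat64, b::FastFloat64) = FastFloat64(@fastmath a.x + b.x)
Base.:-(a::FastFloat64, b::FastFloat64) = FastFloat64(@fastmath a.x - b.x)
Base.:*(a::FastFloat64, b::FastFloat64) = FastFloat64(@fastmath a.x * b.x)
Base.Float64(a::FastFloat64) = a.x
```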

oscardssmith commented 3 years ago

That's a really interesting way of solving this. The one thing I'd be worried about is that the biggest beneficiaries of this idea are probably people who don't even know what floating point is (or who don't deeply understand numerical analysis). I'm not sure that a package requiring users to opt in to a special numeric type will show the benefits of a system designed to reduce the amount of manual annotation.

Keno commented 3 years ago

The package is just a way to prototype and see if we like the tradeoffs. We can decide in 2.0 what the behavior of floats should be.

StefanKarpinski commented 3 years ago

FastFloats.jl? We'll really have painted ourselves into a corner — what kind of monster doesn't want fast floats?

PallHaraldsson commented 3 years ago

While maybe doing FastFloats.jl first as a prototype, is this a case of "technically breaking" where we could just do it pre-2.0? We could always do SlowFloats.jl ("what kind of monster doesn't want" that?). Don't basically all supported platforms support FMA, so that this would only be a documented change across versions, and people could stay on the non-FMA LTS version?

StefanKarpinski commented 3 years ago

The issue for me isn't worrying about platforms that don't have FMA, it's that there isn't a single canonical way to apply FMA to operations. Currently Julia just does exactly what you ask it to do, unless you use a macro to tell it "actually, do something else besides what I asked you to do". If we start doing automatic FMAs, then what does an expression like a*b + c*d actually compute, for Float64 values to be specific? Right now it's completely well-defined. If we introduce automatic FMA optimizations, then it might compute fma(a, b, c*d) or fma(c, d, a*b). Moreover, it affects the expression evaluation model in a pretty awful way: normally we know that a*b + c*d computes a*b and c*d first and then applies + to those results; with automatic FMA there's no longer any certainty that a*b or c*d will be evaluated by themselves as expressions. Any expression that produces a * might get merged into a surrounding expression that does a + with that result.
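For illustration, the two possible contractions really can disagree (the values below are just chosen so that a*b rounds away its low bits):

```julia
a, b = 1.0 + 2.0^-30, 1.0 - 2.0^-30   # a*b == 1 - 2^-60 exactly
c, d = -1.0, 1.0                       # c*d == -1.0 exactly

a*b + c*d          # today: a*b rounds to 1.0, so the result is 0.0
fma(a, b, c*d)     # one possible contraction: -2^-60 ≈ -8.67e-19
fma(c, d, a*b)     # the other possible contraction: 0.0
```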

mikmoore commented 1 year ago

fma is great in 95% of situations. But it does break some important invariants, such as a*b - a*b == 0. This sort of expression arises in x*x' where x::ComplexF64, for example (one can argue for abs2 in the scalar case, but not if we instead have x::Matrix{ComplexF64}). People would open issues for these, so we'd better be happy with our choice and justification.
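For illustration, the a*b - a*b pattern as it shows up in the imaginary part of x*conj(x) (the values are just chosen so that the product has a nonzero rounding error):

```julia
x_re, x_im = 0.1, 0.3     # x = 0.1 + 0.3im; x_re*x_im is not exact in Float64

# Imaginary part of x*conj(x), as complex multiplication computes it:
x_im*x_re - x_re*x_im              # today: exactly 0.0
fma(x_im, x_re, -(x_re*x_im))      # contracted: the tiny rounding error of x_re*x_im
```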

We already have muladd and @fastmath for enabling these transformations. I'd rather they remain opt-in rather than opt-out.
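For reference, a sketch of the existing opt-in spellings:

```julia
a, b, c = rand(3)

muladd(a, b, c)      # compiler may fuse the multiply and add if profitable
fma(a, b, c)         # guaranteed single rounding (software fallback without FMA hardware)
@fastmath a*b + c    # opts this expression into contraction (and other rewrites)
```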

PallHaraldsson commented 1 year ago

FastFloats.jl? [..] what kind of monster doesn't want fast floats?

One way to maybe solve this is to do the opposite: excise the current float types into a package SlowFloats.jl (or is CompatibilityFloats.jl a better name?), as e.g. SFloat64, which would need to respect the old rules. We would default all types, and all literals such as 1.0f0, to the new type while keeping the old names (or maybe let literals be the old types, to "poison" calculations?).

I was thinking only the old/current types would be IEEE-conformant, but actually both the old and new ones would be? I believe a recent revision of the standard requires FMA capability, but does it forbid its use by default? There could be a global option for the new types if that really becomes needed, but what's the best way to opt out: with types, a global flag, and/or a macro?

I think posits default to FMA use, rather than just allowing it (and they require it to be supported): https://posithub.org/docs/posit_standard-2.pdf

A fused expression is an expression with two or more operations that is evaluated exactly before rounding to a posit value. [..] Fused expressions (such as fused multiply-add and fused multiply-subtract) need not be performed using quire representations to be posit compliant.

adienes commented 1 year ago

Please don't make this the default for base Float types. I think it will make it harder to predict what the code is going to do, and possibly more volatile depending on how and what the compiler decides to optimize. If a user wants fma, it's well within reach. If discoverability is the main reason for this issue, then I think a better solution is to add it as a performance tip.