JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

exp, @fastmath, SVML vectorization. #21454

Open DrTodd13 opened 7 years ago

DrTodd13 commented 7 years ago

In Julia 0.6, I noticed that exp is no longer a call into libm but has been implemented in Julia itself. I wonder whether this decision has performance implications not far down the road. Through SVML, LLVM is able to provide vectorization for exp, if exp is invoked through a SVML intrinsic or a call into libm. It won't vectorize if what it sees is the LLVM IR expanded from Julia's own exp implementation. We can use fastmath to revert to a call to libm, but this raises the question of the semantics of fastmath. It seems like the semantics of fastmath should be a loss of accuracy in exchange for performance, and my understanding is that this is indeed what Julia's @fastmath provides, in that Julia will apply the LLVM fast-math flags. I also believe that currently Julia fastmath exp is not consistent in that it does not signal a lower accuracy version, so should we expect Julia's exp to have the same accuracy and performance as the fastmath/libm exp?

I have been told that SVML provides three points in the accuracy/performance tradeoff space. We can debate the details, but it seems like fastmath should map to one of the two lower-accuracy (higher-performance) levels. The question then becomes: how do you get vectorization at the highest accuracy level with SVML? Implementing exp in Julia seems to preclude this unless more code is added to detect potential SVML vectorization and, in that case, revert to a libm call. Why not just always call libm, then? In what circumstances is Julia's exp superior?
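To make the three code paths under discussion concrete, here is a minimal sketch (names and semantics as of the Julia 0.6 era; the exact accuracy of each path is implementation-dependent, and the direct libm call assumes the symbol is visible in the running process, which it is for a standard Julia build):

```julia
# Three ways to compute exp(x), corresponding to the paths discussed above.
x = 1.0

y_julia = exp(x)                              # pure-Julia implementation in Base
y_fast  = @fastmath exp(x)                    # fast-math variant (may skip error checks)
y_libm  = ccall(:exp, Float64, (Float64,), x) # direct call into the process's libm

# At ordinary arguments all three should agree to within a few ulps.
@assert isapprox(y_julia, y_libm; rtol = 1e-12)
@assert isapprox(y_fast,  y_libm; rtol = 1e-6)
```

Only the `ccall` form produces a call instruction that LLVM's vectorizer could match against an SVML entry point; the pure-Julia version is inlined as ordinary IR.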

yuyichao commented 7 years ago

Through SVML, LLVM is able to provide vectorization for exp, if exp is invoked through a SVML intrinsic or a call into libm.

Have you actually seen this happen? I don't think we have ever lowered it in any way that llvm can recognize.

Keno commented 7 years ago

Yes, we don't lower exp in a way that LLVM can recognize at the moment. However, it should be fairly simple to add a generic hook to fix that. As you noted, manually calling exp or the LLVM intrinsic will work for testing purposes. The Julia-native implementation is generally faster than libm. In any case, there's no rush on this since we can't drop in SVML at the moment anyway.
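For the "manually calling the LLVM intrinsic" testing route, a hedged sketch using Base.llvmcall (the two-tuple declarations/body form; exact behavior may vary across Julia versions) could look like:

```julia
# Call LLVM's exp intrinsic directly, bypassing Julia's own implementation.
# This emits @llvm.exp.f64 in the IR, which a vectorizer with SVML mappings
# could recognize and replace with a vector SVML call.
llvm_exp(x::Float64) = Base.llvmcall(
    ("declare double @llvm.exp.f64(double)",
     """
     %r = call double @llvm.exp.f64(double %0)
     ret double %r
     """),
    Float64, Tuple{Float64}, x)

@assert llvm_exp(0.0) == 1.0
```

This is only useful for experiments like the one above; it deliberately sidesteps the Base implementation.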

JeffBezanson commented 7 years ago

I also believe that currently Julia fastmath exp is not consistent in that it does not signal a lower accuracy version

It seems intuitive to me that fastmath would allow calling a lower-accuracy version but not require it. In this case I believe the intent of the fastmath version was to skip error checks.

I don't think we have ever lowered it in any way that llvm can recognize.

This doesn't really matter --- we could implement exp in such a way that llvm could recognize it, but that would mean skipping the julia implementation, so the point still stands.

yuyichao commented 7 years ago

We can use fastmath to revert to a call to libm

This is most likely an oversight that should be fixed. In fact, the fast math version seems slower.

This doesn't really matter

My point is that there should be no regression caused by this change in 0.6. Being able to vectorize it would obviously be even better.

that would mean skipping the julia implementation

Hopefully not, since that would mean a failure to vectorize creates slower code....

Keno commented 7 years ago

There's no problem with just telling LLVM that our exp function is the same as what it considers exp to be. Just one extra hook in TargetLibraryInfo. Even better, we could come up with a generic way of annotating "this function is a vectorized version of this other function".

yuyichao commented 7 years ago

That'll be cool. How hard would it be to tell LLVM that a Julia function can be vectorized (either because there's no complex control flow in it, or because we defined a version that operates on NTuple{...,VecElement{...}} directly)?
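The NTuple{...,VecElement{...}} idea can be sketched as follows (the function name `vexp` and the lane width of 4 are made up for illustration; the scalar kernel is a placeholder for what would, in a real binding, be a call into a vector routine such as SVML's):

```julia
# A function operating on a whole SIMD lane group at once. A homogeneous
# NTuple of VecElement maps to an LLVM vector type (<4 x double> here),
# so this is the shape a vector-ABI entry point would have.
const Vec4 = NTuple{4, VecElement{Float64}}

function vexp(v::Vec4)::Vec4
    # Placeholder: apply the scalar kernel lane-wise.
    ntuple(i -> VecElement(exp(v[i].value)), 4)
end

v = (VecElement(0.0), VecElement(1.0), VecElement(2.0), VecElement(3.0))
r = vexp(v)
@assert r[1].value == 1.0
```

Telling LLVM's vectorizer that `vexp` is the 4-wide variant of `exp` is exactly the kind of mapping TargetLibraryInfo's vector-function tables encode for SVML today.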

Keno commented 7 years ago

The hooks are already there in TargetLibraryInfo as I said, but may require some hacking to have it do anything other than what is hardcoded right now. For functions without complex control flow, LLVM should be able to figure out by itself that the function can be vectorized, so we should just fix that in LLVM.

anton-malakhov commented 7 years ago

We are working on an experimental patch to LLVM 4.0 which enables vectorization for all the SVML functions. Here is the list of enabled functions:

sin cos pow exp log acos acosh asin asinh atan2 atan atanh cbrt cdfnorm cdfnorminv ceil cosd cosh erf erfc erfcinv erfinv exp10 exp2 expm1 floor fmod hypot invsqrt log10 log1p log2 logb nearbyint rint round sind sinh sqrt tan tanh trunc

Moreover, Intel will soon provide you a license to redistribute SVML the same way as you redistribute MKL in your binary Julia distribution. It would be very cool if we could enable vectorization of these functions not only in fastmath mode; SVML provides HA (high accuracy) functions as well.

StefanKarpinski commented 7 years ago

We can't actually legally distribute Julia with MKL unless Julia is built without any GPL libraries, which is not a standard build setup, so we won't be able to ship with SVML either. If we get rid of all the GPL libraries from Base Julia (which is a long term goal) then we'll be able to ship with MKL and SVML.

Keno commented 7 years ago

Of course if Intel wanted to open source SVML under a GPL-compatible license that'd be great (and we could start using it immediately).

StefanKarpinski commented 7 years ago

Ditto with MKL 😀

anton-malakhov commented 7 years ago

While we are considering open-sourcing SVML (though it might still be limited and will take time to release), it's quite unlikely to happen for MKL.

simonbyrne commented 7 years ago

I think this is a duplicate of #15265.

anton-malakhov commented 7 years ago

@StefanKarpinski @Keno, Viral assured us that "We expect that JuliaPro will start shipping with mkl by juliacon." Thus our question about integrating SVML into the MKL build of the JuliaPro distro is still valid and urgent.

Keno commented 7 years ago

As I said SVML is not currently integrable into Julia for technical reasons. Intel NDA prevents me from giving details in this forum. Feel free to email me.

RoyiAvital commented 6 years ago

Is there an update to having SVML under Julia?

It seems to be holding Julia back (at least part of the reason) in the following test:

https://www.modelsandrisk.org/appendix/speed/

Though Python + Numba is still faster even when Julia uses @inbounds and Apple's libm (see https://julialang.slack.com/archives/C67910KEH/p1531490464000597?thread_ts=1531475750.000264&cid=C67910KEH).

StefanKarpinski commented 6 years ago

No.

simonbyrne commented 6 years ago

In the long run, it would be neat to have something like ISPC (https://ispc.github.io/) in Julia itself.

RoyiAvital commented 6 years ago

@simonbyrne, using SVML and an ISPC-like approach are complementary to each other, aren't they? Not that I'm an expert on that, but I would assume integrating SVML is easier, especially when Intel offers assistance.

Keno commented 6 years ago

They are complementary, but properly integrating SVML requires Julia at the frontend level to be aware of vector lanes, which we currently don't have, but which would be a prerequisite for exposing a general SPMD programming model.

anton-malakhov commented 6 years ago

@Keno, Numba is not aware of vector lanes either; still, LLVM's ability to recognize libm calls and transform them into SVML calls allows it to enjoy nice speedups on transcendental functions. I know that Julia is moving away from libm calls, but could you consider a mixed or opt-in approach to enable them back? E.g. we could start with some @fastmath(SVML)-like macro which would re-enable emitting the good old libm calls and switch LLVM into SVML mode.

Keno commented 6 years ago

Sure, that's why I said "properly integrating". A plethora of other hacks are and have always been possible. E.g. we used SVML for Celeste just fine.