Open mppf opened 6 years ago
On my system, C backend+GCC without --no-ieee-float -> 72s; with --no-ieee-float -> 15s. The LLVM version is about 17s.
The customized version of cabs from PR #7070 does not appear to be necessary anymore, at least not with LLVM 6 or 8.
I think this might be related: https://medium.com/@smcallis_71148/complex-arithmetic-is-complicated-873ec0c69fc5
With -ffast-math, complex multiplies and abs calls with GCC can use the naive algorithm, but with clang, the complex multiply includes conditionals.
This makes me wonder: For cases where we've supplied our own real * complex or imag * complex overloads (or other similar operations), as @damianmoz was asking about over on issue #11941, are we being overly cavalier and should we be doing more work to handle NaN / infinity cases more properly? (similar to the "full of conditionals" gcc code in the article Michael referenced).
@bradcray - I don't think that there's any challenge to real * complex or imag * complex in this setting, because the nan-ness / inf-ness will propagate. In fact the C standard defines these things (in the informative Annex G) and doesn't suggest similar conditionals to the complex * complex case.
I was hoping that might be the case, but was worried that maybe I was being naive. E.g., I wasn't certain whether, if the real op complex.real portion became NaN (say), that should semantically also be expected to affect the real op complex.imag component if it did not become NaN (say).
Or maybe put another way: In the same way that the article's author said s/he'd just used the FOIL method from school, that's the way I approached any operations on complexes that I implemented in module code, so was curious whether I'd introduced other naive assumptions.
I also want to emphasize the "or other similar operations" in my question in case despite multiplication being easy some other case is not (like division or exponentiation, say, though with a quick glance it looks like we don't support mixed complex / float exponentiation overloads...).
Note, I just tried -fcx-limited-range with GCC on mandelbrot-complex and it's much slower than with -ffast-math. I think it'd be worth understanding what about -ffast-math is providing the performance here.
@mppf, at the risk of oversimplifying things, I think -fcx-limited-range does appropriate (whatever that means) scaling of the numerator by (I would guess) something like an exact power of 2 that keeps the denominator in a nice range when summing the squares of the real and imaginary parts of the denominator. The conditionals to extract the scale factor, and the scaling itself, will add significant overhead. I do not use that option because I would hope that I pay careful enough attention to when I need a complex division that I handle this scenario myself. And if I am (regularly?) stupid enough to forget, it is my own fault. But then you need such a capability/option to protect people from themselves and stay strict. But if you are using C++ complex numbers, isn't that something that gets handled by the back end, unless I am mistaken about how the Chapel compiler works (which is always possible)?
@damianmoz - the benchmark is doing complex multiplication, not division. We're using C99 complex numbers which are different from C++ complex numbers (but they definitely have some similarities).
@damianmoz wrote: "But if you are using C++ complex numbers, isn't that something that gets handled by the back end, unless I am mistaken about how the Chapel compiler works (which is always possible)?"
Even if GCC or LLVM optimizations handle it, I need to understand what is key to the performance here, since it's not always working optimally. In particular it seems to require --no-ieee-float / -ffast-math, which is a bit unfortunate, and we still have the problem that --llvm is slower than with the C backend (and we don't know why).
I meant C99 complex numbers. Sorry. It's Friday here and my brain must be in weekend mode.
When I looked at the mandelbrot-complex benchmark in the jacobnelson directory, I could not see any division either. So I figured I was not answering your question.
It is not a simple problem obviously. What if you recode this in C99? Do you see similar issues or am I totally missing the point?
The LLVM backend is ~40% slower for the mandelbrot-complex benchmark:
https://chapel-lang.org/perf/chapcs/llvm/?startdate=2018/07/12&graphs=mandelbrotall,mandelbrotvariations
It takes about 25 s with the LLVM backend vs 18 s with the C backend.
This spike is to investigate the reason for this performance difference.
PR #7070 might be related.