Closed ischoegl closed 2 years ago
@ischoegl and @tpg2114, thanks for digging into this! Does `-ffast-math` actually improve performance for the standard workloads that we run? If it does, then this is probably worth digging into. Otherwise, it might be best to leave the code as-is.
> Does `-ffast-math` actually improve performance for standard workloads that we run?
Based on my understanding, yes. But as IEEE 754 is not enforced (which is where the performance gains come from), this may make for some 'fun' refactoring. I mainly created this issue to make sure that users are aware of the limitation. It's somewhat puzzling that `icx` enables `-fp-model fast` by default. FWIW, here's a blog post that I found informative and a SO answer, but I didn't dig very deep.
> Based on my understanding, yes.
I'm not asking you to do this, but if anyone decides to pursue this issue, it'd be nice to have concrete benchmarks showing that adding these flags improves real-world performance enough to justify the (presumably) increased code complexity needed to handle this case.
> I'm not asking you to do this …
no worries. I’ll pass on this one 😜
I investigated this a bit, using Clang++ on Ubuntu 20.04. The only flag that's part of `-ffast-math` that is a significant problem is `-ffinite-math-only`. With the rest of the flags enabled by specifying `optimize_flags="-O3 -ffast-math -fno-finite-math-only"`, I get 4 test failures:
The first two, at least, are just an issue of `assertEquals` being used where `assertNear` would be a better choice, and I think the latter two can be resolved without too much difficulty.
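To illustrate why exact comparisons become fragile once the compiler is allowed to reorder floating-point operations (as `-ffast-math` permits), here is a minimal Python sketch. The helper `assert_near` is a hypothetical stand-in written for this example, not Cantera's actual `assertNear`, and the tolerances are arbitrary choices.

```python
import math


def assert_near(a, b, rtol=1e-8, atol=1e-12):
    """Hypothetical stand-in for a tolerance-based test assertion."""
    if not math.isclose(a, b, rel_tol=rtol, abs_tol=atol):
        raise AssertionError(f"{a!r} and {b!r} differ by more than the tolerance")


# Mathematically identical sums, evaluated in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Exact equality fails because the rounding differs between the two orders.
exactly_equal = left_to_right == right_to_left

# A tolerance-based comparison accepts both orderings.
assert_near(left_to_right, right_to_left)
```

A test that compares against a stored reference value with exact equality can therefore pass or fail depending only on how the compiler chose to order the arithmetic.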
Enabling `-ffinite-math-only` requires eliminating any check that relies on `isnan` working, or even on the comparison `nan != <some number>` returning true. I made some really crude changes to do this in this commit on my fork: https://github.com/speth/cantera/commit/5abdaa8e673827453e5a5d7de6ce4b26f202481f, and did some benchmarking of an ignition delay problem using a couple of different mechanisms. For these tests, I used Eigen for the linear algebra and the vendored copy of Sundials, so this is about the maximum impact that these flags can have, since there is very little outside code. I tested both GRI 3.0 and a larger mechanism with ~400 species. What I found was:
- Compared to `-O3`, using `-O3 -ffast-math -fno-finite-math-only` is about 2% faster in terms of time steps per second.
- `-O3 -ffast-math` is about 4% faster in terms of time steps per second.

Given the unsatisfactory nature of the changes required to support using `-ffinite-math-only` and the relatively small performance gains, my recommendation is that we add a configuration-time check for whether `isnan` works correctly and, if not, abort compilation with an error message stating that Cantera doesn't work with this flag.
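As a rough sketch of what such a configuration-time check could look like, the Python function below compiles and runs a tiny C++ probe with the requested flags. Everything here is an assumption made for illustration: the names `PROBE_SOURCE` and `isnan_works` are hypothetical, and this is not how Cantera's build system implements the check.

```python
import os
import subprocess
import tempfile

# C++ probe: exits with status 0 only if std::isnan recognizes a NaN.
PROBE_SOURCE = r"""
#include <cmath>
#include <limits>
int main() {
    volatile double x = std::numeric_limits<double>::quiet_NaN();
    return std::isnan(x) ? 0 : 1;
}
"""


def isnan_works(compiler, flags):
    """Return True if std::isnan detects a NaN when built with these flags.

    Raises RuntimeError if the probe itself fails to compile.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.cpp")
        exe = os.path.join(tmp, "probe")
        with open(src, "w") as f:
            f.write(PROBE_SOURCE)
        build = subprocess.run([compiler, *flags, src, "-o", exe],
                               capture_output=True, text=True)
        if build.returncode != 0:
            raise RuntimeError(f"probe failed to compile:\n{build.stderr}")
        # The probe's exit status carries the result of the isnan check.
        return subprocess.run([exe]).returncode == 0
```

A build script could call something like `isnan_works("g++", ["-O3", "-ffast-math"])` during configuration and abort with an error if it returns `False`. Whether a given compiler actually optimizes the check away under `-ffinite-math-only` depends on the compiler and its version, so a real check may need more than one probe.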
Thanks for looking into this further, @speth! The two non-Python tests look familiar, and I agree that `assertEquals` is easy to fix (should be done regardless).
As an aside, one thing I noticed in my own tests was a plethora of warnings coming out of `fmt` via `AnyMap`. In addition, there were also several 'legacy' `fmt` use cases that caused annoying warnings. Fixing what causes these `fmt` warnings may be the largest issue at hand, as I don't really like the nuclear option that disables these warnings.
Overall, I agree that it probably doesn't make sense to make this a priority. At the same time, it probably also makes sense to avoid using `NaN` as a sentinel value (which I have recently used), and to avoid exact / bit-wise equality checks in new code. It should be relatively simple to replace the `NaN` checks in the new reaction rate evaluators, which probably should happen prior to 2.6. In other words, what I'm arguing for is not to make it a priority to fix, but also not to make it harder to fix going forward.
I don't see any warnings related to `fmt` with either GCC or Clang, so I guess that's specific to the Intel compiler.
I would argue that most of the ways that we use NaN in Cantera are very reasonable, and that the alternatives are often worse. For instance, in the modification I made to the `CachedValue` class to run with `-ffinite-math-only`, the initial cache check value now has to be some arbitrary but finite number. And while it's unlikely that a cached value would be checked against this initial value and return an erroneous result, it's not impossible.
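The trade-off described here can be sketched in a few lines of Python. The class `SentinelCache` is hypothetical and only loosely inspired by the idea of a cached value keyed on a state variable. It is not Cantera's `CachedValue`, and the finite sentinel `-1.0` is an arbitrary choice for the example.

```python
import math


class SentinelCache:
    """Hypothetical single-value cache keyed on one state variable."""

    def __init__(self, sentinel):
        self.state = sentinel  # state for which `value` was last computed
        self.value = None

    def needs_update(self, state):
        # True when the stored state differs from the requested one.
        return self.state != state


# With a NaN sentinel, the first check always reports a miss, because
# NaN compares unequal to every number under IEEE 754.
nan_cache = SentinelCache(math.nan)
first_check_nan = nan_cache.needs_update(300.0)

# With a finite sentinel, a state equal to the sentinel looks like a hit
# even though nothing has been computed yet.
finite_cache = SentinelCache(-1.0)
first_check_finite = finite_cache.needs_update(-1.0)
```

Python always follows IEEE 754 semantics for these comparisons. The point of the thread is that a C++ compiler given `-ffinite-math-only` is allowed to assume NaN never occurs, so the NaN-sentinel variant is no longer guaranteed to behave this way.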
> I don't see any warnings related to `fmt` with either GCC or Clang, so I guess that's specific to the Intel compiler.
That sounds plausible.
> I would argue that most of the ways that we use NaN in Cantera are very reasonable, and that the alternatives are often worse.
I tend to agree: using `NaN` as a sentinel is efficient from a coding perspective. But apparently we're forcing the code to check for this possibility, which is less ideal from a computational perspective (although the penalty appears to be small). I also don't think that we need to avoid NaN as an output value; it's just that there probably need to be internal booleans that replace the checks. Overall, I don't think that there's any urgency to 'fix' this issue ... although it probably needs to remain open.
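One way to read the "internal booleans" suggestion is sketched below in Python. The class `FlaggedCache` and its methods are hypothetical names invented for this example, not an existing Cantera interface.

```python
class FlaggedCache:
    """Hypothetical cache that tracks validity with a boolean, not NaN."""

    def __init__(self):
        self.valid = False  # nothing has been computed yet
        self.state = 0.0
        self.value = None

    def needs_update(self, state):
        # An empty cache always needs an update, whatever `state` is.
        return not self.valid or self.state != state

    def store(self, state, value):
        self.valid = True
        self.state = state
        self.value = value


cache = FlaggedCache()
miss_before_store = cache.needs_update(0.0)  # cache is still empty
cache.store(0.0, 42.0)
hit_after_store = cache.needs_update(0.0)    # same state is cached
miss_new_state = cache.needs_update(1.0)     # state changed
```

This variant never depends on NaN comparisons, at the cost of one extra flag per cache and one extra branch per check.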
@speth @bryanwweber @ischoegl Although the issue has been closed, I am reporting some additional data: I ran a small test program benchmarking the computation of reaction rates with 16 different compilers/versions, each with and without `fast-math`. Quick summary: using `fast-math`, g++ becomes about 15 % faster and the Intel compilers less than 5 % faster. The relative accuracy in my test is within 10^-10 %. However, even without `fast-math`, final results for reaction rates between g++/clang++ and icpc/icpx are slightly different, which I cannot explain at the moment. For all details, see here:
https://github.com/g3bk47/CanteraCompilerPerformance
Let me know if I should benchmark any other code snippets with this setup.
@g3bk47 … thanks for those results! Your comparison suite is impressive. From my perspective, a performance gain of up to 15% would be worth pursuing. While segfaults and erroneous results are fixed here (i.e. those issues are addressed), I believe this warrants an enhancement request?
Problem description

As reported in #1150, compiling with the default optimization options of the Intel compiler suite (`icx`/`icpx`) produces incorrect results. The behavior can be reproduced for `gcc` with the `-ffinite-math-only` option (which is one of the `-ffast-math` flags), e.g. for a vanilla `gcc` toolchain on Ubuntu 20.04:

The option breaks strict IEEE compliance, as it does "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs." There are some instances where Cantera uses `NaN` internally, which presumably breaks 'fast math' optimizations.

Steps to reproduce

`gcc` with fast math (same as for `icx` in #1150) ...

Behavior

System information

`gcc`/`icx`

Additional context

`{fmt}` output, which currently produces numerous warnings related to `NaN` compliance.
`NaN` can be implemented, see SO