Updated primordial chemistry actual_rhs.H to replace std::pow with a combo of std::exp and std::log, added free-free cooling

AMReX-Astro / Microphysics

common astrophysical microphysics routines with interfaces for the different AMReX codes

https://amrex-astro.github.io/Microphysics

Pull request #1605, closed by psharda 2 months ago.

psharda commented 2 months ago

I replaced most calls to std::pow with a combination of std::exp and std::log to speed up the chemistry. I also added free-free cooling, which was missing from the previous version. In the future, I will follow #1586 and #1591 to use fast_exp and fast_log.

The unit test passes, with small differences in the number densities of H and H2 relative to the reference solution (3.5% and 0.3%, respectively). The difference in H is most likely due to the added free-free cooling.

psharda commented 2 months ago

The numbers below are for a 3D Pop III test simulation on moth, the AMD GPU system at ANU. The test ran for 1000 steps, starting with no refinement and ending with 1 AMR level on top of the base level.

Test A: current version of actual_rhs.H
Test B: current version + free-free cooling
Test C: Test B, but with the new version of actual_rhs.H implemented in this PR

Setup  | Test time (s) | Mupdates/s | Time in chemistry
-------|---------------|------------|------------------
Test A | 1202.919      | 1.504      | 76.72%
Test B | 1206.848      | 1.499      | 76.55%
Test C | 1143.627      | 1.582      | 75.61%

So the new version is about 63 s (~5%) faster than Test B, and about 59 s faster than Test A. The difference is likely to grow at later times, when the density is higher, the chemistry takes longer, and more AMR levels are added.

BenWibking commented 2 months ago

OK, so the chemistry itself is ~7% faster on AMD with these improvements.

I am a bit puzzled as to why it's not faster. Maybe there is an underlying performance issue with the AMD compiler. Do you have performance numbers for NVIDIA, or on CPU?

BenWibking commented 2 months ago

Did you have SymPy simplify the expressions after doing the pow -> exp/log conversion?

psharda commented 2 months ago

> Did you have SymPy simplify the expressions after doing the pow -> exp/log conversion?

@BenWibking I run CSE first, then pass everything through cxxcode. The second step does the pow -> exp/log conversion.

yut23 commented 2 months ago

The real performance improvements should come from merging products of exponentials into a single std::exp() call and reusing the log computations between terms. Is it possible to do more CSE after transforming pow to exp/log?

BenWibking commented 2 months ago

To expand a bit further: the compiler won't (in general) combine exp(a)*exp(b) into exp(a+b) because the rounding properties are different, even though the latter form should be twice as fast.

It looks like you can do the above simplification with sympy.powsimp: https://docs.sympy.org/latest/tutorials/intro-tutorial/simplification.html#powsimp

yut23 commented 2 months ago

I don't think the compiler knows (or is even allowed to assume) anything about the mathematical properties of exp() and log(). It also looks like the standard library can't mark many of the math functions with __attribute__((const)) to let the compiler do CSE, as the global floating-point rounding mode can change their results.