Closed quantheory closed 5 months ago
Nice. Looks like the most noticeable impact would come from avoiding trascendental math functions calls. Also, if we allow considering that mu_r=1
, you could save another log in
logn0r.set(qr_gt_small, log10(cdistr) + (mu_r + 1) * log10(lamr));
by doing
logn0r.set(qr_gt_small, log10(cdistr*lamr*lamr));
@bartgol Yep, you're right, although I believe mu_r
is always set to 0
currently, so that would be:
logn0r.set(qr_gt_small, log10(cdistr*lamr));
Though I'm not fond of mixing code that uses mu_r
as an input with code that assumes that mu_r
has a particular value. It makes it unclear whether mu_r
is supposed to be adjustable or not.
If we wanted to only support mu_r == 0
going forward, it would probably be best to simply remove it entirely, e.g. also replacing (mu_r + 1) * (mu_r + 2) * (mu_r + 3)
with just the number 6
, and similar measures taken wherever mu_r
appears.
I see static constexpr Scalar mu_r_const = 1.0;
in the physics constants file. But whatever.
I agree that if something is meant to be tunable or somewhat changed, we should not "compile it away" in the code, so perhaps my previous comment should be disregarded.
@bartgol Ah, thanks for letting me know. Now that I've double-checked, I realize that EAMv3 has retuned this value to 0, but SCREAM is using 1, which is probably the source of my confusion.
So, @ambrad pointed me to #1722 the other day, and I realized that a lot of the cost of the P3 rain sedimentation is probably in the many
get_rain_dsd2
calls in the rain fall velocity calculations (which is a prerequisite for the lookup table stuff). There's a lot of inefficient stuff in there!First, note these lines: https://github.com/E3SM-Project/scream/blob/f412870ca05da69fa26698aa22bb47b4fc24ef20/components/eamxx/src/physics/p3/impl/p3_dsd2_impl.hpp#L100-L107 In this case,
inv_dum
doesn't actually seem to be used in the current version, so I'm hoping that the compiler is able to optimize out its calculation completely. But also, sincemu_r
is currently treated as a constant,inv_dum
is just a constant divided bylamr
. So calculating the cube root twice would be unnecessary anyway ifcbrt((mu_r + 3) * (mu_r + 2) * (mu_r + 1) / 6.)
was calculated once at initialization, and the value saved for all further calculations.Second, just below that: https://github.com/E3SM-Project/scream/blob/f412870ca05da69fa26698aa22bb47b4fc24ef20/components/eamxx/src/physics/p3/impl/p3_dsd2_impl.hpp#L123-L127 This looks like it is just a translation from the Fortran version of P3, but the original P3 code itself is very weird. This code is calculating the mathematical equivalent of:
That's it; none of the
exp
,log
, ortgamma
calls are necessary here. Also, you can see that the denominator is an expression that has already been used above, so if you really wanted to optimize, that could be saved and reused.Third, right below that, you have: https://github.com/E3SM-Project/scream/blob/f412870ca05da69fa26698aa22bb47b4fc24ef20/components/eamxx/src/physics/p3/impl/p3_dsd2_impl.hpp#L130-L132 If
mu_r
is a constant, thentgamma(mu_r + 1.)
can be calculated at initialization and saved, thus removingtgamma
from this performance-critical code. Furthermore, the second line can be reduced to:This reduces the number of
log10
calls by one. Even better, the calculation of these variables could be split out into a separate function fromget_rain_dsd2
entirely, since I do not believecdistr
orlogn0r
are needed in the sedimentation loop at all.I am not planning to make any of these changes myself, at least right now. However, since the above optimizations should collectively remove all of the transcendental function calls from the performance-critical sedimentation loop (leaving only a single cube root and a few arithmetic operations), it might be worth it for someone to try to implement these changes, and see if there are any similar improvements are to be found in the ice code.
(Of course these optimizations also are, or will be, in the modified P3 code we're developing as part of our SciDAC project.)