Closed zhichen3 closed 2 months ago
@psharda this could make chemistry a lot faster...
Ref 1 says:
Versions of EXP that use 8-byte (long long) integers do not suffer from this staircase effect, but were found to be unacceptably slow on the typical workstation platforms.
I wonder if that's still true with today's machines?
Not only that. I think I might actually need this long long int version (hopefully still faster than `std::exp`) if it really gets rid of the staircase effect. When I tried to use `fast_exp` in the NSE solve, I had to increase the tolerance for the NSE solve to ~1.e-6 to 1.e-7 in order to solve it successfully. I think the main issue is the staircase effect, which happens right at $\Delta x$ ~ 1.e-6.
I also tried to just work with higher tolerances, but it doesn't give good results in detonation. I wonder if it's because the algorithm is so sensitive to getting the correct NSE mass fractions...
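For reference, the staircase effect discussed above can be reproduced with a minimal sketch of a Schraudolph-style 4-byte variant. The function name and the correction constant (60801, from Schraudolph's paper) are assumptions for illustration and are not taken from this PR. Because the integer is scaled by 2^20 / ln(2) ≈ 1.5e6, it only changes when x moves by about 6.6e-7, which is consistent with the $\Delta x$ ~ 1.e-6 reported here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative 4-byte variant (hypothetical name, not the PR's fast_exp).
// The integer is written into the upper 32 bits of the double; the lower
// 32 bits stay zero, which is what produces the staircase.
inline double exp_int32_sketch(double x) {
    assert(x > -700.0 && x < 700.0);  // outside this range the exponent field overflows
    constexpr double a = 1048576.0 / 0.6931471805599453;  // 2^20 / ln(2)
    constexpr double b = 1072693248.0 - 60801.0;          // 1023 * 2^20 minus a correction term
    const auto hi = static_cast<std::int32_t>(a * x + b);
    // Shift the 32-bit pattern into the high half of a 64-bit word.
    const std::uint64_t bits =
        static_cast<std::uint64_t>(static_cast<std::uint32_t>(hi)) << 32;
    double y;
    std::memcpy(&y, &bits, sizeof(y));  // well-defined alternative to union punning
    return y;
}
```

With this sketch, inputs that differ by less than the step width (for example 1.0 and 1.0 + 1e-7) can map to exactly the same output, so a solver asking for tolerances below that step sees a flat function.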
There's a 64-bit version here, if you want to try it: https://gist.github.com/jrade/293a73f89dfef51da6522428c857802d
It also uses `memcpy` to avoid the UB from type punning through a `union`. We should probably do that too, seeing as we've had issues with `reinterpret_cast` in ROCm before.
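As a minimal sketch of the `memcpy` approach (the helper name `bit_cast_sketch` is hypothetical and is not from the gist or this PR), the object representation is copied byte for byte instead of being read through a `union` member or a `reinterpret_cast` pointer:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: reinterpret the bytes of `from` as a value of type To.
// std::memcpy between trivially copyable objects of equal size is well defined,
// and compilers typically lower it to a single register move.
template <typename To, typename From>
inline To bit_cast_sketch(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "source and target sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                      std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}
```

For example, `bit_cast_sketch<std::uint64_t>(1.0)` yields the IEEE 754 bit pattern `0x3FF0000000000000`. Where C++20 is available, `std::bit_cast` from `<bit>` does the same job.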
I think you can do it directly by just shifting it by 4 bytes. So it's just a single long long int, but instead of 2^20 you use 2^52 when calculating the constants.
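One possible reading of this suggestion, as a sketch: compute a single 8-byte integer whose constants are scaled by 2^52 instead of 2^20, then copy it into the double. The function name is hypothetical, and the correction term is simply Schraudolph's 32-bit value (60801) rescaled by 2^32, which is an assumption and may differ from the constants in the gist or in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative 8-byte variant (hypothetical name). All 52 mantissa bits are
// filled from the scaled input, so the output no longer moves in coarse steps.
inline double exp_int64_sketch(double x) {
    assert(x > -700.0 && x < 700.0);  // keep the result inside the finite double range
    constexpr double two52 = 4503599627370496.0;          // 2^52
    constexpr double a = two52 / 0.6931471805599453;      // 2^52 / ln(2)
    constexpr double b = two52 * 1023.0                   // exponent bias shifted by 52 bits
                         - 60801.0 * 4294967296.0;        // assumed correction: 60801 * 2^32
    const auto i = static_cast<std::int64_t>(a * x + b);
    double y;
    std::memcpy(&y, &i, sizeof(y));  // same-size copy, no union or reinterpret_cast
    return y;
}
```

In this sketch the step width in x drops from about 1e-6 to roughly the rounding granularity of `a * x + b`, so nearby inputs such as 1.0 and 1.0 + 1e-7 give distinct outputs. The overall relative error of the approximation is unchanged (a few percent); only the staircase goes away.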
I found some good references here:
https://stackoverflow.com/questions/53882855/simple-explanation-of-the-ankerl-fast-exponent-algorithm
https://github.com/ekmett/approximate/blob/master/cbits/fast.c
https://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/
https://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/
I just followed: https://stackoverflow.com/questions/53882855/simple-explanation-of-the-ankerl-fast-exponent-algorithm
It might be worth implementing the fast log and fast pow as well.
The int64 version does get rid of the staircase effect, and it's actually faster than the int32 version.
ah okay.
For whatever reason, it doesn't work well with the NSE solve. There's no issue solving with a low tolerance, but it would slow things down and it ruins the convergence rate. So I'm leaving it out for now.
this looks good to me. @yut23 are you okay as well?
@zhichen3 can you update the comment in the first cell to reflect the current state of things so we can have a good / accurate merge message
updated, I assume you meant the cell on the top of the page.
A fast exp algorithm implementation following: Ref:
There are two versions:
jrade_exp: It is roughly 6 times faster than std::exp and has reasonable accuracy across the whole range (roughly 2% relative error). It follows Ref (3).
ekmett_exp: It is about twice as slow as jrade_exp but still ~3 times faster than std::exp, and it gives much better accuracy: ~0.1% relative error in most cases.
There are float and double versions of both.
The main driver is fast_exp, which also includes a simple Taylor approximation, 1+x, when x < 0.1. It currently defaults to jrade_exp, but one can switch to ekmett_exp for better accuracy.
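A rough sketch of how such a driver could be structured is shown below. This is not the code in this PR: the function name is hypothetical, the kernel is passed in as a parameter standing in for jrade_exp or ekmett_exp, and applying the cutoff to |x| is an assumption, since the description only says "x < 0.1".

```cpp
#include <cmath>

// Hypothetical driver sketch. `kernel` is any callable double -> double that
// stands in for the approximate exponential (e.g. jrade_exp or ekmett_exp).
template <typename Kernel>
inline double fast_exp_driver_sketch(double x, Kernel kernel) {
    // Assumption: the small-argument branch is taken on |x|, not on signed x.
    // At the cutoff the first-order Taylor truncation error is about
    // x^2 / 2, i.e. roughly 0.5% relative error.
    if (std::abs(x) < 0.1) {
        return 1.0 + x;
    }
    return kernel(x);
}
```

Passing the kernel as a template parameter keeps the switch between the faster and the more accurate variant at the call site, and lets the compiler inline whichever one is chosen.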
We also use the memcpy approach to avoid undefined behavior from type punning through a union (pointed out by Eric). This approach is demonstrated in Ref 3, and we adapt it for ekmett_exp.
Side note: when used with the NSE solve, it degrades the convergence rate in nse_test. Maybe it's because of the insufficient accuracy?