[Performance] Using std::pow(x, n) versus x * x * ... * x

cgyurgyik commented 4 years ago

Looking at: EXOSIMS/EXOSIMS/util/KeplerSTM_C/KeplerSTM_C.c

I saw:
double r0norm = sqrt(pow(r0[0], 2)+pow(r0[1], 2)+pow(r0[2], 2)); and similar cases where std::pow(x, n) is used, and n is small. Is there a specific reason for this?

See: "When to Use std::pow" for reasons not to use std::pow when n is small.

There's also a few other updates that could be made using C++ algorithms/idioms, as well as making the code "safer" and more readable. Not sure if that interests you at all.

cgyurgyik commented 4 years ago

To give you an idea, here's an attempt to modernize some of the C code in KeplerSTM_C. I am a big fan of using provided algorithms in the C++ library <algorithm> unless benchmarks say otherwise. Using const helps compiler efficiency, as well as ensuring you don't change certain variables. std::array is safer than C arrays.

// NOTE: NOT VERIFIED FOR CORRECTNESS, NOR HAS IT BEEN TESTED.
int KeplerSTM_C (const std::array<double, 6>& x0, double dt, double mu, 
const std::array<double, 6>& x1, double epsmult) {

    /* Initialize orbit values*/
    const std::array<double, 3> r0 = {x0[0], x0[1], x0[2]};
    const std::array<double, 3> v0 = {x0[3], x0[4], x0[5]};

    const double dot_r0 = std::inner_product(std::cbegin(r0), std::cend(r0), std::cbegin(r0), std::cend(r0), 0.0);
    const double r0norm = std::sqrt(dot_r0);
    const double nu0 = std::inner_product(std::cbegin(r0), std::cend(r0), std::cbegin(v0), std::cend(v0), 0.0);
    const double dot_v0 = std::inner_product(std::cbegin(v0), std::cend(v0), std::cbegin(v0), std::cend(v0), 0.0);
    const double beta = 2.0 * mu / r0norm - dot_v0;

    /* For elliptic orbits, account for period effects */
    double deltaU = 0;
    if (beta > 0){
        const double P = 2.0 * M_PI * mu * std::pow(beta,(-1.5));
        const double norb = std::floor((dt + P / 2.0 - 2.0 * nu0 / beta) / P);
        deltaU = 2.0 * M_PI * norb * std::pow(beta,(-2.5));
    }

/*...*/

If you give me some reasonable tests or function parameters for KeplerSTM_C that hit all use cases, I would be happy to run some benchmarks as well.

dsavransky commented 4 years ago

You are welcome to submit a pull request with an associated benchmark showing the performance improvement.

cgyurgyik commented 4 years ago

Sure. I don't think it would be sufficient to benchmark on random data. Do you have any provided test data for these functions?

dsavransky commented 4 years ago

See propag_system in Prototypes/SimulatedUniverse. Relevant scales are up to 1 million propagations at a time.

deanthedream commented 4 years ago

*We only propagate 1 system at a time. Propagation only occurs when making a revisit of a target star. We make at most 1,400 single-visit observations in a mission simulation currently. propag_system is only used in ~10 places in EXOSIMS. This could improve dynamic completeness computation time I think and be generally helpful for revisits (Very helpful for this reason alone. Do it!). @CoreySpohn had some dynamic completeness improvements as well

cgyurgyik commented 4 years ago

I am not sure what "dynamic completeness" entails. Here are initial benchmarks using some more modern C++ and other quick fixes. The code still has a long ways to go in terms of readability / removing code smells as well, and there are probably other fixes that can improve the speed.

NOTE: This is using arbitrary numbers:

 std::array<double, 6> x0 = {1.0,1.0,1.0,1.0,1.0,1.0};
    const double dt = 3.0;
    const double mu = 5.0;
    std::array<double, 6> x1  = {2.0,2.0,2.0,2.0,2.0,2.0};
    const double epsmult = 4.0;
    const int convergence_limit = 1000;

If you want more interesting arguments, please provide some. I am far from an astrophysicist.

2020-01-26 18:57:54
Run on (4 X 1600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 256 KiB (x2)
  L3 Unified 3072 KiB (x1)
Load Average: 4.15, 3.13, 3.06
***WARNING*** Library was built as DEBUG. Timings may be affected.
--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
Old_KeplerSTM       9422 ns         8759 ns        74960
New_KeplerSTM       7612 ns         7298 ns        84318

The next step is to see what Cython has to offer in terms of modern C++ features, and then apply it to the current codebase. I also intend to contribute to Cython since they lack many of the powerful algorithms that C++ provides.

dsavransky commented 1 year ago

Fold into #333

dsavransky / EXOSIMS

[Performance] Using std::pow(x, n) versus x * x * ... * x #242