JamesYang007 / ADBenchmark

A small repo dedicated to benchmarking various AD libraries

Convert sum benchmark to use `var<mat>` #2

Open bbbales2 opened 4 years ago

bbbales2 commented 4 years ago

Converted one! As we get the rest of the var<mat> stuff in place we can convert the rest.

FastAD sum:

-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
BM_fastad<SumFunc>/1           6.81 ns         6.81 ns    102241091 N=1
BM_fastad<SumFunc>/2           7.57 ns         7.57 ns     92824386 N=2
BM_fastad<SumFunc>/4           6.88 ns         6.88 ns    101760162 N=4
BM_fastad<SumFunc>/8           7.59 ns         7.59 ns     92351222 N=8
BM_fastad<SumFunc>/16          9.59 ns         9.59 ns     73186002 N=16
BM_fastad<SumFunc>/32          14.0 ns         14.0 ns     49094946 N=32
BM_fastad<SumFunc>/64          27.4 ns         27.3 ns     26011579 N=64
BM_fastad<SumFunc>/128         53.2 ns         53.2 ns     13048065 N=128
BM_fastad<SumFunc>/256         95.4 ns         95.4 ns      7325127 N=256
BM_fastad<SumFunc>/512          179 ns          179 ns      3912272 N=512
BM_fastad<SumFunc>/1024         348 ns          348 ns      2019448 N=1024
BM_fastad<SumFunc>/2048         687 ns          687 ns      1023662 N=2.048k
BM_fastad<SumFunc>/4096        1427 ns         1426 ns       490766 N=4.096k
BM_fastad<SumFunc>/8192        2808 ns         2807 ns       245426 N=8.192k
BM_fastad<SumFunc>/16384       5619 ns         5619 ns       124105 N=16.384k

Stan sum:

-----------------------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------
BM_stan<SumFunc, varmat>/1           26.8 ns         26.8 ns     26350903 N=1
BM_stan<SumFunc, varmat>/2           31.8 ns         31.8 ns     22029484 N=2
BM_stan<SumFunc, varmat>/4           35.8 ns         35.8 ns     19562051 N=4
BM_stan<SumFunc, varmat>/8           41.0 ns         41.0 ns     16943482 N=8
BM_stan<SumFunc, varmat>/16          45.6 ns         45.6 ns     15352545 N=16
BM_stan<SumFunc, varmat>/32          54.2 ns         54.2 ns     12865175 N=32
BM_stan<SumFunc, varmat>/64          66.4 ns         66.4 ns     10597147 N=64
BM_stan<SumFunc, varmat>/128         95.3 ns         95.3 ns      7351159 N=128
BM_stan<SumFunc, varmat>/256          168 ns          168 ns      4160385 N=256
BM_stan<SumFunc, varmat>/512          281 ns          281 ns      2478569 N=512
BM_stan<SumFunc, varmat>/1024         514 ns          514 ns      1358337 N=1024
BM_stan<SumFunc, varmat>/2048        1127 ns         1127 ns       616329 N=2.048k
BM_stan<SumFunc, varmat>/4096        2041 ns         2040 ns       342880 N=4.096k
BM_stan<SumFunc, varmat>/8192        3947 ns         3946 ns       178015 N=8.192k
BM_stan<SumFunc, varmat>/16384       8133 ns         8132 ns        86724 N=16.384k
BM_stan<SumFunc, matvar>/1           41.8 ns         41.8 ns     16774336 N=1
BM_stan<SumFunc, matvar>/2           43.9 ns         43.9 ns     15897233 N=2
BM_stan<SumFunc, matvar>/4           50.3 ns         50.3 ns     14074020 N=4
BM_stan<SumFunc, matvar>/8           63.8 ns         63.8 ns     10681440 N=8
BM_stan<SumFunc, matvar>/16          90.4 ns         90.4 ns      7792413 N=16
BM_stan<SumFunc, matvar>/32           144 ns          144 ns      5011206 N=32
BM_stan<SumFunc, matvar>/64           341 ns          341 ns      2106592 N=64
BM_stan<SumFunc, matvar>/128          627 ns          627 ns      1089710 N=128
BM_stan<SumFunc, matvar>/256         1252 ns         1252 ns       559918 N=256
BM_stan<SumFunc, matvar>/512         2466 ns         2466 ns       284112 N=512
BM_stan<SumFunc, matvar>/1024        4959 ns         4959 ns       140986 N=1024
BM_stan<SumFunc, matvar>/2048        9966 ns         9965 ns        70066 N=2.048k
BM_stan<SumFunc, matvar>/4096       19909 ns        19907 ns        35190 N=4.096k
BM_stan<SumFunc, matvar>/8192       52597 ns        52590 ns        13336 N=8.192k
BM_stan<SumFunc, matvar>/16384     129659 ns       129640 ns         5416 N=16.384k
bbbales2 commented 4 years ago

Actually, maybe I made mat<var> slower with this, lol. I think it was only around 80-90us before (Edit: at N = 16384).

bbbales2 commented 3 years ago

@JamesYang007 I was converting more of these. In the StochasticVolatility example, as-is, I'm getting outputs like:

BM_stan<StochasticVolatilityFunc, matvar>/32          2013 ns         2013 ns       352799 N=35
WARNING (stan-stochastic_volatility) MAX ABS ERROR PROP: 2.50972e-15
WARNING (stan-stochastic_volatility) MAX ABS ERROR PROP: 2.69971e-15
WARNING (stan-stochastic_volatility) MAX ABS ERROR PROP: 0.815661
WARNING (stan-stochastic_volatility) MAX ABS ERROR PROP: 3.19019e-15
WARNING (stan-stochastic_volatility) MAX ABS ERROR PROP: 2.10088e-15
WARNING (stan-stochastic_volatility) MAX ABS ERROR PROP: 1.87386e-15
WARNING (stan-stochastic_volatility) MAX ABS ERROR PROP: 2.452e-15

The 0.815 makes me think something is broken, so I'll look into that. But the way this is written, the h variable is effectively part of the input. Why is it this:

auto operator()(Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1>& x) const
{
    using namespace stan::math;
    using vec_t = Eigen::Matrix<var, Eigen::Dynamic, 1>;
    size_t N = (x.size() - 3) / 2;
    Eigen::Map<vec_t> h_std(x.data(), N);
    Eigen::Map<vec_t> h(x.data() + N, N);  // h is a view into the second half of x
    auto& phi = x(2*N);
    auto& sigma = x(2*N + 1);
    auto& mu = x(2*N + 2);
    h = h_std * sigma;  // writes the transformed parameter back into the input vector
    ...;
}

Not something like:

auto operator()(Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1>& x) const
{
    using namespace stan::math;
    using vec_t = Eigen::Matrix<var, Eigen::Dynamic, 1>;
    size_t N = (x.size() - 3) / 2;
    Eigen::Map<vec_t> h_std(x.data(), N);
    auto& phi = x(N);
    auto& sigma = x(N + 1);
    auto& mu = x(N + 2);
    vec_t h = h_std * sigma;  // h is a fresh allocation; x is left untouched
    ...;
}

I see the default implementation is like this too. I wanna change it :D.

JamesYang007 commented 3 years ago

Ah, I didn't want to allocate more than I needed to. The parameter x for operator() is supposed to represent the entire parameter vector, and h is a (transformed) parameter. Some libraries (like Stan) allow for this kind of "viewer" logic, which generally saves time, so I wanted to give them the advantage if they supported it.