cscherrer / MultivariateMeasures.jl

Optimized implementations for higher-dimensional measures
MIT License

Benchmarking Stan #20

Open spinkney opened 3 years ago

spinkney commented 3 years ago

I'm not sure if this will give exactly what you want, as it's meant more for benchmarking the speed of forward/reverse mode versus size.

For the mvn benchmarks, it's known that multi_normal and multi_normal_prec are not fully optimized (see https://github.com/stan-dev/math/issues/2544). The benchmark to compare against at this time is multi_normal_cholesky.

Steps to bench:

  1. Clone stan-math.
  2. Set up Google Benchmark:
    cd lib/benchmark_1.5.1
    mkdir build && cd build
    cmake .. -DCMAKE_BUILD_TYPE=RELEASE
    make
    cd ../../..
  3. Run the develop branch benchmark:
    git checkout develop
    ./benchmarks/benchmark.py "multi_normal_cholesky_lpdf(vector, vector, matrix) => real" --skip_similar_signatures --csv benchmark_results.csv

The benchmark_results.csv contains the benchmarks.

cscherrer commented 3 years ago

Thanks @spinkney. I'm hitting the error below, but from the repo I did get

make -j4 -f ~/stan-dev/math/make/standalone math-libs

to work with no problem. I'm not sure if that might get us part of the way there.

FWIW I'm on Manjaro Linux, and my cmake is version 3.21

chad@boondoggle:~/g/s/m/l/b/build|develop✓
➤ cmake .. -DCMAKE_BUILD_TYPE=RELEASE
-- Failed to find LLVM FileCheck
-- git Version: v4.1.0-95321916
-- Version: 4.1.0
-- Performing Test HAVE_STD_REGEX -- success
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
-- Performing Test HAVE_POSIX_REGEX -- success
-- Performing Test HAVE_STEADY_CLOCK -- success
-- Looking for Google Test sources
-- Looking for Google Test sources in /home/chad/git/stan/math/lib/benchmark_1.5.1/googletest
-- Found Google Test in /home/chad/git/stan/math/lib/benchmark_1.5.1/googletest
-- Configuring done
-- Generating done
-- Build files have been written to: /home/chad/git/stan/math/lib/benchmark_1.5.1/build/third_party/googletest
[100%] Built target googletest
CMake Deprecation Warning at googletest/CMakeLists.txt:4 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

CMake Deprecation Warning at googletest/googlemock/CMakeLists.txt:45 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

CMake Deprecation Warning at googletest/googletest/CMakeLists.txt:56 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

-- Configuring done
-- Generating done
-- Build files have been written to: /home/chad/git/stan/math/lib/benchmark_1.5.1/build
chad@boondoggle:~/g/s/m/l/b/build|develop✓
➤ make
Consolidate compiler generated dependencies of target benchmark
[  1%] Building CXX object src/CMakeFiles/benchmark.dir/benchmark_register.cc.o
In file included from /home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.cc:15:
/home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.h: In function ‘typename std::vector<T>::iterator benchmark::internal::AddPowers(std::vector<T>*, T, T, int)’:
/home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.h:22:30: error: ‘numeric_limits’ is not a member of ‘std’
   22 |   static const T kmax = std::numeric_limits<T>::max();
      |                              ^~~~~~~~~~~~~~
/home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.h:22:46: error: expected primary-expression before ‘>’ token
   22 |   static const T kmax = std::numeric_limits<T>::max();
      |                                              ^
/home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.h:22:49: error: ‘::max’ has not been declared; did you mean ‘std::max’?
   22 |   static const T kmax = std::numeric_limits<T>::max();
      |                                                 ^~~
      |                                                 std::max
In file included from /usr/include/c++/11.1.0/algorithm:62,
                 from /home/chad/git/stan/math/lib/benchmark_1.5.1/include/benchmark/benchmark.h:172,
                 from /home/chad/git/stan/math/lib/benchmark_1.5.1/src/internal_macros.h:4,
                 from /home/chad/git/stan/math/lib/benchmark_1.5.1/src/check.h:8,
                 from /home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.h:6,
                 from /home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.cc:15:
/usr/include/c++/11.1.0/bits/stl_algo.h:3467:5: note: ‘std::max’ declared here
 3467 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
make[2]: *** [src/CMakeFiles/benchmark.dir/build.make:118: src/CMakeFiles/benchmark.dir/benchmark_register.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:252: src/CMakeFiles/benchmark.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

spinkney commented 3 years ago

OK, I'm not sure, but this looks like a Google Benchmark error. Can you try installing it following their instructions? I had a ton of problems getting it set up. If that still fails, can you open an issue in the stan math repo and tag Rok and Tadej?
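
One thing that might be worth trying first: the log says `std::numeric_limits` is not a member of `std` in benchmark_register.h, which usually means the header never includes `<limits>` and newer GCC releases no longer pull it in indirectly. The sketch below is only a guess at a local workaround, not the official fix. The helper name `add_limits_include` is made up for illustration, and the path in the usage note is taken from the log above.

```python
def add_limits_include(header_text):
    """Return header_text with '#include <limits>' added if it is missing."""
    if "#include <limits>" in header_text:
        return header_text
    lines = header_text.splitlines(keepends=True)
    # Insert right before the first existing #include so any include guard stays on top.
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#include"):
            lines.insert(i, "#include <limits>\n")
            return "".join(lines)
    # No #include found at all: leave the text untouched rather than guess.
    return header_text
```

To apply it, read lib/benchmark_1.5.1/src/benchmark_register.h, pass the text through the function, write it back, and rerun make. Calling it twice is harmless, since the second call finds the include and returns the text unchanged.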

cscherrer commented 3 years ago

For my own notes, my local path is /home/chad/git/stan/math.

From here, I do

➤ ./benchmarks/benchmark.py "multi_normal_cholesky_lpdf(vector, vector, matrix) => real" --skip_similar_signatures --csv benchmark_results.csv

Result:


Run on (32 X 3500 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 64 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 8192 KiB (x4)
Load Average: 0.70, 0.54, 0.42
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
--------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                Time             CPU   Iterations
--------------------------------------------------------------------------------------------------------------------------
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/1/manual_time           213 ns          245 ns      3292713
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/2/manual_time           215 ns          247 ns      3258665
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/4/manual_time           256 ns          288 ns      2756494
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/8/manual_time           343 ns          375 ns      2045407
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/16/manual_time          645 ns          676 ns      1085494
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/32/manual_time         1292 ns         1324 ns       547179
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/64/manual_time         3278 ns         3308 ns       213126
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/128/manual_time       38244 ns        38273 ns        18486
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/256/manual_time      202443 ns       202358 ns         3451
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/512/manual_time     1160055 ns      1158346 ns          609
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/1024/manual_time    8494975 ns      8475861 ns           82
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/2048/manual_time   33174578 ns     33115064 ns           21
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/4096/manual_time  108087240 ns    107880830 ns            6
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/1/manual_time            290 ns          332 ns      2418198
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/2/manual_time            345 ns          387 ns      2037665
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/4/manual_time            478 ns          521 ns      1466586
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/8/manual_time           1008 ns         1059 ns       694922
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/16/manual_time          2938 ns         3035 ns       237879
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/32/manual_time         11058 ns        11235 ns        63401
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/64/manual_time         50669 ns        51198 ns        13795
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/128/manual_time       261976 ns       263968 ns         2661
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/256/manual_time      1723172 ns      1730515 ns          405
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/512/manual_time     14531122 ns     14680486 ns           48
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/1024/manual_time    97342681 ns     97874134 ns            7
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/2048/manual_time   676527384 ns    687136864 ns            1
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/4096/manual_time  4738165466 ns   4770911148 ns            1
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/1/manual_time            240 ns          281 ns      2919951
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/2/manual_time            263 ns          305 ns      2663178
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/4/manual_time            315 ns          357 ns      2230725
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/8/manual_time            445 ns          486 ns      1570989
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/16/manual_time           790 ns          832 ns       885042
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/32/manual_time          1562 ns         1608 ns       446372
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/64/manual_time          3765 ns         3815 ns       185350
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/128/manual_time        11126 ns        11192 ns        63481
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/256/manual_time        39261 ns        39367 ns        17834
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/512/manual_time       206476 ns       206371 ns         3262
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/1024/manual_time     8515388 ns      8498800 ns           82
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/2048/manual_time    29953101 ns     29884394 ns           22
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/4096/manual_time   121301273 ns    121073368 ns            6
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/1/manual_time             295 ns          350 ns      2361972
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/2/manual_time             344 ns          402 ns      2033708
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/4/manual_time             481 ns          540 ns      1455069
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/8/manual_time            1024 ns         1093 ns       682917
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/16/manual_time           2984 ns         3101 ns       234431
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/32/manual_time          10988 ns        11196 ns        63912
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/64/manual_time          50898 ns        51448 ns        13741
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/128/manual_time        262833 ns       264798 ns         2670
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/256/manual_time       1698088 ns      1705422 ns          412
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/512/manual_time      14432855 ns     14587499 ns           48
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/1024/manual_time     96299953 ns     96781895 ns            7
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/2048/manual_time    665848206 ns    676422135 ns            1
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/4096/manual_time   4680491581 ns   4713320375 ns            1
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/1/manual_time            243 ns          283 ns      2885907
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/2/manual_time            270 ns          312 ns      2628065
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/4/manual_time            319 ns          361 ns      2170670
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/8/manual_time            451 ns          492 ns      1555933
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/16/manual_time           799 ns          842 ns       880038
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/32/manual_time          1580 ns         1625 ns       442967
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/64/manual_time          3791 ns         3841 ns       184843
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/128/manual_time        11003 ns        11066 ns        61635
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/256/manual_time        42178 ns        42278 ns        18001
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/512/manual_time       205619 ns       205538 ns         3419
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/1024/manual_time     8496523 ns      8479111 ns           82
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/2048/manual_time    29454024 ns     29381839 ns           24
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/4096/manual_time   120069235 ns    119819488 ns            6
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/1/manual_time             296 ns          351 ns      2363426
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/2/manual_time             348 ns          406 ns      2009612
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/4/manual_time             480 ns          539 ns      1435267
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/8/manual_time            1034 ns         1103 ns       685005
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/16/manual_time           2992 ns         3111 ns       234824
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/32/manual_time          11092 ns        11292 ns        63952
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/64/manual_time          50996 ns        51535 ns        13596
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/128/manual_time        263428 ns       265377 ns         2656
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/256/manual_time       1698164 ns      1705240 ns          412
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/512/manual_time      14436851 ns     14589265 ns           48
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/1024/manual_time    101460165 ns    101888032 ns            7
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/2048/manual_time    699223812 ns    707639401 ns            1
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/4096/manual_time   4700218268 ns   4733751090 ns            1
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/1/manual_time             252 ns          310 ns      2770832
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/2/manual_time             265 ns          325 ns      2621660
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/4/manual_time             323 ns          382 ns      2167699
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/8/manual_time             461 ns          521 ns      1513690
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/16/manual_time            825 ns          889 ns       849466
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/32/manual_time           1676 ns         1742 ns       418616
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/64/manual_time           4011 ns         4086 ns       173981
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/128/manual_time         11615 ns        11721 ns        60323
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/256/manual_time         40184 ns        40357 ns        17456
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/512/manual_time        213260 ns       213335 ns         3290
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/1024/manual_time      7765554 ns      7749288 ns           90
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/2048/manual_time     28426950 ns     28365665 ns           25
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/4096/manual_time    116560082 ns    116349331 ns            6
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/1/manual_time              308 ns          381 ns      2275819
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/2/manual_time              367 ns          443 ns      1908799
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/4/manual_time              515 ns          594 ns      1353336
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/8/manual_time             1063 ns         1153 ns       655936
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/16/manual_time            3016 ns         3154 ns       231751
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/32/manual_time           10965 ns        11182 ns        60987
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/64/manual_time           51039 ns        51606 ns        13750
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/128/manual_time         261396 ns       263380 ns         2679
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/256/manual_time        1718811 ns      1725630 ns          408
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/512/manual_time       14530940 ns     14695641 ns           48
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/1024/manual_time      97229225 ns     97725052 ns            7
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/2048/manual_time     736861915 ns    746495961 ns            1
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/4096/manual_time    4683455091 ns   4718489565 ns            1
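
To compare these numbers against another implementation, it may help to get the console output into a dictionary first. This is a rough sketch that assumes rows look exactly like the ones above (name, size, `manual_time`, then time and CPU in ns, then iterations). The function names `parse_benchmark_output` and `slowdown` are my own, and the two sample rows are copied from the table.

```python
import re
from collections import defaultdict

# Matches rows such as:
# ..._Prim_vector_Prim_vector_Rev_matrix/64/manual_time   50669 ns   51198 ns   13795
ROW = re.compile(
    r"^(?P<name>\S+?)/(?P<size>\d+)/manual_time\s+"
    r"(?P<time>\d+) ns\s+(?P<cpu>\d+) ns\s+(?P<iters>\d+)\s*$"
)

def parse_benchmark_output(text):
    """Return {benchmark_name: {size: manual_time_ns}} from the console output."""
    results = defaultdict(dict)
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header, separator, and warning lines are skipped
            results[match["name"]][int(match["size"])] = int(match["time"])
    return dict(results)

def slowdown(results, baseline, other):
    """Ratio other/baseline of manual time at every size both variants share."""
    shared = sorted(set(results[baseline]) & set(results[other]))
    return {n: results[other][n] / results[baseline][n] for n in shared}

SAMPLE = """\
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/64/manual_time         3278 ns         3308 ns       213126
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/64/manual_time         50669 ns        51198 ns        13795
"""

if __name__ == "__main__":
    parsed = parse_benchmark_output(SAMPLE)
    prim = "multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix"
    rev = "multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix"
    for size, ratio in slowdown(parsed, prim, rev).items():
        print(f"size {size}: Rev matrix takes {ratio:.1f}x the Prim time")
```

Keep in mind the DEBUG warning in the header of the output: ratios between variants are probably more trustworthy than the absolute times.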

Initial thoughts:

* I've heard of CPU scaling, but don't know any details. I guess I should turn this off?

@chriselrod I guess you've compared with Stan before, any suggestions?

chriselrod commented 3 years ago

> I've heard of CPU scaling, but don't know any details. I guess I should turn this off?

I assume this means boost. I made a few comments here about it, with a tested example of turning it off on a laptop using the Intel pstate driver, and untested code that should work with the cpufreq driver. I also linked to the Linux documentation for more details / instructions.

Note that you'll probably want to turn it back on after benchmarking. Particularly on laptops, the boost speed is substantially higher than the base clock. E.g., my i7 1165G7 tends to run at well over 4 GHz (up to 4.7). Disabling scaling makes it run at the base speed of 2.8 GHz instead. This makes benchmarks more reliable, but also obviously much slower.

Desktops tend to have smaller differences.

If you have a desktop with good cooling, you may also want to just "overclock" in the BIOS: you can set a speed at the high end of the boost range as a constant speed.
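
Before changing anything, it can be useful to check what the machine is currently doing. The sketch below only reads the Linux sysfs files, so it needs no root and changes nothing. It assumes the usual locations: a per-core `scaling_governor`, `intel_pstate/no_turbo` for the Intel pstate driver, and `cpufreq/boost` for drivers such as acpi-cpufreq. Not every system has all of them. The function name `read_scaling_state` is made up for this example. I believe Google Benchmark's warning is keyed to the governor not being "performance" rather than to boost itself, but I haven't verified that.

```python
from pathlib import Path

def read_scaling_state(cpu_root="/sys/devices/system/cpu"):
    """Collect whichever frequency-scaling settings exist here (read-only)."""
    root = Path(cpu_root)
    state = {}
    # Per-core governor, e.g. "performance", "powersave", or "schedutil".
    governors = set()
    for gov_file in root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        governors.add(gov_file.read_text().strip())
    if governors:
        state["governors"] = sorted(governors)
    # intel_pstate driver: no_turbo == 1 means turbo boost is disabled.
    no_turbo = root / "intel_pstate" / "no_turbo"
    if no_turbo.exists():
        state["boost_enabled"] = no_turbo.read_text().strip() == "0"
    # acpi-cpufreq and similar drivers: boost == 1 means boost is enabled.
    boost = root / "cpufreq" / "boost"
    if boost.exists():
        state["boost_enabled"] = boost.read_text().strip() == "1"
    return state

if __name__ == "__main__":
    print(read_scaling_state())
```

The `cpu_root` parameter is there only so the function can be pointed at a scratch directory for testing. Record the values it prints before benchmarking so you can restore them afterwards.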

> @chriselrod I guess you've compared with Stan before, any suggestions?

I think I just compiled the math library and ccalled functions of interest from Julia.