Open spinkney opened 3 years ago
Thanks @spinkney. I'm hitting the error below, but from the repo I did get
make -j4 -f ~/stan-dev/math/make/standalone math-libs
to work with no problem. I'm not sure if that might get us part of the way there.
FWIW I'm on Manjaro Linux, and my cmake is version 3.21
chad@boondoggle:~/g/s/m/l/b/build|develop✓
➤ cmake .. -DCMAKE_BUILD_TYPE=RELEASE
-- Failed to find LLVM FileCheck
-- git Version: v4.1.0-95321916
-- Version: 4.1.0
-- Performing Test HAVE_STD_REGEX -- success
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
-- Performing Test HAVE_POSIX_REGEX -- success
-- Performing Test HAVE_STEADY_CLOCK -- success
-- Looking for Google Test sources
-- Looking for Google Test sources in /home/chad/git/stan/math/lib/benchmark_1.5.1/googletest
-- Found Google Test in /home/chad/git/stan/math/lib/benchmark_1.5.1/googletest
-- Configuring done
-- Generating done
-- Build files have been written to: /home/chad/git/stan/math/lib/benchmark_1.5.1/build/third_party/googletest
[100%] Built target googletest
CMake Deprecation Warning at googletest/CMakeLists.txt:4 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
CMake Deprecation Warning at googletest/googlemock/CMakeLists.txt:45 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
CMake Deprecation Warning at googletest/googletest/CMakeLists.txt:56 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/chad/git/stan/math/lib/benchmark_1.5.1/build
chad@boondoggle:~/g/s/m/l/b/build|develop✓
➤ make
Consolidate compiler generated dependencies of target benchmark
[ 1%] Building CXX object src/CMakeFiles/benchmark.dir/benchmark_register.cc.o
In file included from /home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.cc:15:
/home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.h: In function ‘typename std::vector<T>::iterator benchmark::internal::AddPowers(std::vector<T>*, T, T, int)’:
/home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.h:22:30: error: ‘numeric_limits’ is not a member of ‘std’
22 | static const T kmax = std::numeric_limits<T>::max();
| ^~~~~~~~~~~~~~
/home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.h:22:46: error: expected primary-expression before ‘>’ token
22 | static const T kmax = std::numeric_limits<T>::max();
| ^
/home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.h:22:49: error: ‘::max’ has not been declared; did you mean ‘std::max’?
22 | static const T kmax = std::numeric_limits<T>::max();
| ^~~
| std::max
In file included from /usr/include/c++/11.1.0/algorithm:62,
from /home/chad/git/stan/math/lib/benchmark_1.5.1/include/benchmark/benchmark.h:172,
from /home/chad/git/stan/math/lib/benchmark_1.5.1/src/internal_macros.h:4,
from /home/chad/git/stan/math/lib/benchmark_1.5.1/src/check.h:8,
from /home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.h:6,
from /home/chad/git/stan/math/lib/benchmark_1.5.1/src/benchmark_register.cc:15:
/usr/include/c++/11.1.0/bits/stl_algo.h:3467:5: note: ‘std::max’ declared here
3467 | max(initializer_list<_Tp> __l, _Compare __comp)
| ^~~
make[2]: *** [src/CMakeFiles/benchmark.dir/build.make:118: src/CMakeFiles/benchmark.dir/benchmark_register.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:252: src/CMakeFiles/benchmark.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
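The "numeric_limits is not a member of std" errors above are what GCC reports when std::numeric_limits is used without <limits> being included. The log shows the libstdc++ 11.1.0 headers, and in GCC 11 several standard headers stopped pulling in <limits> transitively, so code that compiled with older compilers can fail this way. The usual fix is to add the include near the top of the header that uses it (here benchmark_register.h). A minimal sketch of the pattern, where add_powers is a hypothetical, simplified stand-in and not the Google Benchmark source:

```cpp
#include <cassert>
#include <limits>  // declares std::numeric_limits; the failing header appears to lack this
#include <vector>

// Hypothetical, simplified stand-in for benchmark::internal::AddPowers:
// appends lo, lo*mult, lo*mult^2, ... (all <= hi) to dst and returns how
// many values were appended.
template <typename T>
int add_powers(std::vector<T>* dst, T lo, T hi, int mult) {
  assert(lo > 0 && lo <= hi && mult >= 2);
  // Same expression as line 22 of benchmark_register.h in the log above.
  static const T kmax = std::numeric_limits<T>::max();
  int added = 0;
  for (T i = lo; i <= hi; i *= mult) {
    dst->push_back(i);
    ++added;
    // Stop before i * mult would overflow T.
    if (i > kmax / mult) break;
  }
  return added;
}
```

With the include in place this compiles under GCC 11; removing it should reproduce the same family of errors.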
OK, I'm not sure, but this looks like a Google Benchmark error. Can you try installing it by following their instructions? I had a ton of problems getting it set up. If that still fails, can you open an issue in the Stan Math repo and tag Rok and Tadej?
For my own notes, my local path is /home/chad/git/stan/math.
From here, I do
➤ ./benchmarks/benchmark.py "multi_normal_cholesky_lpdf(vector, vector, matrix) => real" --skip_similar_signatures --csv benchmark_results.csv
Result:
Run on (32 X 3500 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 64 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 8192 KiB (x4)
Load Average: 0.70, 0.54, 0.42
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
--------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------------------------------------------------
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/1/manual_time 213 ns 245 ns 3292713
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/2/manual_time 215 ns 247 ns 3258665
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/4/manual_time 256 ns 288 ns 2756494
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/8/manual_time 343 ns 375 ns 2045407
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/16/manual_time 645 ns 676 ns 1085494
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/32/manual_time 1292 ns 1324 ns 547179
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/64/manual_time 3278 ns 3308 ns 213126
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/128/manual_time 38244 ns 38273 ns 18486
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/256/manual_time 202443 ns 202358 ns 3451
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/512/manual_time 1160055 ns 1158346 ns 609
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/1024/manual_time 8494975 ns 8475861 ns 82
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/2048/manual_time 33174578 ns 33115064 ns 21
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Prim_matrix/4096/manual_time 108087240 ns 107880830 ns 6
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/1/manual_time 290 ns 332 ns 2418198
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/2/manual_time 345 ns 387 ns 2037665
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/4/manual_time 478 ns 521 ns 1466586
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/8/manual_time 1008 ns 1059 ns 694922
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/16/manual_time 2938 ns 3035 ns 237879
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/32/manual_time 11058 ns 11235 ns 63401
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/64/manual_time 50669 ns 51198 ns 13795
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/128/manual_time 261976 ns 263968 ns 2661
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/256/manual_time 1723172 ns 1730515 ns 405
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/512/manual_time 14531122 ns 14680486 ns 48
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/1024/manual_time 97342681 ns 97874134 ns 7
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/2048/manual_time 676527384 ns 687136864 ns 1
multi_normal_cholesky_lpdf_Prim_vector_Prim_vector_Rev_matrix/4096/manual_time 4738165466 ns 4770911148 ns 1
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/1/manual_time 240 ns 281 ns 2919951
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/2/manual_time 263 ns 305 ns 2663178
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/4/manual_time 315 ns 357 ns 2230725
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/8/manual_time 445 ns 486 ns 1570989
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/16/manual_time 790 ns 832 ns 885042
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/32/manual_time 1562 ns 1608 ns 446372
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/64/manual_time 3765 ns 3815 ns 185350
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/128/manual_time 11126 ns 11192 ns 63481
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/256/manual_time 39261 ns 39367 ns 17834
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/512/manual_time 206476 ns 206371 ns 3262
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/1024/manual_time 8515388 ns 8498800 ns 82
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/2048/manual_time 29953101 ns 29884394 ns 22
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Prim_matrix/4096/manual_time 121301273 ns 121073368 ns 6
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/1/manual_time 295 ns 350 ns 2361972
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/2/manual_time 344 ns 402 ns 2033708
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/4/manual_time 481 ns 540 ns 1455069
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/8/manual_time 1024 ns 1093 ns 682917
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/16/manual_time 2984 ns 3101 ns 234431
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/32/manual_time 10988 ns 11196 ns 63912
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/64/manual_time 50898 ns 51448 ns 13741
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/128/manual_time 262833 ns 264798 ns 2670
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/256/manual_time 1698088 ns 1705422 ns 412
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/512/manual_time 14432855 ns 14587499 ns 48
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/1024/manual_time 96299953 ns 96781895 ns 7
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/2048/manual_time 665848206 ns 676422135 ns 1
multi_normal_cholesky_lpdf_Prim_vector_Rev_vector_Rev_matrix/4096/manual_time 4680491581 ns 4713320375 ns 1
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/1/manual_time 243 ns 283 ns 2885907
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/2/manual_time 270 ns 312 ns 2628065
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/4/manual_time 319 ns 361 ns 2170670
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/8/manual_time 451 ns 492 ns 1555933
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/16/manual_time 799 ns 842 ns 880038
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/32/manual_time 1580 ns 1625 ns 442967
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/64/manual_time 3791 ns 3841 ns 184843
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/128/manual_time 11003 ns 11066 ns 61635
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/256/manual_time 42178 ns 42278 ns 18001
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/512/manual_time 205619 ns 205538 ns 3419
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/1024/manual_time 8496523 ns 8479111 ns 82
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/2048/manual_time 29454024 ns 29381839 ns 24
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Prim_matrix/4096/manual_time 120069235 ns 119819488 ns 6
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/1/manual_time 296 ns 351 ns 2363426
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/2/manual_time 348 ns 406 ns 2009612
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/4/manual_time 480 ns 539 ns 1435267
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/8/manual_time 1034 ns 1103 ns 685005
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/16/manual_time 2992 ns 3111 ns 234824
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/32/manual_time 11092 ns 11292 ns 63952
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/64/manual_time 50996 ns 51535 ns 13596
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/128/manual_time 263428 ns 265377 ns 2656
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/256/manual_time 1698164 ns 1705240 ns 412
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/512/manual_time 14436851 ns 14589265 ns 48
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/1024/manual_time 101460165 ns 101888032 ns 7
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/2048/manual_time 699223812 ns 707639401 ns 1
multi_normal_cholesky_lpdf_Rev_vector_Prim_vector_Rev_matrix/4096/manual_time 4700218268 ns 4733751090 ns 1
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/1/manual_time 252 ns 310 ns 2770832
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/2/manual_time 265 ns 325 ns 2621660
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/4/manual_time 323 ns 382 ns 2167699
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/8/manual_time 461 ns 521 ns 1513690
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/16/manual_time 825 ns 889 ns 849466
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/32/manual_time 1676 ns 1742 ns 418616
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/64/manual_time 4011 ns 4086 ns 173981
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/128/manual_time 11615 ns 11721 ns 60323
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/256/manual_time 40184 ns 40357 ns 17456
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/512/manual_time 213260 ns 213335 ns 3290
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/1024/manual_time 7765554 ns 7749288 ns 90
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/2048/manual_time 28426950 ns 28365665 ns 25
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Prim_matrix/4096/manual_time 116560082 ns 116349331 ns 6
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/1/manual_time 308 ns 381 ns 2275819
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/2/manual_time 367 ns 443 ns 1908799
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/4/manual_time 515 ns 594 ns 1353336
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/8/manual_time 1063 ns 1153 ns 655936
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/16/manual_time 3016 ns 3154 ns 231751
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/32/manual_time 10965 ns 11182 ns 60987
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/64/manual_time 51039 ns 51606 ns 13750
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/128/manual_time 261396 ns 263380 ns 2679
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/256/manual_time 1718811 ns 1725630 ns 408
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/512/manual_time 14530940 ns 14695641 ns 48
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/1024/manual_time 97229225 ns 97725052 ns 7
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/2048/manual_time 736861915 ns 746495961 ns 1
multi_normal_cholesky_lpdf_Rev_vector_Rev_vector_Rev_matrix/4096/manual_time 4683455091 ns 4718489565 ns 1
Initial thoughts:
* The output warns that the library was built as DEBUG.
* The benchmark names include Prim and Rev, but I don't know what that means. Is this documented somewhere?
@chriselrod I guess you've compared with Stan before, any suggestions?
* I've heard of CPU scaling, but don't know any details. I guess I should turn this off?
I assume this means boost. I made a few comments here about it, with a tested example of turning it off on a laptop using the Intel pstate driver, and untested code that should work with the cpufreq driver. I also linked to the Linux documentation for more details / instructions.
Note that you'll probably want to turn it back on after benchmarking. Particularly on laptops, the boost speed is substantially higher than the base clock. E.g., my i7 1165G7 tends to run at well over 4 GHz (up to 4.7). Disabling scaling makes it run at the base speed of 2.8 GHz instead. This makes benchmarks more reliable, but also obviously much slower.
Desktops tend to have smaller differences.
If you have a desktop with good cooling, you may also want to just "overclock" in the BIOS: you can set a speed at the high end of the boost range as a constant clock speed.
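Before and after benchmarking it can help to confirm whether boost is actually on. A minimal sketch (C++17) that reads the two sysfs switches described in the Linux documentation: /sys/devices/system/cpu/intel_pstate/no_turbo for the Intel pstate driver (1 means boost is disabled) and /sys/devices/system/cpu/cpufreq/boost for the cpufreq driver (0 means boost is disabled). The helper names read_flag and boost_enabled are made up for this example, and which file exists depends on the driver in use:

```cpp
#include <fstream>
#include <optional>
#include <string>

// Reads the first integer from a file; empty if the file is missing or unreadable.
std::optional<int> read_flag(const std::string& path) {
  std::ifstream in(path);
  int value = 0;
  if (in >> value) return value;
  return std::nullopt;
}

// Returns true if boost is enabled, false if disabled, and empty if neither
// switch could be read (for example, a different frequency-scaling driver).
std::optional<bool> boost_enabled(
    const std::string& no_turbo_path = "/sys/devices/system/cpu/intel_pstate/no_turbo",
    const std::string& boost_path = "/sys/devices/system/cpu/cpufreq/boost") {
  if (auto no_turbo = read_flag(no_turbo_path)) return *no_turbo == 0;  // inverted flag
  if (auto boost = read_flag(boost_path)) return *boost != 0;
  return std::nullopt;
}
```

Calling boost_enabled() with no arguments checks the default paths. Changing the setting means writing to the same files as root, which this sketch deliberately does not do.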
@chriselrod I guess you've compared with Stan before, any suggestions?
I think I just compiled the math library and ccalled functions of interest from Julia.
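A rough sketch of what that approach can look like on the C++ side: a function with C linkage that Julia's ccall can look up in a shared library. To keep the example self-contained it computes the multivariate normal Cholesky log density by hand instead of calling Stan Math, so it is a stand-in for the real function. The name mvn_cholesky_lpdf, the column-major layout, and the argument order are all assumptions for illustration, not what was actually used:

```cpp
#include <cmath>
#include <vector>

// Hypothetical C-linkage entry point. y and mu have length n; L is the n-by-n
// lower-triangular Cholesky factor of the covariance, stored column-major
// (Julia's native layout), so element (i, j) is L[i + j * n].
extern "C" double mvn_cholesky_lpdf(const double* y, const double* mu,
                                    const double* L, int n) {
  const double log_two_pi = std::log(2.0 * std::acos(-1.0));
  std::vector<double> z(n);  // solution of L z = y - mu
  double log_det_L = 0.0;    // sum of log(L_ii) = 0.5 * log det(Sigma)
  double quad = 0.0;         // squared norm of z
  for (int i = 0; i < n; ++i) {
    // Forward substitution, one row at a time.
    double r = y[i] - mu[i];
    for (int j = 0; j < i; ++j) r -= L[i + j * n] * z[j];
    z[i] = r / L[i + i * n];
    quad += z[i] * z[i];
    log_det_L += std::log(L[i + i * n]);
  }
  return -0.5 * n * log_two_pi - log_det_L - 0.5 * quad;
}
```

Built as a shared library (for example with -shared -fPIC), a function like this can be called from Julia with ccall using Cdouble as the return type and (Ptr{Cdouble}, Ptr{Cdouble}, Ptr{Cdouble}, Cint) as the argument types. A wrapper around the actual Stan Math function would have the same shape but would map the pointers to Eigen types first.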
I'm not sure this will give exactly what you want, as it's meant more for benchmarking the speed of forward/reverse mode vs. size.
For the mvn benchmarks, it's known that multivariate_normal and multivariate_normal_prec are not fully optimized (see https://github.com/stan-dev/math/issues/2544). The benchmark to compare against at this time is multivariate_normal_cholesky.
Steps to bench:
The benchmark_results.csv file contains the benchmarks.