jamiebramwell closed this 3 months ago.
Pinging @tupek2 on this for project awareness.
I ran both the benchmark_thermal and the benchmark_functional tests on this branch and compared them against today's develop. The benchmark_thermal showed basically no change.
develop:
Path Min time/rank Max time/rank Avg time/rank Time %
2D Linear Static 0.775459 0.775459 0.775459 0.722444
2D Quadratic Static 0.686454 0.686454 0.686454 0.639524
3D Linear Static 3.003359 3.003359 3.003359 2.798031
3D Quadratic Static 19.934779 19.934779 19.934779 18.571914
2D Linear Dynamic 0.943019 0.943019 0.943019 0.878548
2D Quadratic Dynamic 2.617707 2.617707 2.617707 2.438744
3D Linear Dynamic 11.462568 11.462568 11.462568 10.678916
3D Quadratic Dynamic 67.914439 67.914439 67.914439 63.271389
feature/bowen/structs:
Path Min time/rank Max time/rank Avg time/rank Time %
2D Linear Static 0.563011 0.563011 0.563011 0.527720
2D Quadratic Static 0.679593 0.679593 0.679593 0.636995
3D Linear Static 3.039139 3.039139 3.039139 2.848638
3D Quadratic Static 19.739250 19.739250 19.739250 18.501944
2D Linear Dynamic 0.924319 0.924319 0.924319 0.866381
2D Quadratic Dynamic 2.615020 2.615020 2.615020 2.451103
3D Linear Dynamic 11.437081 11.437081 11.437081 10.720176
3D Quadratic Dynamic 67.689598 67.689598 67.689598 63.446643
Same with the benchmark_functional:
develop:
Path Min time/rank Max time/rank Avg time/rank Time %
scalar H1 0.000019 0.000019 0.000019 0.000003
  dimension 2, order 1 0.225870 0.225870 0.225870 0.039097
    residual evaluation 0.011999 0.011999 0.011999 0.002077
    compute gradient 0.022880 0.022880 0.022880 0.003960
    apply gradient 0.007694 0.007694 0.007694 0.001332
    assemble gradient 0.030300 0.030300 0.030300 0.005245
  dimension 2, order 2 0.265407 0.265407 0.265407 0.045940
    residual evaluation 0.028989 0.028989 0.028989 0.005018
    compute gradient 0.053153 0.053153 0.053153 0.009200
    apply gradient 0.019146 0.019146 0.019146 0.003314
    assemble gradient 0.177074 0.177074 0.177074 0.030650
  dimension 3, order 1 1.284311 1.284311 1.284311 0.222306
    residual evaluation 0.316009 0.316009 0.316009 0.054699
    compute gradient 0.630511 0.630511 0.630511 0.109137
    apply gradient 0.155231 0.155231 0.155231 0.026869
    assemble gradient 1.127949 1.127949 1.127949 0.195241
  dimension 3, order 2 21.559130 21.559130 21.559130 3.731745
    residual evaluation 1.095652 1.095652 1.095652 0.189650
    compute gradient 2.193447 2.193447 2.193447 0.379671
    apply gradient 0.582843 0.582843 0.582843 0.100886
    assemble gradient 15.762068 15.762068 15.762068 2.728311
vector H1 0.000021 0.000021 0.000021 0.000004
  dimension 2, order 1 0.142469 0.142469 0.142469 0.024661
    residual evaluation 0.021928 0.021928 0.021928 0.003796
    compute gradient 0.063093 0.063093 0.063093 0.010921
    apply gradient 0.018802 0.018802 0.018802 0.003255
    assemble gradient 0.115322 0.115322 0.115322 0.019961
  dimension 2, order 2 1.265889 1.265889 1.265889 0.219117
    residual evaluation 0.052449 0.052449 0.052449 0.009079
    compute gradient 0.146337 0.146337 0.146337 0.025330
    apply gradient 0.045144 0.045144 0.045144 0.007814
    assemble gradient 0.723344 0.723344 0.723344 0.125206
  dimension 3, order 1 10.412975 10.412975 10.412975 1.802418
    residual evaluation 0.717534 0.717534 0.717534 0.124200
    compute gradient 2.892620 2.892620 2.892620 0.500694
    apply gradient 0.703525 0.703525 0.703525 0.121776
    assemble gradient 10.803800 10.803800 10.803800 1.870067
  dimension 3, order 2 288.591138 288.591138 288.591138 49.953246
    residual evaluation 2.526766 2.526766 2.526766 0.437367
    compute gradient 10.805137 10.805137 10.805137 1.870299
    apply gradient 2.470287 2.470287 2.470287 0.427591
    assemble gradient 199.653976 199.653976 199.653976 34.558803
feature/bowen/structs:
Path Min time/rank Max time/rank Avg time/rank Time %
scalar H1 0.000019 0.000019 0.000019 0.000003
  dimension 2, order 1 0.185780 0.185780 0.185780 0.031969
    residual evaluation 0.012261 0.012261 0.012261 0.002110
    compute gradient 0.022789 0.022789 0.022789 0.003922
    apply gradient 0.007958 0.007958 0.007958 0.001369
    assemble gradient 0.030593 0.030593 0.030593 0.005264
  dimension 2, order 2 0.264404 0.264404 0.264404 0.045498
    residual evaluation 0.028888 0.028888 0.028888 0.004971
    compute gradient 0.052706 0.052706 0.052706 0.009070
    apply gradient 0.019262 0.019262 0.019262 0.003315
    assemble gradient 0.175531 0.175531 0.175531 0.030205
  dimension 3, order 1 1.296115 1.296115 1.296115 0.223034
    residual evaluation 0.317541 0.317541 0.317541 0.054642
    compute gradient 0.642507 0.642507 0.642507 0.110562
    apply gradient 0.158886 0.158886 0.158886 0.027341
    assemble gradient 1.140091 1.140091 1.140091 0.196186
  dimension 3, order 2 21.562900 21.562900 21.562900 3.710520
    residual evaluation 1.107753 1.107753 1.107753 0.190621
    compute gradient 2.238823 2.238823 2.238823 0.385254
    apply gradient 0.592176 0.592176 0.592176 0.101901
    assemble gradient 16.256250 16.256250 16.256250 2.797358
vector H1 0.000019 0.000019 0.000019 0.000003
  dimension 2, order 1 0.140954 0.140954 0.140954 0.024255
    residual evaluation 0.021928 0.021928 0.021928 0.003773
    compute gradient 0.063095 0.063095 0.063095 0.010857
    apply gradient 0.018991 0.018991 0.018991 0.003268
    assemble gradient 0.117446 0.117446 0.117446 0.020210
  dimension 2, order 2 1.232524 1.232524 1.232524 0.212091
    residual evaluation 0.051013 0.051013 0.051013 0.008778
    compute gradient 0.145052 0.145052 0.145052 0.024960
    apply gradient 0.045237 0.045237 0.045237 0.007784
    assemble gradient 0.721032 0.721032 0.721032 0.124074
  dimension 3, order 1 10.397752 10.397752 10.397752 1.789234
    residual evaluation 0.726283 0.726283 0.726283 0.124978
    compute gradient 2.995355 2.995355 2.995355 0.515437
    apply gradient 0.684573 0.684573 0.684573 0.117801
    assemble gradient 10.759378 10.759378 10.759378 1.851462
  dimension 3, order 2 288.829889 288.829889 288.829889 49.701534
    residual evaluation 2.595823 2.595823 2.595823 0.446686
    compute gradient 11.061210 11.061210 11.061210 1.903401
    apply gradient 2.455172 2.455172 2.455172 0.422483
    assemble gradient 201.952514 201.952514 201.952514 34.751770
But surprisingly, the benchmark_functional showed a large speed-up compared to develop, specifically in the assemble gradient section.

It appears that this PR reduced the level of refinement in the benchmark_functional performance test, so develop takes longer simply because it is running a bigger problem. We need to be diligent about not changing benchmark problem parameters if we want to be able to track performance metrics over time.
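As a rough illustration of why a refinement change dominates these timings (this is a generic MFEM-style sketch with a hypothetical refinements parameter, not the actual benchmark_functional setup): each uniform refinement of a 3D hex mesh multiplies the element count by 8, so dropping one refinement level makes the assembled problem roughly an order of magnitude smaller.

```cpp
// Hypothetical sketch only: names and mesh sizes are illustrative, not taken
// from the Serac benchmark source.
#include "mfem.hpp"
#include <iostream>

int main()
{
  int refinements = 3;  // hypothetical knob; each level multiplies the 3D element count by 8
  mfem::Mesh mesh = mfem::Mesh::MakeCartesian3D(4, 4, 4, mfem::Element::HEXAHEDRON);
  for (int i = 0; i < refinements; ++i) {
    mesh.UniformRefinement();
  }
  // 4*4*4 = 64 initial elements, times 8 per refinement level
  std::cout << "elements: " << mesh.GetNE() << std::endl;
  return 0;
}
```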
Ah yes, I remember seeing that before, and it also makes much more sense now why this showed such a large change. I'll put that back locally and run the benchmark again. Would you prefer that change be undone?
Timing mystery solved! It is back to showing no difference, as expected. Thanks @samuelpmishLLNL! I've updated the comment above with the new timings.
This swaps generic lambdas to templated functors for tests of Functional-derived capabilities. This is because nvcc does not allow generic extended lambdas, and extended lambdas are required by RAJA. This branch is derived from feature/bowen/raja-for-all in an attempt to make the changes easier for reviewers.
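For context, here is a minimal sketch of the pattern being swapped (the functor name and kernel body below are illustrative, not code from this PR): nvcc rejects extended (`__host__ __device__`) lambdas that are also generic (i.e., have `auto` parameters), so each generic lambda destined for device execution is rewritten as a functor whose call operator is a template.

```cpp
// Illustrative only; compiled with nvcc when device execution is enabled.

// Rejected by nvcc: an extended lambda (host/device annotated) may not also
// be generic (take auto parameters).
//
//   auto qf = [] __host__ __device__ (auto x, auto du_dx) { return x * du_dx; };

// Accepted: a plain functor with a templated call operator. It is usable in
// the same places the generic lambda was, but compiles for device code.
struct QFunctionFunctor {
  template <typename X, typename Grad>
  __host__ __device__ auto operator()(X x, Grad du_dx) const
  {
    return x * du_dx;
  }
};
```

The functor object can then be passed by value into device kernels exactly where the generic lambda would have been, so the test bodies themselves stay essentially unchanged.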