LLNL / serac

Serac is a high order nonlinear thermomechanical simulation code

Change from lambdas to structs for `Functional` #1129

Closed: jamiebramwell closed this 3 months ago

jamiebramwell commented 4 months ago

This PR swaps generic lambdas for templated functors in the tests of Functional-derived capabilities. The change is needed because nvcc does not allow generic extended lambdas, and RAJA requires extended lambdas.
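For illustration, here is a minimal sketch (not actual Serac code) of the pattern: under nvcc, a lambda marked for device execution (an "extended" lambda, which RAJA needs to launch kernels) may not take `auto` parameters, so the generic callable is written as a struct with a templated `operator()` instead.

```cpp
#include <iostream>

// Sketch only, not Serac's actual code. A generic lambda like
//   auto scale = [](auto x, auto s) { return x * s; };
// cannot be an extended (__host__ __device__) lambda under nvcc, because
// extended lambdas may not be generic. The equivalent templated functor
// below expresses the same polymorphic callable in a form nvcc accepts;
// in device code its operator() would additionally carry a host/device
// annotation (e.g. a SERAC_HOST_DEVICE-style macro).
struct Scale {
  template <typename X, typename S>
  auto operator()(X x, S s) const
  {
    return x * s;
  }
};

int main()
{
  Scale scale;
  std::cout << scale(2.0, 3) << "\n";  // mixed argument types still work: prints 6
  return 0;
}
```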

This branch is derived from feature/bowen/raja-for-all in an attempt to make the changes easier for reviewers.

btalamini commented 4 months ago

Pinging @tupek2 on this for project awareness.

white238 commented 3 months ago

I ran both the benchmark_thermal and the benchmark_functional on this branch and compared them to today's develop. The benchmark_thermal showed basically no change.

develop:

Path                 Min time/rank Max time/rank Avg time/rank Time %    
2D Linear Static          0.775459      0.775459      0.775459  0.722444 
2D Quadratic Static       0.686454      0.686454      0.686454  0.639524 
3D Linear Static          3.003359      3.003359      3.003359  2.798031 
3D Quadratic Static      19.934779     19.934779     19.934779 18.571914 
2D Linear Dynamic         0.943019      0.943019      0.943019  0.878548 
2D Quadratic Dynamic      2.617707      2.617707      2.617707  2.438744 
3D Linear Dynamic        11.462568     11.462568     11.462568 10.678916 
3D Quadratic Dynamic     67.914439     67.914439     67.914439 63.271389 

feature/bowen/structs:

Path                 Min time/rank Max time/rank Avg time/rank Time %    
2D Linear Static          0.563011      0.563011      0.563011  0.527720 
2D Quadratic Static       0.679593      0.679593      0.679593  0.636995 
3D Linear Static          3.039139      3.039139      3.039139  2.848638 
3D Quadratic Static      19.739250     19.739250     19.739250 18.501944 
2D Linear Dynamic         0.924319      0.924319      0.924319  0.866381 
2D Quadratic Dynamic      2.615020      2.615020      2.615020  2.451103 
3D Linear Dynamic        11.437081     11.437081     11.437081 10.720176 
3D Quadratic Dynamic     67.689598     67.689598     67.689598 63.446643 

Same with the benchmark_functional:

develop:

Path                    Min time/rank Max time/rank Avg time/rank Time %    
scalar H1                    0.000019      0.000019      0.000019  0.000003 
  dimension 2, order 1       0.225870      0.225870      0.225870  0.039097 
    residual evaluation      0.011999      0.011999      0.011999  0.002077 
    compute gradient         0.022880      0.022880      0.022880  0.003960 
    apply gradient           0.007694      0.007694      0.007694  0.001332 
    assemble gradient        0.030300      0.030300      0.030300  0.005245 
  dimension 2, order 2       0.265407      0.265407      0.265407  0.045940 
    residual evaluation      0.028989      0.028989      0.028989  0.005018 
    compute gradient         0.053153      0.053153      0.053153  0.009200 
    apply gradient           0.019146      0.019146      0.019146  0.003314 
    assemble gradient        0.177074      0.177074      0.177074  0.030650 
  dimension 3, order 1       1.284311      1.284311      1.284311  0.222306 
    residual evaluation      0.316009      0.316009      0.316009  0.054699 
    compute gradient         0.630511      0.630511      0.630511  0.109137 
    apply gradient           0.155231      0.155231      0.155231  0.026869 
    assemble gradient        1.127949      1.127949      1.127949  0.195241 
  dimension 3, order 2      21.559130     21.559130     21.559130  3.731745 
    residual evaluation      1.095652      1.095652      1.095652  0.189650 
    compute gradient         2.193447      2.193447      2.193447  0.379671 
    apply gradient           0.582843      0.582843      0.582843  0.100886 
    assemble gradient       15.762068     15.762068     15.762068  2.728311 
vector H1                    0.000021      0.000021      0.000021  0.000004 
  dimension 2, order 1       0.142469      0.142469      0.142469  0.024661 
    residual evaluation      0.021928      0.021928      0.021928  0.003796 
    compute gradient         0.063093      0.063093      0.063093  0.010921 
    apply gradient           0.018802      0.018802      0.018802  0.003255 
    assemble gradient        0.115322      0.115322      0.115322  0.019961 
  dimension 2, order 2       1.265889      1.265889      1.265889  0.219117 
    residual evaluation      0.052449      0.052449      0.052449  0.009079 
    compute gradient         0.146337      0.146337      0.146337  0.025330 
    apply gradient           0.045144      0.045144      0.045144  0.007814 
    assemble gradient        0.723344      0.723344      0.723344  0.125206 
  dimension 3, order 1      10.412975     10.412975     10.412975  1.802418 
    residual evaluation      0.717534      0.717534      0.717534  0.124200 
    compute gradient         2.892620      2.892620      2.892620  0.500694 
    apply gradient           0.703525      0.703525      0.703525  0.121776 
    assemble gradient       10.803800     10.803800     10.803800  1.870067 
  dimension 3, order 2     288.591138    288.591138    288.591138 49.953246 
    residual evaluation      2.526766      2.526766      2.526766  0.437367 
    compute gradient        10.805137     10.805137     10.805137  1.870299 
    apply gradient           2.470287      2.470287      2.470287  0.427591 
    assemble gradient      199.653976    199.653976    199.653976 34.558803 

feature/bowen/structs:

Path                    Min time/rank Max time/rank Avg time/rank Time %    
scalar H1                    0.000019      0.000019      0.000019  0.000003 
  dimension 2, order 1       0.185780      0.185780      0.185780  0.031969 
    residual evaluation      0.012261      0.012261      0.012261  0.002110 
    compute gradient         0.022789      0.022789      0.022789  0.003922 
    apply gradient           0.007958      0.007958      0.007958  0.001369 
    assemble gradient        0.030593      0.030593      0.030593  0.005264 
  dimension 2, order 2       0.264404      0.264404      0.264404  0.045498 
    residual evaluation      0.028888      0.028888      0.028888  0.004971 
    compute gradient         0.052706      0.052706      0.052706  0.009070 
    apply gradient           0.019262      0.019262      0.019262  0.003315 
    assemble gradient        0.175531      0.175531      0.175531  0.030205 
  dimension 3, order 1       1.296115      1.296115      1.296115  0.223034 
    residual evaluation      0.317541      0.317541      0.317541  0.054642 
    compute gradient         0.642507      0.642507      0.642507  0.110562 
    apply gradient           0.158886      0.158886      0.158886  0.027341 
    assemble gradient        1.140091      1.140091      1.140091  0.196186 
  dimension 3, order 2      21.562900     21.562900     21.562900  3.710520 
    residual evaluation      1.107753      1.107753      1.107753  0.190621 
    compute gradient         2.238823      2.238823      2.238823  0.385254 
    apply gradient           0.592176      0.592176      0.592176  0.101901 
    assemble gradient       16.256250     16.256250     16.256250  2.797358 
vector H1                    0.000019      0.000019      0.000019  0.000003 
  dimension 2, order 1       0.140954      0.140954      0.140954  0.024255 
    residual evaluation      0.021928      0.021928      0.021928  0.003773 
    compute gradient         0.063095      0.063095      0.063095  0.010857 
    apply gradient           0.018991      0.018991      0.018991  0.003268 
    assemble gradient        0.117446      0.117446      0.117446  0.020210 
  dimension 2, order 2       1.232524      1.232524      1.232524  0.212091 
    residual evaluation      0.051013      0.051013      0.051013  0.008778 
    compute gradient         0.145052      0.145052      0.145052  0.024960 
    apply gradient           0.045237      0.045237      0.045237  0.007784 
    assemble gradient        0.721032      0.721032      0.721032  0.124074 
  dimension 3, order 1      10.397752     10.397752     10.397752  1.789234 
    residual evaluation      0.726283      0.726283      0.726283  0.124978 
    compute gradient         2.995355      2.995355      2.995355  0.515437 
    apply gradient           0.684573      0.684573      0.684573  0.117801 
    assemble gradient       10.759378     10.759378     10.759378  1.851462 
  dimension 3, order 2     288.829889    288.829889    288.829889 49.701534 
    residual evaluation      2.595823      2.595823      2.595823  0.446686 
    compute gradient        11.061210     11.061210     11.061210  1.903401 
    apply gradient           2.455172      2.455172      2.455172  0.422483 
    assemble gradient      201.952514    201.952514    201.952514 34.751770 
samuelpmish commented 3 months ago

> But surprisingly the benchmark_functional showed a large speed-up compared to develop, specifically in the assemble gradient section.

It appears that this PR reduced the level of refinement in the benchmark_functional performance test, so develop takes longer because it's just running a bigger problem. We need to be diligent about not changing benchmark problem parameters if we want to be able to track performance metrics over time.
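As a hypothetical sketch (not part of Serac or its benchmark tooling) of the kind of cross-run check this implies: take the per-region average times from two runs and flag regions whose ratio moves beyond some tolerance; if the benchmark parameters changed between runs, that ratio reflects problem size rather than code performance. The two regions and their numbers below are copied from the tables above.

```cpp
// Hypothetical comparison helper, assuming per-region "Avg time/rank" values
// in seconds have already been extracted from two benchmark reports.
#include <cmath>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

int main()
{
  // {region, {develop time, branch time}}, copied from the tables above.
  std::map<std::string, std::pair<double, double>> avg = {
      {"assemble gradient (vector H1, dim 3, order 2)", {199.653976, 201.952514}},
      {"residual evaluation (scalar H1, dim 2, order 1)", {0.011999, 0.012261}},
  };

  for (const auto& [region, t] : avg) {
    double ratio = t.second / t.first;  // branch / develop
    // A large deviation means either a real perf change or a changed problem size.
    const char* flag = (std::abs(ratio - 1.0) > 0.10) ? "  <-- investigate" : "";
    std::printf("%-50s %.3fx%s\n", region.c_str(), ratio, flag);
  }
  return 0;
}
```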

white238 commented 3 months ago

> > But surprisingly the benchmark_functional showed a large speed-up compared to develop, specifically in the assemble gradient section.
>
> It appears that this PR reduced the level of refinement in the benchmark_functional performance test, so develop takes longer because it's just running a bigger problem. We need to be diligent about not changing benchmark problem parameters if we want to be able to track performance metrics over time.

Ah yes, I remember seeing that before, and it also makes way more sense now why this showed such a large change.

I'll put that back locally and run the benchmark again. Would you prefer that change be undone?

white238 commented 3 months ago

Timing mystery solved! It's back to showing no difference, as expected. Thanks @samuelpmishLLNL!

I've updated my comment above with the new timings.