Closed by jedbrown 2 years ago
I'm confused: in lines 1088-1092 of ceed-cuda-operator.c, in the function CeedOperatorAssembleDiagonalCore_Cuda, it checks whether the diagonal operator has already been built, and only calls the setup function (where the compilation occurs) if it hasn't:
// Setup
if (!impl->diag) {
  ierr = CeedOperatorAssembleDiagonalSetup_Cuda(op, pointBlock);
  CeedChkBackend(ierr);
}
The first issue is a few lines above that guard, namely the call to CeedOperatorLinearAssembleQFunction:
static inline int CeedOperatorAssembleDiagonalCore_Cuda(CeedOperator op,
    CeedVector assembled, CeedRequest *request, const bool pointBlock) {
  int ierr;
  Ceed ceed;
  ierr = CeedOperatorGetCeed(op, &ceed); CeedChkBackend(ierr);
  CeedOperator_Cuda *impl;
  ierr = CeedOperatorGetData(op, &impl); CeedChkBackend(ierr);

  // Assemble QFunction
  CeedVector assembledqf;
  CeedElemRestriction rstr;
  ierr = CeedOperatorLinearAssembleQFunction(op, &assembledqf, &rstr, request);
  CeedChkBackend(ierr);
  ierr = CeedElemRestrictionDestroy(&rstr); CeedChkBackend(ierr);
  CeedScalar maxnorm = 0;
  ierr = CeedVectorNorm(assembledqf, CEED_NORM_MAX, &maxnorm);
  CeedChkBackend(ierr);

  // Setup
  if (!impl->diag) {
    ierr = CeedOperatorAssembleDiagonalSetup_Cuda(op, pointBlock);
    CeedChkBackend(ierr);
  }
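To make the point above concrete, here is a minimal standalone sketch (plain C, no libCEED calls) of why guarding only the setup step is not enough when an earlier step in the same function does its own work on every call. Everything in it is a hypothetical stand-in: `OperatorState`, `compile_kernel`, and both `assemble_diagonal_*` functions are invented for illustration, and `compile_kernel` only increments a counter. The second variant is just one possible shape of a fix, not a description of what libCEED does.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical stand-in for a JIT compile call: it only counts invocations.
static int compile_count = 0;
static void compile_kernel(void) { compile_count++; }

// Hypothetical per-operator state; diag_ready plays the role of impl->diag.
typedef struct {
  bool qf_ready;    // has the first (QFunction assembly) step been set up?
  bool diag_ready;  // has the diagonal assembly setup been done?
} OperatorState;

// Same structure as the excerpt: unguarded first step, guarded setup.
static void assemble_diagonal_unguarded(OperatorState *st) {
  compile_kernel();  // runs on every call
  if (!st->diag_ready) {
    compile_kernel();  // runs only on the first call
    st->diag_ready = true;
  }
}

// Variant in which both steps are set up once and then reused.
static void assemble_diagonal_guarded(OperatorState *st) {
  if (!st->qf_ready) {
    compile_kernel();
    st->qf_ready = true;
  }
  if (!st->diag_ready) {
    compile_kernel();
    st->diag_ready = true;
  }
}

// Number of compilations triggered by n_calls on a fresh operator.
static int count_compiles(void (*assemble)(OperatorState *), int n_calls) {
  OperatorState st = {false, false};
  compile_count = 0;
  for (int i = 0; i < n_calls; i++) assemble(&st);
  return compile_count;
}

// Call this from main() to check both variants.
int run_guard_example(void) {
  assert(count_compiles(assemble_diagonal_unguarded, 5) == 6);  // 5 + 1
  assert(count_compiles(assemble_diagonal_guarded, 5) == 2);    // once each
  return 0;
}
```

In the unguarded variant the compile count grows with the number of calls, which is the kind of behavior that would show up on every Newton step; in the guarded variant it stays at two no matter how often the function is called.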
Oh, I see, that explains the element restriction stuff showing up in the compilation call as well.
With #811, is there more we want to do here?
Are there currently any kernel rebuilds within the fluids time loop or solids load increments (on CUDA or ROCm)? We can close this once we've checked that they're gone.
In solids/Ratel we're good. I can check for fluids.
Good on fluids:
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
0 TS dt 1e-07 time 0.
1 TS dt 1e-06 time 1e-07
2 TS dt 1e-05 time 1.1e-06
3 TS dt 3.61077e-05 time 1.11e-05
4 TS dt 3.62582e-05 time 4.72077e-05
5 TS dt 3.68712e-05 time 8.34659e-05
6 TS dt 3.74745e-05 time 0.000120337
7 TS dt 3.80842e-05 time 0.000157812
8 TS dt 3.87062e-05 time 0.000195896
9 TS dt 3.93464e-05 time 0.000234602
10 TS dt 4.00104e-05 time 0.000273948
Good on Ratel:
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
0 TS dt 0.1 time 0.
0 SNES Function norm 1.319444444444e-02
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
compiling
1 SNES Function norm 3.053486454374e-02
2 SNES Function norm 1.043668772223e-03
3 SNES Function norm 1.717678106611e-05
4 SNES Function norm 1.614817227388e-10
5 SNES Function norm 3.769096552398e-16
1 TS dt 0.1 time 0.1
0 SNES Function norm 1.319444444444e-02
1 SNES Function norm 3.036654411669e-02
2 SNES Function norm 1.057633854960e-03
3 SNES Function norm 1.893185192170e-05
4 SNES Function norm 1.648310078930e-10
5 SNES Function norm 5.775915382844e-16
2 TS dt 0.1 time 0.2
0 SNES Function norm 1.319444444444e-02
1 SNES Function norm 3.000629493173e-02
2 SNES Function norm 1.056396162693e-03
3 SNES Function norm 1.929385269493e-05
4 SNES Function norm 1.672885067132e-10
5 SNES Function norm 9.717309069193e-16
3 TS dt 0.1 time 0.3
0 SNES Function norm 1.319444444444e-02
1 SNES Function norm 2.919679969063e-02
2 SNES Function norm 1.006984843377e-03
3 SNES Function norm 1.689161370771e-05
4 SNES Function norm 1.351257433559e-10
5 SNES Function norm 1.238094526520e-15
4 TS dt 0.1 time 0.4
0 SNES Function norm 1.319444444444e-02
1 SNES Function norm 2.775711886243e-02
2 SNES Function norm 8.863569803524e-04
3 SNES Function norm 1.213544060421e-05
4 SNES Function norm 7.372986610904e-11
5 TS dt 0.1 time 0.5
0 SNES Function norm 1.319444446375e-02
1 SNES Function norm 2.565751791823e-02
2 SNES Function norm 6.993092433606e-04
3 SNES Function norm 7.022824735838e-06
4 SNES Function norm 2.551068010392e-11
6 TS dt 0.1 time 0.6
0 SNES Function norm 1.319444445258e-02
1 SNES Function norm 2.301164630316e-02
2 SNES Function norm 4.827125094351e-04
3 SNES Function norm 3.313828245655e-06
4 SNES Function norm 5.467409174031e-12
7 TS dt 0.1 time 0.7
0 SNES Function norm 1.319444444672e-02
1 SNES Function norm 2.004140370955e-02
2 SNES Function norm 2.882154092733e-04
3 SNES Function norm 1.311231278011e-06
4 SNES Function norm 7.112126786343e-13
8 TS dt 0.1 time 0.8
0 SNES Function norm 1.319444444482e-02
1 SNES Function norm 1.701238222008e-02
2 SNES Function norm 1.493504680447e-04
3 SNES Function norm 4.393354545156e-07
4 SNES Function norm 5.644082863586e-14
9 TS dt 0.1 time 0.9
0 SNES Function norm 1.319444444448e-02
1 SNES Function norm 1.414861611133e-02
2 SNES Function norm 6.809871803714e-05
3 SNES Function norm 1.204273119829e-07
4 SNES Function norm 3.239667923424e-15
10 TS dt 0.1 time 1.
0 SNES Function norm 1.319444444445e-02
1 SNES Function norm 1.158419360419e-02
2 SNES Function norm 2.870337960955e-05
3 SNES Function norm 2.439533010569e-08
4 SNES Function norm 3.009243269173e-15
11 TS dt 0.1 time 1.1
Great, thanks!
Notably, grid transfer operators and diagonal assembly. Running (for example) the solid mechanics example with a breakpoint in CeedCompileCuda and collecting stack traces is informative. For example, this occurs on every level and every Newton step.
We need to audit this and persist kernels as appropriate so that we only need to compile specialized kernels once, provided the shapes don't change.
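As a rough illustration of "compile once provided the shapes don't change", the sketch below keeps a compiled kernel next to the shape it was specialized for and recompiles only when a different shape is requested. It is plain C with no libCEED or CUDA calls. `KernelShape`, `CachedKernel`, `compile_for_shape`, and `get_kernel` are hypothetical names, the choice of shape fields is an assumption, and the "handle" is just an integer standing in for a compiled module.

```c
#include <assert.h>

// Hypothetical description of what a kernel is specialized on.
typedef struct {
  int num_nodes;  // nodes per element
  int num_qpts;   // quadrature points per element
  int num_comp;   // number of field components
} KernelShape;

// Hypothetical cache slot: the shape it was built for plus a fake handle.
typedef struct {
  KernelShape shape;
  int handle;  // stand-in for a compiled module or kernel handle
  int valid;   // 0 until the first compilation
} CachedKernel;

static int num_compiles = 0;

// Stand-in for the expensive JIT step; only counts and fabricates a handle.
static int compile_for_shape(KernelShape s) {
  num_compiles++;
  return s.num_nodes * 10000 + s.num_qpts * 100 + s.num_comp;
}

static int shapes_equal(KernelShape a, KernelShape b) {
  return a.num_nodes == b.num_nodes && a.num_qpts == b.num_qpts &&
         a.num_comp == b.num_comp;
}

// Return the cached handle, recompiling only if the shape has changed.
static int get_kernel(CachedKernel *cache, KernelShape shape) {
  if (!cache->valid || !shapes_equal(cache->shape, shape)) {
    cache->handle = compile_for_shape(shape);
    cache->shape = shape;
    cache->valid = 1;
  }
  return cache->handle;
}

// Call this from main(); returns the total number of compilations.
int run_cache_example(void) {
  CachedKernel cache = {{0, 0, 0}, 0, 0};
  KernelShape fine = {27, 27, 3};
  KernelShape coarse = {8, 27, 3};

  num_compiles = 0;
  // Ten "Newton steps" with an unchanged shape compile exactly once.
  for (int step = 0; step < 10; step++) get_kernel(&cache, fine);
  assert(num_compiles == 1);

  // A different shape triggers one more compilation, then is reused.
  get_kernel(&cache, coarse);
  get_kernel(&cache, coarse);
  return num_compiles;
}
```

A single slot like this would thrash if the same object alternated between shapes; keeping one slot per operator or per level, or a small table keyed on the shape, is the obvious variation if that turns out to matter.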