@alanw0 Unit tests are passing with your fix, thank you.
I have been looking at the unStableEdge runs more closely. On the CrsGraph refactor branch, NaluWind only takes three nonlinear iterations in the first time step, whereas NaluWind master takes four.
Within the first three nonlinear iterations, the linear solver histories are identical between the two code branches. I've also increased the linear solver verbosity so that I can see global matrix statistics. The CrsGraph refactor matrices have more stored entries, which is expected. MueLu also does the right thing by dropping the zero entries when coarsening.
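(For reference, that dropping is the behavior governed by MueLu's `"aggregation: drop tol"` parameter. A minimal sketch of setting it, assuming the Tpetra adapter; the surrounding boilerplate is invented for illustration and is not the solver configuration used in these runs:

```cpp
// Sketch only: "aggregation: drop tol" is MueLu's dropping threshold for
// coarsening; everything else here is illustrative boilerplate.
#include <MueLu_CreateTpetraPreconditioner.hpp>
#include <Teuchos_ParameterList.hpp>
#include <Teuchos_RCP.hpp>
#include <Tpetra_Operator.hpp>

using SC = double;
using OP = Tpetra::Operator<SC>;

Teuchos::RCP<OP> makePreconditioner(const Teuchos::RCP<OP>& A) {
  Teuchos::ParameterList params;
  // Entries at or below this threshold are ignored when aggregates are
  // formed, so explicitly stored zeros do not pollute the coarsening.
  // 0.0 is also the default.
  params.set("aggregation: drop tol", 0.0);
  return MueLu::CreateTpetraPreconditioner(A, params);
}
```
)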
All of this leads me to believe the linear systems are being assembled correctly and consistently. But obviously there's a difference.
@rcknaus @sayerhs Any suggestions?
ablUnstableEdge-gold-2timestep-verbose.log.txt
ablUnstableEdge-crsgraph-2timestep-verbose.log.txt
@jhux2 It looks like a problem with the `EquationSystem::system_is_converged` check to me. Continuity and enthalpy meet the convergence test after 3 iterations, but momentum doesn't, so it should take 4 iterations for the first step. I looked at the code briefly but didn't have any ideas as to why this would have changed. I'll take another look, but it seems related to that check.
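For context, the check in question gates the outer nonlinear loop on all equation systems at once. A minimal sketch of that logic, assuming a per-system residual/tolerance pair (hypothetical structure, not the actual Nalu-Wind implementation):

```cpp
// Hypothetical sketch of an "all systems converged" check; names and
// structure are illustrative, not the actual Nalu-Wind code.
#include <string>
#include <vector>

struct EqSysStatus {
  std::string name;          // e.g. "continuity", "enthalpy", "momentum"
  double nonlinearResidual;  // current scaled nonlinear residual
  double convergenceTol;     // per-system tolerance from the input file
};

// The outer nonlinear loop may stop only when *every* system meets its
// tolerance; one lagging system (momentum here) forces iteration 4.
bool system_is_converged(const std::vector<EqSysStatus>& systems) {
  for (const auto& sys : systems) {
    if (sys.nonlinearResidual > sys.convergenceTol) return false;
  }
  return true;
}
```

If a tolerance fails to propagate from the input file (as turned out to be the case below), a check like this can pass one iteration early.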
@rcknaus Thanks for the hint. The nonlinear convergence tolerance wasn't being communicated properly. With that and a couple of other fixes, the set of failing tests has been narrowed to these:
6: FAIL: airfoilRANSEdgeTrilinos................. 4.4961s 1.4686e-01 2.2813e-07
22: FAIL: ductWedge............................... 0.6109s 5.2951e-11 4.2101e-07
23: FAIL: edgeHybridFluids........................ 34.2733s 3.1472e-14 3.5561e-11
24: FAIL: edgePipeCHT............................. 4.9603s 1.8287e-12 2.3133e-12
25: FAIL: ekmanSpiral............................. 11.2711s 9.8148e-07 8.9743e-05
43: FAIL: karmanVortex............................ 0.4997s 2.5751e-08 9.4357e-07
50: FAIL: nonIsoEdgeOpenJet....................... 12.9589s 5.6479e-11 1.5776e-07
53: FAIL: nonIsoNonUniformEdgeOpenJet............. 34.3867s 4.0505e-08 2.2234e-06
Those diffs look typical of order-of-operations changes to me.
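For reference, floating-point addition is not associative, so assembling the same contributions in a different order perturbs the low-order bits; diffs of the size above are consistent with that. A standalone illustration:

```cpp
// Demonstration that floating-point addition is not associative: the
// same three terms summed in a different order differ in the last bits,
// just like reordered assembly contributions.
#include <cstdio>

int main() {
  const double a = 0.1, b = 0.2, c = 0.3;
  std::printf("(a + b) + c = %.17g\n", (a + b) + c);  // 0.60000000000000009
  std::printf("a + (b + c) = %.17g\n", a + (b + c));  // 0.59999999999999998
  return 0;
}
```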
@rcknaus Thanks. I will wait to hear from @sayerhs before I merge.
I noticed a performance regression related to this with some of the "edge-based" tests, running on my laptop. It looks like it's present on the dashboard too:
https://my.cdash.org/testDetails.php?test=56232170&build=1802978 [display graphs -> test time]
It looks like a ~50% slowdown for some of the simpler edge-based tests.
@rcknaus Yes, I see it in my local builds, as well. It's not algorithmic -- before/after the PR merge, all linear iterations are identical for ablNeutralEdge. The performance difference is evident even in the matrix-vector product:
Before CrsGraph PR:
Timer Name               MinOverProcs     MeanOverProcs    MaxOverProcs
Belos: Operation Op*x    2.7279 (1243)    2.8273 (1243)    2.9788 (1243)
After CrsGraph PR:
Belos: Operation Op*x    7.6261 (1243)    8.1160 (1243)    8.3396 (1243)
I also see a difference in ablNeutralEdgeSegregated.
Before CrsGraph PR:
Belos: Operation Op*x    1.4314 (1638)    1.4610 (1638)    1.4943 (1638)
After CrsGraph PR:
Belos: Operation Op*x    3.7999 (1638)    3.9999 (1638)    4.1238 (1638)
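A slowdown like this is what one would expect from rows that store extra explicit zeros: a CSR mat-vec does work proportional to the number of stored entries, zero-valued or not. A generic sketch (illustrative types, not Tpetra's internals):

```cpp
// Sketch of a CSR (compressed sparse row) mat-vec showing why explicitly
// stored zeros cost time: every stored entry is read and multiplied,
// whether or not its value is zero.
#include <cstddef>
#include <vector>

struct CsrMatrix {
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<int> colInd;          // size = number of *stored* entries
  std::vector<double> values;       // zeros occupy slots like any entry
};

void spmv(const CsrMatrix& A, const std::vector<double>& x,
          std::vector<double>& y) {
  const std::size_t numRows = A.rowPtr.size() - 1;
  for (std::size_t i = 0; i < numRows; ++i) {
    double sum = 0.0;
    // Work per row is proportional to stored entries, including zeros
    // introduced by a graph that is larger than the stencil needs.
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
      sum += A.values[k] * x[A.colInd[k]];
    y[i] = sum;
  }
}
```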
@jhux2 Interesting. So the larger graph with the extra zeros is having a performance impact? Shall we revert this, since it will have a big impact on production runs?
I wonder if we should consider only doing edges and not the full element graph for the volume, and whether that would restore the performance somewhat. We will still need to do the full face-elem graph, but hopefully that's a much smaller overhead.
Yes, I agree this should be reverted until the performance issues can be resolved.
> I wonder if we should consider only doing edges and not the full element graph for the volume, and whether that would restore the performance somewhat. We will still need to do the full face-elem graph, but hopefully that's a much smaller overhead.
So this would require separate code paths that depend on the discretization, as well as maintaining a separate element graph?
@jhux2 If `realm.realmUsesEdges_ == true`, then we activate `buildEdgeToNodeGraph` and not `buildElemToNodeGraph`. I will need to check, but I am fairly certain there are no use cases where the user sets `use_edges: yes` in the input file and still ends up needing an element-based graph.
We just mix and match on the boundaries for the graph, so I think just the `buildFaceElem` connections should be needed for `use_edges: yes`.
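A minimal sketch of the control flow being described, assuming the `build*` names from this thread (the `GraphBuilder` wrapper, signatures, and `Realm` field are invented for illustration):

```cpp
// Illustrative only: the build* names come from this discussion, but the
// GraphBuilder wrapper and surrounding structure are sketched guesses.
struct Realm {
  bool realmUsesEdges_ = false;  // set when the input has use_edges: yes
};

struct GraphBuilder {
  void buildEdgeToNodeGraph()     { /* edge stencil for interior rows  */ }
  void buildElemToNodeGraph()     { /* full element stencil (denser)   */ }
  void buildFaceElemToNodeGraph() { /* boundary face-elem connections  */ }

  void buildGraph(const Realm& realm) {
    if (realm.realmUsesEdges_) {
      // use_edges: yes -- edge stencil for the volume, plus the full
      // face-elem connections at the boundaries (per the thread above)
      buildEdgeToNodeGraph();
      buildFaceElemToNodeGraph();
    } else {
      // element-based discretization keeps the full element graph
      buildElemToNodeGraph();
      buildFaceElemToNodeGraph();
    }
  }
};
```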
> We just mix and match on the boundaries for the most part for the graph, so I think just the `buildFaceElem` connections should be needed for `use_edges: yes`.
I guess this is what you were saying in this comment.
The various linear system construction phases each create a matrix graph. This is redundant work, as almost all phases use the same connectivity graph. One exception is the "monolithic" momentum linear system.
This issue will track progress in factoring out the graph construction so that it is done as few times as possible per time step.
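A minimal sketch of what that reuse could look like with Tpetra, assuming one shared, fill-completed graph per stencil (names are illustrative, not the actual refactor; the monolithic momentum system would keep its own graph):

```cpp
// Sketch only: construct the connectivity graph once, fillComplete it,
// and hand the same graph to every equation system's matrix, so the
// sparsity pattern is built once instead of once per phase.
#include <Teuchos_RCP.hpp>
#include <Tpetra_CrsGraph.hpp>
#include <Tpetra_CrsMatrix.hpp>

using crs_graph  = Tpetra::CrsGraph<>;
using crs_matrix = Tpetra::CrsMatrix<double>;

// Each system gets its own values but shares the (already fill-completed)
// graph; the matrix's pattern is then fixed and only values are filled
// each time step.
Teuchos::RCP<crs_matrix>
makeSystemMatrix(const Teuchos::RCP<const crs_graph>& sharedGraph) {
  return Teuchos::rcp(new crs_matrix(sharedGraph));
}
```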