@sayerhs @lucbv @srajama1 @alanw0
Here is a sample trace of the graph initialization calls from ablNeutralEdgeSegregated:
[EDIT: I forgot that the TpetraSegregatedLinearSystem class (TSLS) called through to different underlying graph methods. I've updated the call stack -- differences still remain.]
EquationSystems::initialize(): Begin
EQUATION LowMachEOSWrap initialize()
EQUATION MomentumEQS initialize()
TSLS::buildFaceElemToNodeGraph
CrsGraph::buildFaceElemToNodeGraph
TSLS::buildFaceToNodeGraph
CrsGraph::buildConnectedNodeGraph
TSLS::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
TSLS::buildFacetoNodeGraph
CrsGraph::buildNodeGraph
TSLS::buildFacetoNodeGraph
CrsGraph::buildNodeGraph
EQUATION ContinuityEQS initialize()
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
EQUATION EnthalpyEQS initialize()
CrsGraph::buildFaceToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildFaceToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
CrsGraph::buildNodeGraph
EQUATION TurbKineticEnergyEQS initialize()
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
EquationSystems::initialize(): End
Working under the assumption that segregated momentum and continuity have the same matrix graph structure, I expected the Momentum and Continuity phases to have the same call stack.
@sayerhs @alanw0 Is this expected behavior?
@jhux2 buildConnectedNodeGraph is the actual function that is called by the specific graph-building methods.
In the ABL problem we do have different BCs applied to different equation systems, that is why the graph calls look different. However, if you look at the final resulting graph I believe they will have the same non-zero columns for each row.
@sayerhs Apart from buildConnectedNodeGraph, are all the other calls for handling boundary conditions?
In general, is there a way of determining during runtime which physics systems will have the same graph structure, or is this something that can only be known by someone who knows the physics of the simulation?
Status update
I fixed what ended up being a simple bug in TpetraSegregatedLinearSystem that was causing seg faults and/or bad convergence. Running ablNeutralEdgeSegregated in two different ways yields:
use_edges: no: the continuity and scalar momentum matrix graphs are identical
use_edges: yes: the scalar momentum matrix graph has more entries than the continuity graph
AFAIK, figuring out why the graphs differ in the yes case, and fixing that, is the last hurdle to being able to use a single graph.
It looks like the momentum wall function implementation is using the full face element connectivity, more than what continuity uses at the wall. We could maybe pad the connectivity around the boundaries to always have the full face connectivity, or we could drop connections for the wall function.
@rcknaus thanks for the explanation. Would it be possible to run all the boundary condition algorithms (for momentum, continuity, etc...) on the "master graph" so that we have all connectivity in place? We could then use the master graph for all the matrices in the system and simply put default zeroes in entries that are not needed, say in the continuity matrix in the case of ablNeutralEdge?
Yeah, I think that would be a good solution.
@lucbv If we revert to a single graph, won't all initialize_connectivity calls go through the same graph creation logic and end up the way you describe? @jhux2 had expressed some performance concerns with this approach. I suspect the cost is not that much, but it would be good to test that out.
@lucbv wrote:
Would it be possible to run all the boundary condition algorithms (for momentum, continuity, etc...) on the "master graph" so that we have all connectivity in place? We could then use the master graph for all the matrices in the system and simply put default zeroes in entries that are not needed, say in the continuity matrix in the case of ablNeutralEdge?
@sayerhs suggested something along these lines.
It looks like the momentum wall function implementation is using the full face element connectivity, more than what continuity uses at the wall. We could maybe pad the connectivity around the boundaries to always have the full face connectivity, or we could drop connections for the wall function.
@rcknaus I looked in LowMachEquationSystem.C to see if there was an obvious way of changing which graph is created, but I don't understand the logic well enough. I'm guessing that initialize_connectivity() needs to call a different graph setup, but what else has to happen?
@jhux2 initialize_connectivity is called in https://github.com/Exawind/nalu-wind/blob/73cb4b961434f552b52ccd037281406ca0e247ec/src/SolverAlgorithmDriver.C#L70, which then accesses linsys_ for the corresponding equation system (https://github.com/Exawind/nalu-wind/blob/master/src/LowMachEquationSystem.C#L1016) and then calls the build*Graph methods. The first step would be to separate out the graph from the linear system (or add the ability for LinearSystem::create to take a graph argument and use that instead of creating a graph inside). If every LinearSystem can be made to have the same graph instance, then the equation systems are oblivious to it.
For the ABL neutral edge regression tests, based on the physics all I can say is that the momentum graph is a superset of the pressure graph.
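For illustration, here is a minimal sketch of the separation being proposed, under assumed names and signatures (not the actual nalu-wind API): the graph is built once, and each equation system's linear system is handed the same instance.

```cpp
// Hypothetical sketch only -- not the actual nalu-wind interfaces. It illustrates
// moving graph ownership out of LinearSystem so all equation systems can share
// a single instance.
#include <memory>
#include <utility>

struct CrsGraph {
  // placeholder for the factored-out graph (edge/elem/face connectivity)
};

class LinearSystem {
public:
  // Proposed form: accept an already-built (or shared, in-progress) graph
  // instead of constructing one internally.
  explicit LinearSystem(std::shared_ptr<CrsGraph> graph) : graph_(std::move(graph)) {}
  const CrsGraph& graph() const { return *graph_; }

private:
  std::shared_ptr<CrsGraph> graph_;
};

int main() {
  auto sharedGraph = std::make_shared<CrsGraph>();
  // Momentum, continuity, enthalpy, ... would all point at the same graph, so
  // build*Graph calls from any equation system accumulate into one structure.
  LinearSystem momentum(sharedGraph);
  LinearSystem continuity(sharedGraph);
  return 0;
}
```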
@sayerhs Actually, I've already factored out the crsgraph from LinearSystem. See my branch https://github.com/jhux2/nalu-wind/tree/crsgraph-refactor.
My question is what needs to change in addition to creating a graph with more connectivity for the physics phases other than momentum.
Would it be helpful to create a PR at this point?
@jhux2 I am not sure I understand your question. If every linsys_ instance has the same graph instance, then once all the equation systems call the initialize() method, the resultant graph will have the connectivity of all the physics involved across all equation systems. There is nothing else that needs to be done to the rest of the code.
Are you asking about the overhead of the build*Graph calls that are repeated in each equation system and will add duplicate connections? If so, that is a bit tricky because of the several combinations of if-conditions that nalu currently supports. But for certain conditions (e.g., use_edges: True) we only need to do buildEdgeToNodeGraph, buildElemToNodeGraph, etc. once, as they are on the same mesh parts. The only ones we will need to repeat over all equation systems are the boundary conditions, but since they are faces they should have a smaller impact, at least for a first proof of concept.
I think a PR would be useful to look at your code and discuss.
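As a sketch of the deduplication point above (the interiorBuilt_ flag and the signatures here are made up for illustration, not the existing code): with one shared graph, the interior edge/elem builds only need to run once, while the boundary-condition face builds can still run for every equation system.

```cpp
// Hypothetical sketch -- flag and signatures are illustrative, not nalu-wind code.
#include <vector>

class CrsGraph {
public:
  void buildEdgeToNodeGraph() {
    if (interiorBuilt_) return;  // same mesh parts across all equation systems
    // ... insert edge-based interior connectivity once ...
    interiorBuilt_ = true;
  }
  void buildFaceToNodeGraph(const std::vector<int>& bcPartIds) {
    // BC parts differ between equation systems, so always process them;
    // repeated insertions of the same (row, col) pair collapse to one entry.
    (void)bcPartIds;
  }

private:
  bool interiorBuilt_ = false;
};
```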
I am not sure I understand your question. If every linsys_ instance has the same graph instance, then once all the equation systems call the initialize() method, the resultant graph will have the connectivity of all the physics involved across all equation systems. There is nothing else that needs to be done to the rest of the code.
The current state of my PR is that there is now a separate CrsGraph class. In each physics phase, the linear system construction first creates a CrsGraph, then creates a matrix.
What I've observed is that the graph sparsity varies for the segregated momentum, depending on whether use_edges is true or false.
Before having all physics phases depend on a single CrsGraph (which is the next logical step), I'd like to get the code in a state where the segregated momentum graph has the same sparsity as continuity, regardless of the value of use_edges. My question is whether this is just a matter of calling the correct CrsGraph::build*Graph method, or if there's something with BCs that needs to change.
@jhux2 OK. Would it not be possible to get that by simply changing the boundary conditions in the input file to be Dirichlet instead of wall function? Can you explain your motivation for changing the code?
If you want to change the code, then you can capture the parts in the register_*_bc calls within the equation systems and then manually call linsys_->buildFaceToNodeGraph on those part vectors.
Finally, when use_edges: yes, the current nalu-wind implementation adds extra connections to the graph even though they are not used at all. Those entries just end up as zeros in the momentum system.
@sayerhs My goal is just to get the momentum graph to be the same, regardless of use_edges. If this can be done via the input deck and without changing code, then I'm a happy guy.
Confirmed that graphs match for the following when activating the Tpetra segregated momentum solve:
periodic3dEdge
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 ContinuityEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 ContinuityEQS-O-1.gra.1.0
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 EnthalpyEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 EnthalpyEQS-O-1.gra.1.0
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 MomentumEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 MomentumEQS-O-1.gra.1.0
periodic3dElem
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 ContinuityEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 ContinuityEQS-O-1.gra.1.0
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 EnthalpyEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 EnthalpyEQS-O-1.gra.1.0
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 MomentumEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 MomentumEQS-O-1.gra.1.0
Graphs also match between the different physics for airfoilRANSEdge and airfoilRANSElem, respectively, when using the segregated momentum solver.
@jhux2 For testing purposes, one solution would be https://github.com/rcknaus/nalu-wind/commit/fb984a8fb81472c2c0ef8578ee6be8a5733f3718 , where I just force all boundary conditions to use the same stencil regardless of whether it's necessary or not.
Which graph methods are called in initialize_connectivity() depends on the type of SolverAlgorithms instantiated in the register_*_algorithm sections. For the abl neutral case, there's a "face elem" algorithm (buildFaceElemToNodeGraph) at the "top" bc and a "face" algorithm (buildFaceToNodeGraph) at the bottom. For continuity, it's a no-op for both of those conditions.
Would it be possible to run all the boundary condition algorithms (for momentum, continuity, etc...) on the "master graph" so that we have all connectivity in place? We could then use the master graph for all the matrices in the system and simply put default zeroes in entries that are not needed, say in the continuity matrix in the case of ablNeutralEdge?
I've returned to working on this. It appears that we'll need something like @lucbv suggested. Different physics are creating different stencils. For example, below is the ablNeutralEdge regression test, where one can see different graph calls for the different physics, and reusing the graph between continuity and enthalpy fails on 8 MPI ranks.
Question: Would it be reasonable to do what @rcknaus said he does for testing, i.e., force all BCs to use the same stencil?
EQUATION MomentumEQS
CrsGraph::buildFaceElemToNodeGraph
CrsGraph::buildFaceToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
CrsGraph::buildNodeGraph
EQUATION ContinuityEQS
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
EQUATION EnthalpyEQS
CrsGraph::buildFaceToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildFaceToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
CrsGraph::buildNodeGraph
EQUATION TurbKineticEnergyEQS
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
TLS::finalizeLinearSystem
EquationSystems::initialize(): End
@jhux2 if you call buildElemToNodeGraph for interior and buildFaceElemToNodeGraph for boundaries, all these should collapse into the same graph.
However, can you explain what you mean by continuity/enthalpy failing? Do you mean you only run the build* methods once for Continuity and then reuse that for Enthalpy without letting Enthalpy call the build* methods too?
@sayerhs would it then make sense to only use these two graphs and remove the others? It would simplify the code base and probably would not cost much more computationally.
Jonathan is trying to build the CrsGraph once per time step; the different equations in the system would then share the same graph.
However, can you explain what you mean by continuity/enthalpy failing? Do you mean you only run the build* methods once for Continuity and then reuse that for Enthalpy without letting Enthalpy call the build* methods too?
If I allow Continuity and Enthalpy to each build their own CrsGraph, the test passes. If Enthalpy uses the graph that Continuity built, the test fails.
@jhux2 Thanks, I think the solution would then be to use buildElemToNodeGraph and buildFaceElemToNodeGraph for all use cases; then we will have the full connectivity for all the different possible BCs across all equation systems. I think this is along the lines of what @rcknaus suggested in a previous post.
The question for you and @lucbv is: what happens if the stencil for a row has 27 entries but only 8 are non-zero and the rest are all zeros? Would there be a performance impact? I suspect the savings from graph reinit will more than make up for this, but I thought I'd ask.
@sayerhs usually I am happy to trade a few more flops if I can save some communication. In this case the overall increase in matrix entries is modest enough at large scale that it would not be very noticeable. On smaller meshes, where the exterior/interior ratio gets high, you could argue that fewer entries is better, but that's not the regime we are targeting.
By the way, if you call buildElemToNodeGraph and buildFaceElemToNodeGraph you do end up with the extra zero entries anyway, right?
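For a rough sense of scale (a back-of-the-envelope estimate, not a measurement from nalu-wind): using the interior stencil sizes quoted below, 7 entries per row for the edge-based stencil versus 27 for the full hex element stencil, padding every interior row stores and streams roughly 27/7 ≈ 3.9x as many matrix entries per sparse matrix-vector product, with the extra entries carried as explicit zeros. The face/face-elem padding only touches boundary rows, so its share of the cost shrinks as the mesh is refined.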
Thanks, I think the solution would then be to use buildElemToNodeGraph and buildFaceElemToNodeGraph for all use cases; then we will have the full connectivity for all the different possible BCs across all equation systems. I think this is along the lines of what @rcknaus suggested in a previous post.
@sayerhs Thank you. Would it be sufficient to use @rcknaus's patch, https://github.com/rcknaus/nalu-wind/commit/fb984a8fb81472c2c0ef8578ee6be8a5733f3718?
@lucbv Yeah, that's why I was asking about the potential performance impacts. If we replace buildEdgeToNodeGraph with buildElemToNodeGraph, that will increase the stencil for an internal DOF from 7 to 27, but most of those will be zeros when using edge algorithms. Similarly, FaceElemToNode will add connectivity to internal nodes when processing boundary faces, which is not used for several of the BCs.
That patch is a quick way to test this, but I think it would be cleaner to replace the implementation of buildEdgeToNodeGraph etc. in the CrsGraph class. It doesn't make sense to loop over the mesh and add those other connections via SolverAlgorithm::initialize_connectivity() if we are going to call buildElemToNodeGraph anyway.
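For concreteness, a sketch of what that replacement could look like inside CrsGraph, with placeholder bodies and simplified no-argument signatures (not the real implementation): the edge- and face-based builders simply delegate to the full-stencil builders, so every equation system ends up with the same superset sparsity regardless of use_edges.

```cpp
// Hypothetical sketch -- not the actual CrsGraph implementation.
class CrsGraph {
public:
  void buildElemToNodeGraph()     { /* full element-to-node (27-point hex) stencil */ }
  void buildFaceElemToNodeGraph() { /* face-element connectivity at boundaries */ }

  // Narrower builders delegate to the full-stencil ones, so the resulting graph
  // no longer depends on which SolverAlgorithm happened to request connectivity.
  void buildEdgeToNodeGraph() { buildElemToNodeGraph(); }
  void buildFaceToNodeGraph() { buildFaceElemToNodeGraph(); }
};
```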
As suggested, I've added calls to buildElemToNodeGraph and buildFaceElemToNodeGraph to certain calls to initialize_connectivity. I found that I could not do this everywhere. Adding to the method in AssembleContinuityNonConformalSolverAlgorithm caused numerous failures in the dgNonConformal* tests, and adding to AssembleEdgeSolverAlgorithm caused failures in ekmanSpiral and fluidsPmrChtPeriodic.
Tests currently failing:
1 - ablNeutralEdge (Failed)
4 - ablUnstableEdge (Failed)
5 - ablUnstableEdge_ra (Failed)
6 - airfoilRANSEdgeTrilinos (Failed)
22 - ductWedge (Failed)
23 - edgeHybridFluids (Failed)
24 - edgePipeCHT (Failed)
43 - karmanVortex (Failed)
50 - nonIsoEdgeOpenJet (Failed)
53 - nonIsoNonUniformEdgeOpenJet (Failed)
I'm looking at the abl* tests, which have "hard" failures. The others are tolerance differences:
FAIL: airfoilRANSEdgeTrilinos................. 55.1897s 3.0239e-03 4.6973e-09
FAIL: ductWedge............................... 6.1988s 6.0500e-11 1.4597e-06
FAIL: edgeHybridFluids........................ 36.0225s 1.3757e-14 1.2991e-11
FAIL: edgePipeCHT............................. 6.1986s 8.9700e-13 4.1854e-13
FAIL: karmanVortex............................ 2.7645s 1.5869e-08 1.6911e-07
FAIL: nonIsoEdgeOpenJet....................... 13.6240s 2.1511e-11 1.5026e-07
FAIL: nonIsoNonUniformEdgeOpenJet............. 35.9626s 1.4207e-08 7.7987e-07
@jhux2 Sorry, I should have clarified that we need to keep the non-conformal and overset graph additions, as they are adding connections to completely arbitrary elements. However, they will be the same for all equation systems. Can you share the output from one of the hard failures? ablNeutralEdge would be good.
Here are the logs for ablNeutralEdge.
@jhux2 Are you sure that the log file you posted is from the latest run (corresponding to the output in screen.txt)? It looks like a fully complete run, so I thought I'd check. The stack trace doesn't have any file info, so it is difficult to confirm from that.
@sayerhs Yes, ablNeutralEdge.log ends where Nalu seg faults. Note the lack of timers. The result of the seg fault is captured in screen.txt.
I'm guessing that the graph for enthalpy should be different from continuity in some way.
I've added some additional runtime printing in the assembly:
EquationSystems::initialize(): Begin
EQUATION LowMachEOSWrap
EQUATION MomentumEQS
CrsGraph::buildFaceElemToNodeGraph
AssembleElemSolverAlgorithm::initialize_connectivity
CrsGraph::buildFaceElemToNodeGraph
CrsGraph::buildElemToNodeGraph
CrsGraph::buildConnectedNodeGraph
AssembleEdgeSolverAlgorithm::initialize_connectivity
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
CrsGraph::buildNodeGraph
EQUATION ContinuityEQS
AssembleEdgeSolverAlgorithm::initialize_connectivity
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
EQUATION EnthalpyEQS
AssembleElemSolverAlgorithm::initialize_connectivity
CrsGraph::buildFaceElemToNodeGraph
CrsGraph::buildElemToNodeGraph
CrsGraph::buildConnectedNodeGraph
AssembleElemSolverAlgorithm::initialize_connectivity
CrsGraph::buildFaceElemToNodeGraph
CrsGraph::buildElemToNodeGraph
CrsGraph::buildConnectedNodeGraph
AssembleEdgeSolverAlgorithm::initialize_connectivity
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
CrsGraph::buildNodeGraph
EQUATION TurbKineticEnergyEQS
AssembleEdgeSolverAlgorithm::initialize_connectivity
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
EquationSystems::initialize(): End
@jhux2 I guess I don't fully understand what you are trying to do. The printout indicates that Enthalpy is also calling the build*Graph methods and populating connections, so how would it matter if Continuity is different from Enthalpy?
@jhux2 I added a couple of commits to the branch that address some of the issues you are seeing. With this you should not see the segfaults. Main changes: the TpetraLinearSystem::build*NodeGraph calls are essentially no-ops. I kept the calls in place because we need them for Hypre as well as for the inConstruction_ flag checks. numDof(). This could be further optimized, but I left that alone for now to keep the commits small.
@sayerhs Thank you, I pulled in your changes and made a couple more to resolve some hard failures. ctest now shows only diff failures:
22: FAIL: ductWedge............................... 1.4586s 5.2951e-11 4.2101e-07
24: FAIL: edgePipeCHT............................. 5.9859s 1.8287e-12 2.3133e-12
43: FAIL: karmanVortex............................ 2.0890s 2.5751e-08 9.4357e-07
25: FAIL: ekmanSpiral............................. 14.9565s 9.8148e-07 8.9743e-05
6: FAIL: airfoilRANSEdgeTrilinos................. 14.8575s 1.4686e-01 2.2813e-07
50: FAIL: nonIsoEdgeOpenJet....................... 16.7441s 5.6479e-11 1.5776e-07
4: FAIL: ablUnstableEdge......................... 26.7271s 4.9472e-01 6.8726e+01
3: FAIL: ablStableElem........................... 30.0354s 1.3566e-01 6.8919e+01
23: FAIL: edgeHybridFluids........................ 38.5468s 3.1472e-14 3.5561e-11
4: FAIL: ablUnstableEdge_rst..................... 25.9356s 6.0862e-03 3.3690e-04
53: FAIL: nonIsoNonUniformEdgeOpenJet............. 43.0537s 4.0505e-08 2.2234e-06
40: FAIL: hoVortex................................ 65.3133s 6.6308e-06 3.3437e+00
34: FAIL: fluidsPmrChtPeriodic.................... 83.7547s 4.6848e-01 6.3268e-01
unitTest1
66: [==========] 54 tests from 67 test cases ran. (5471 ms total)
66: [ PASSED ] 51 tests.
66: [ FAILED ] 3 tests, listed below:
66: [ FAILED ] MixtureFractionKernelHex8Mesh.NGP_adv_diff_edge_tpetra_fix_pressure_at_node
66: [ FAILED ] MixtureFractionKernelHex8Mesh.NGP_adv_diff_edge_tpetra
66: [ FAILED ] MixtureFractionKernelHex8Mesh.NGP_adv_diff_edge_tpetra_dirichlet
unitTest2
67: [==========] 354 tests from 67 test cases ran. (4428 ms total)
67: [ PASSED ] 351 tests.
67: [ FAILED ] 3 tests, listed below:
67: [ FAILED ] MixtureFractionKernelHex8Mesh.NGP_adv_diff_edge_tpetra
67: [ FAILED ] MixtureFractionKernelHex8Mesh.NGP_adv_diff_edge_tpetra_fix_pressure_at_node
67: [ FAILED ] MixtureFractionKernelHex8Mesh.NGP_adv_diff_edge_tpetra_dirichlet
Can you post the output from one or more of the failed tpetra unit tests?
@jhux2 thanks. It looks like the matrix has been reordered, which may be consistent/expected with your graph changes...
It looks like the matrix has been reordered, which may be consistent/expected with your graph changes...
Well, the matrix graph has been expanded, but I would think that the placement of nonzero entries shouldn't have changed.
@jhux2 The abl{Stable,Unstable} tests are failing with higher errors than I would have expected. If you could share the log files for those cases, that would be great.
@alanw0 So it looks like we need to update the unit-test to account for more entries in the Tpetra Graph. Would you be able to take a stab at that?
@sayerhs @jhux2 I'll try to make the tpetra unit-test more robust for the case where the graph may have extra entries but the non-zeros are still the same.
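A minimal sketch of the kind of comparison such a test could do, written against plain CRS arrays rather than the actual Tpetra/gtest types (the Crs struct and sameNonzeros helper are illustrative only): two matrices are considered equal if, row by row, their (column, value) pairs agree once explicitly stored zeros are ignored.

```cpp
// Hypothetical helper, not the nalu-wind unit-test code: compares two CRS
// matrices while ignoring entries that are stored but numerically zero.
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

struct Crs {
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<int> cols;            // column indices, size rowPtr.back()
  std::vector<double> vals;         // values, same size as cols
};

bool sameNonzeros(const Crs& a, const Crs& b, double tol = 1.0e-12) {
  if (a.rowPtr.size() != b.rowPtr.size()) return false;
  for (std::size_t row = 0; row + 1 < a.rowPtr.size(); ++row) {
    std::map<int, double> ra, rb;  // keep only entries with |value| > tol
    for (std::size_t k = a.rowPtr[row]; k < a.rowPtr[row + 1]; ++k)
      if (std::abs(a.vals[k]) > tol) ra[a.cols[k]] = a.vals[k];
    for (std::size_t k = b.rowPtr[row]; k < b.rowPtr[row + 1]; ++k)
      if (std::abs(b.vals[k]) > tol) rb[b.cols[k]] = b.vals[k];
    if (ra.size() != rb.size()) return false;
    for (const auto& kv : ra) {
      auto it = rb.find(kv.first);
      if (it == rb.end() || std::abs(it->second - kv.second) > tol) return false;
    }
  }
  return true;
}
```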
@sayerhs Here are the log files: logs.tar.txt (it is a tar file, despite the extension).
@sayerhs If you or @alanw0 need any further information, just let me know. I'm going to refresh my branch and rerun tests.
Hi @jhux2, sorry I haven't gotten to that unit-test yet. My intentions are good, I just haven't gotten to it yet. I'll try to get it done by end of tomorrow.
The various linear system construction phases each create a matrix graph. This is redundant work, as almost all phases use the same connectivity graph. One exception is the "monolithic" momentum linear system.
This issue will track progress in factoring out the graph construction so that it is done as few times as possible per time step.