spdomin closed this issue 7 years ago.
@srajama1, I am working on obtaining a patch for the new ATDM-based search. Once I have it, you can help establish how efficient the search is.
Thanks, I was talking to Nate about it. Let me know how I can help.
@mbarone81, let's add your overset work to this as well, in the hope that this milestone will define the path forward for blade motion.
@alanw0, take a look at commit dbd1b958a52f82b0d3209ccb4b4d7c621016e62d for a new test to start profiling NonConformalManager ghosting costs. This should replace the effort on edgeContact3D.
OK, got it. I'll take a look at the dgNonConformalEdgeCylinder test.
@srajama1, could you please keep track of the ATDM-based search and test it once it is confirmed that point/box has been deployed? I need to start working on the Kokkos algorithm structure task. Thanks.
@NaluCFD/sliding, I have the higher-order DG scheme working. It also naturally handles a P=1/P=2 interface. I will run more P=2 sliding-mesh simulations and commit soon.
Hex8/Hex27 and Hex27/Hex27 interfaces are now complete: commit d2adbe82d786c3ac8fa9221730788923a6a184f9
@NaluCFD/sliding, here is a sample timing for a 150-million-element job (100 steps with two Picard iterations per step).
32 nodes (36 cores per node):
*******************************************************
Simulation Shall Complete: time/timestep: 0.0102493/100
*******************************************************
--------------------------------
Begin Timer Overview for Realm: realm_1
--------------------------------
Timing for Eq: myLowMach
init -- avg: 0.000184042 min: 4.22001e-05 max: 0.00332427
assemble -- avg: 0 min: 0 max: 0
load_complete -- avg: 0 min: 0 max: 0
solve -- avg: 0 min: 0 max: 0
precond setup -- avg: 0 min: 0 max: 0
misc -- avg: 18.7294 min: 15.4274 max: 19.9338
Timing for Eq: MomentumEQS
init -- avg: 431.72 min: 428.433 max: 451.232
assemble -- avg: 482.962 min: 465.899 max: 576.557
load_complete -- avg: 123.853 min: 28.7052 max: 133.94
solve -- avg: 583.758 min: 583.597 max: 589.672
precond setup -- avg: 0.0177482 min: 0.011241 max: 0.0544529
misc -- avg: 67.2906 min: 66.379 max: 68.3661
linear iterations -- avg: 11.79 min: 7 max: 34
Timing for Eq: ContinuityEQS
init -- avg: 201.045 min: 183.683 max: 204.34
assemble -- avg: 151.122 min: 142.574 max: 177.107
load_complete -- avg: 30.9594 min: 4.86764 max: 33.8108
solve -- avg: 3142.5 min: 3142.45 max: 3155.93
precond setup -- avg: 22.3307 min: 22.329 max: 22.3356
misc -- avg: 97.5644 min: 83.7688 max: 98.3748
linear iterations -- avg: 38.11 min: 27 max: 50
Timing for Eq: myZ
init -- avg: 190.661 min: 190.292 max: 191.43
assemble -- avg: 179.576 min: 160.99 max: 204.665
load_complete -- avg: 30.0869 min: 4.99243 max: 32.84
solve -- avg: 58.9669 min: 58.8958 max: 68.3101
precond setup -- avg: 0.00417599 min: 0.00237584 max: 0.0218868
misc -- avg: 18.8389 min: 18.4308 max: 19.505
linear iterations -- avg: 8.28 min: 6 max: 10
Timing for IO:
io create mesh -- avg: 0.363296 min: 0.191619 max: 0.527161
io output fields -- avg: 57.5503 min: 56.8373 max: 58.4367
io populate mesh -- avg: 4.6819 min: 4.6608 max: 4.70148
io populate fd -- avg: 0.256733 min: 0.0831389 max: 0.430451
Timing for connectivity/finalize lysys:
eqs init -- avg: 823.427 min: 820.799 max: 827.33
Timing for property evaluation:
props -- avg: 0.0918778 min: 0.0545573 max: 0.310776
Timing for Contact:
contact bc -- avg: 15.1264 min: 14.6959 max: 18.4114
Timing for Simulation: nprocs= 1152
main() -- avg: 5880.26 min: 5840.17 max: 5887.04
Memory Overview:
nalu memory: total (over all cores) current/high-water mark= 513.083 G 536.876 G
nalu memory: min (over all cores) current/high-water mark= 256.641 M 266.148 M
nalu memory: max (over all cores) current/high-water mark= 1.89328 G 2.04586 G
Min High-water memory usage 266.1 MB
Avg High-water memory usage 477.2 MB
Max High-water memory usage 2095.0 MB
Min Available memory per processor 1789.2 MB
Avg Available memory per processor 1789.2 MB
Max Available memory per processor 1789.2 MB
Min No-output time 5787.6 sec
Avg No-output time 5829.7 sec
Max No-output time 5833.2 sec
STKPERF: Total Time: 5841.7
STKPERF: Current memory: 357113856 (340.6 M)
STKPERF: Memory high water: 374874112 (357.5 M)
64 nodes (36 cores per node):
*******************************************************
Simulation Shall Complete: time/timestep: 0.0102493/100
*******************************************************
--------------------------------
Begin Timer Overview for Realm: realm_1
--------------------------------
Timing for Eq: myLowMach
init -- avg: 9.29431e-05 min: 3.31402e-05 max: 0.000857592
assemble -- avg: 0 min: 0 max: 0
load_complete -- avg: 0 min: 0 max: 0
solve -- avg: 0 min: 0 max: 0
precond setup -- avg: 0 min: 0 max: 0
misc -- avg: 10.332 min: 7.72162 max: 11.2043
Timing for Eq: MomentumEQS
init -- avg: 239.982 min: 237.399 max: 253.78
assemble -- avg: 240.033 min: 231.406 max: 314.129
load_complete -- avg: 97.0818 min: 21.3585 max: 102.162
solve -- avg: 330.231 min: 330.093 max: 330.599
precond setup -- avg: 0.00849527 min: 0.00510311 max: 0.0406508
misc -- avg: 34.181 min: 33.6794 max: 34.9966
linear iterations -- avg: 12.285 min: 7 max: 34
Timing for Eq: ContinuityEQS
init -- avg: 119.214 min: 106.893 max: 121.829
assemble -- avg: 72.407 min: 70.6553 max: 93.4898
load_complete -- avg: 24.731 min: 3.5701 max: 26.1621
solve -- avg: 1910.76 min: 1910.73 max: 1911.08
precond setup -- avg: 12.9936 min: 12.9926 max: 12.9988
misc -- avg: 44.6586 min: 44.1545 max: 45.364
linear iterations -- avg: 42.01 min: 32 max: 50
Timing for Eq: myZ
init -- avg: 108.232 min: 107.934 max: 108.653
assemble -- avg: 81.3621 min: 79.6523 max: 101.46
load_complete -- avg: 23.6941 min: 3.58118 max: 25.093
solve -- avg: 35.8702 min: 35.8191 max: 36.088
precond setup -- avg: 0.00200497 min: 0.00114703 max: 0.0113389
misc -- avg: 9.74541 min: 9.52759 max: 10.3505
linear iterations -- avg: 9.445 min: 6 max: 10
Timing for IO:
io create mesh -- avg: 0.748922 min: 0.388414 max: 0.995598
io output fields -- avg: 26.2067 min: 25.7432 max: 26.8175
io populate mesh -- avg: 4.66858 min: 4.6314 max: 4.70515
io populate fd -- avg: 0.406266 min: 0.152544 max: 0.774392
Timing for connectivity/finalize lysys:
eqs init -- avg: 467.428 min: 465.682 max: 469.464
Timing for property evaluation:
props -- avg: 0.0555738 min: 0.0349991 max: 0.15305
Timing for Contact:
contact bc -- avg: 11.7654 min: 11.5483 max: 14.2255
Timing for Simulation: nprocs= 2304
main() -- avg: 3422.38 min: 3401.18 max: 3425.24
Memory Overview:
nalu memory: total (over all cores) current/high-water mark= 645.294 G 674.343 G
nalu memory: min (over all cores) current/high-water mark= 185.172 M 193.027 M
nalu memory: max (over all cores) current/high-water mark= 1.07852 G 1.14824 G
Min High-water memory usage 193.0 MB
Avg High-water memory usage 299.7 MB
Max High-water memory usage 1175.8 MB
Min Available memory per processor 1789.2 MB
Avg Available memory per processor 1789.2 MB
Max Available memory per processor 1789.2 MB
Min No-output time 3396.1 sec
Avg No-output time 3398.5 sec
Max No-output time 3401.0 sec
STKPERF: Total Time: 3420.3
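To compare the two runs directly, the strong-scaling speedup and parallel efficiency can be computed from the avg timers above. This is a minimal Python sketch: the timer values and rank counts (1152 and 2304) are copied from the two logs, while the selection of timers and the dictionary layout are only for illustration.

```python
# Strong-scaling comparison of the 32-node (1152 ranks) and 64-node
# (2304 ranks) runs, using "avg" timers copied from the logs above.
RANKS_32, RANKS_64 = 1152, 2304

# timer name -> (avg seconds on 32 nodes, avg seconds on 64 nodes)
TIMERS = {
    "main()": (5880.26, 3422.38),
    "MomentumEQS assemble": (482.962, 240.033),
    "MomentumEQS solve": (583.758, 330.231),
    "ContinuityEQS solve": (3142.5, 1910.76),
    "eqs init": (823.427, 467.428),
}


def speedup(t_small, t_large):
    """Wall time on the smaller run divided by wall time on the larger run."""
    return t_small / t_large


def efficiency(t_small, t_large, ranks_small=RANKS_32, ranks_large=RANKS_64):
    """Parallel efficiency: achieved speedup divided by the ideal speedup."""
    return speedup(t_small, t_large) / (ranks_large / ranks_small)


if __name__ == "__main__":
    for name, (t32, t64) in TIMERS.items():
        print(f"{name:22s} speedup {speedup(t32, t64):5.2f}  "
              f"efficiency {efficiency(t32, t64):5.2f}")
```

With these numbers main() scales at roughly 86% efficiency from 32 to 64 nodes, while the ContinuityEQS solve, which dominates the run time, scales somewhat worse (about 82%).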
The details of these timings are interesting, particularly the difference between min and max on individual lines, which indicates imbalance. It is hard to say, though, whether that is an imbalance of elements, of work (e.g., localized work such as search/contact), or of shared-node ownership. The last would affect linear-solver work, since the number of owned nodes roughly corresponds to the number of matrix rows per processor.
In these timings, assemble looks well balanced, which suggests the elements are well balanced. The solve time also looks balanced, but that may be because it includes synchronization points (such as dot products and norms) that force the overall solve time to appear balanced. The load_complete time is distinctly imbalanced, which may be the most direct symptom of imbalanced shared-node ownership producing uneven numbers of matrix rows per processor.
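One way to make the min/max comparison concrete is to reduce each timer line to two ratios: max/avg (how far the slowest rank lags the average) and (max - min)/avg (the overall spread across ranks). This is a minimal Python sketch, assuming the `name -- avg: A min: B max: C` line format shown in the logs; the two metrics are our choice for illustration, not something Nalu itself reports.

```python
import re

# Matches timer lines such as
#   "load_complete -- avg: 123.853 min: 28.7052 max: 133.94"
TIMER_RE = re.compile(
    r"^\s*(?P<name>.+?) -- avg: (?P<avg>\S+) min: (?P<min>\S+) max: (?P<max>\S+)"
)


def parse_timer(line):
    """Return (name, avg, min, max) parsed from one timer line."""
    m = TIMER_RE.match(line)
    if m is None:
        raise ValueError(f"not a timer line: {line!r}")
    return (m.group("name"), float(m.group("avg")),
            float(m.group("min")), float(m.group("max")))


def imbalance(avg, tmin, tmax):
    """Return (max/avg, (max - min)/avg); perfect balance gives (1.0, 0.0)."""
    if avg == 0.0:
        return 1.0, 0.0  # timer never ran (e.g. myLowMach solve): treat as balanced
    return tmax / avg, (tmax - tmin) / avg


# MomentumEQS lines from the 32-node log above.
MOMENTUM_32 = [
    "assemble -- avg: 482.962 min: 465.899 max: 576.557",
    "load_complete -- avg: 123.853 min: 28.7052 max: 133.94",
    "solve -- avg: 583.758 min: 583.597 max: 589.672",
]

if __name__ == "__main__":
    for line in MOMENTUM_32:
        name, avg, tmin, tmax = parse_timer(line)
        ratio, spread = imbalance(avg, tmin, tmax)
        print(f"{name:14s} max/avg {ratio:5.2f}  spread {spread:5.2f}")
```

For these lines, load_complete has by far the largest spread (about 0.85 of its average, driven by a very low minimum), assemble about 0.23, and solve about 0.01. Note that max/avg alone would rank assemble (about 1.19) above load_complete (about 1.08), which is why the spread is worth reporting as well.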
Exactly. This is a hybrid mesh. For these types of meshes we generally find an almost perfect element balance, while the node balance is often poor. Aero found this as well and changed how shared-node ownership is assigned (round-robin rather than lowest rank). We can probably consider something similar to make sure the matrix rows are well balanced.
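The difference between the two ownership rules can be illustrated on a toy set of shared nodes. This is a minimal Python sketch: the node IDs, the sharing sets, and the `node_id % len(sharers)` round-robin rule are hypothetical stand-ins, not what Aero actually implemented. The property that matters is that every sharing rank can evaluate the rule locally and agree on the owner without extra communication.

```python
from collections import Counter


def lowest_rank_owner(node_id, sharers):
    """Lowest-rank rule: the lowest-numbered sharing rank owns the node."""
    return min(sharers)


def round_robin_owner(node_id, sharers):
    """Hypothetical round-robin rule: rotate ownership among the sharing
    ranks by global node ID, so every rank computes the same answer."""
    ranks = sorted(sharers)
    return ranks[node_id % len(ranks)]


def owned_counts(shared_nodes, rule):
    """Count how many shared nodes (i.e. matrix rows) each rank would own."""
    counts = Counter()
    for node_id, sharers in shared_nodes.items():
        counts[rule(node_id, sharers)] += 1
    return counts


# Toy interface: nodes 0-9 shared by ranks 0 and 1, nodes 10-19 by ranks 0 and 2.
shared = {n: {0, 1} for n in range(10)}
shared.update({n: {0, 2} for n in range(10, 20)})

if __name__ == "__main__":
    print("lowest rank:", dict(owned_counts(shared, lowest_rank_owner)))
    print("round robin:", dict(owned_counts(shared, round_robin_owner)))
```

Under the lowest-rank rule, rank 0 owns all 20 shared nodes; under the round-robin rule the split is 10/5/5 across ranks 0, 1, and 2. Only shared nodes are counted here; interior nodes belong to their single rank regardless of the rule.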
The latest push by @alanw0 gives the following improvements:
First, the amount of ghosting has gone down:
Old:
NonConformal alg will ghost a number of entities: 5285506
New:
NonConformal alg will ghost a new number of entities: 1242 and remove 12461 entities from ghosting.
Timing also improved (see push):
https://github.com/NaluCFD/Nalu/commit/4cca5bae07624abdaf356063dd07d301473eecd8
Transition to Jira.