SCOREC / core

parallel finite element unstructured meshes

large imbalance from zoltan/parmetis on hex + mixed mesh #173

Open cwsmith opened 6 years ago

cwsmith commented 6 years ago

mixed.tar.gz

Partitioning the attached mesh and model with global zoltan/parmetis produces a large entity imbalance. The pumi RIB partitioner produces reasonable imbalances.

Note, this mesh has two geometric model faces that are matched.

$ mpirun -np 16 ./test/split gs_bumDNS.smd z2/ z16/ 8
mesh z2/ loaded in 0.807597 seconds
number of tet 0 hex 30745 prism 75580 pyramid 0
mesh entity counts: v 69660 e 245640 f 282305 r 106325
planned RIB factor 8 in 0.108743 seconds
mesh expanded from 2 to 16 parts in 0.415002 seconds
mesh migrated from 2 to 16 in 4.512924 seconds
PARMA_STATUS  disconnected <max avg> 1 0.062
PARMA_STATUS  neighbors <max avg> 5 3.750
PARMA_STATUS  smallest side of max neighbor part 12
PARMA_STATUS  num parts with max neighbors 4
PARMA_STATUS  empty parts 0
PARMA_STATUS  small neighbor counts 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0
PARMA_STATUS  weighted vtx <tot max min avg> 89112.0 10116.0 3116.0 5569.500
PARMA_STATUS  weighted edge <tot max min avg> 290875.0 30890.0 11378.0 18179.688
PARMA_STATUS  weighted face <tot max min avg> 308104.0 31066.0 12989.0 19256.500
PARMA_STATUS  weighted rgn <tot max min avg> 106325.0 10291.0 4726.0 6645.312
PARMA_STATUS  owned bdry vtx <tot max min avg> 5424 813 0 339.000
PARMA_STATUS  shared bdry vtx <tot max min avg> 10944 1065 332 684.000
PARMA_STATUS  model bdry vtx <tot max min avg> 31553 3496 1104 1972.062
PARMA_STATUS  sharedSidesToElements <max min avg> 0.141 0.055 0.087
PARMA_STATUS  entity imbalance <v e f r>: 1.82 1.70 1.61 1.55
MDS: reordering before writing smb files
mesh z16/ written in 2.934006 seconds

imbalance blows up using zoltan!

$ mpirun -np 16 ./test/zsplit gs_bumDNS.smd z2/ z16/ 8
mesh z2/ loaded in 0.358378 seconds
number of tet 0 hex 30745 prism 75580 pyramid 0
mesh entity counts: v 69660 e 245640 f 282305 r 106325
planned Zoltan split factor 8 to target imbalance 1.050000 in 0.466251 seconds
mesh expanded from 2 to 16 parts in 0.442006 seconds
mesh migrated from 2 to 16 in 4.656009 seconds
PARMA_STATUS  disconnected <max avg> 0 0.000
PARMA_STATUS  neighbors <max avg> 6 4.250
PARMA_STATUS  smallest side of max neighbor part 12
PARMA_STATUS  num parts with max neighbors 1
PARMA_STATUS  empty parts 0
PARMA_STATUS  small neighbor counts 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0
PARMA_STATUS  weighted vtx <tot max min avg> 87668.0 24900.0 2980.0 5479.250
PARMA_STATUS  weighted edge <tot max min avg> 288257.0 72585.0 10898.0 18016.062
PARMA_STATUS  weighted face <tot max min avg> 306930.0 70136.0 12451.0 19183.125
PARMA_STATUS  weighted rgn <tot max min avg> 106325.0 22450.0 4532.0 6645.312
PARMA_STATUS  owned bdry vtx <tot max min avg> 3962 603 0 247.625
PARMA_STATUS  shared bdry vtx <tot max min avg> 8038 906 251 502.375
PARMA_STATUS  model bdry vtx <tot max min avg> 31048 8839 1000 1940.500
PARMA_STATUS  sharedSidesToElements <max min avg> 0.110 0.034 0.074
PARMA_STATUS  entity imbalance <v e f r>: 4.54 4.03 3.66 3.38
MDS: reordering before writing smb files
mesh z16/ written in 3.105811 seconds
KennethEJansen commented 6 years ago

As a further update on this issue we can report the following for two meshes:

In the original mesh there are 152M elements, 19% wedges and the other 81% hexes. At 32k parts the max vertex load imbalance is 15 and the max region load imbalance is 14. The path to this case is /projects/UnsAdaptCFD_tesp/balin/NSF_DataDrivenTurbMod/DNS/Preliminary/Chef/32k-4k-Chef-SymmRANS/266672.out on Theta. Here are the params: preAdaptBalanceMethod none, prePhastaBalanceMethod parma-gap, partitionMethod rib, LocalPtn 0.

Obviously we cannot run this case with such an imbalance, so Riccardo altered the Simmetrix mesh generator to minimize the number of wedges, reducing them to only 0.05% of the mesh. At 32k parts, the max vertex load imbalance is 1.6 while the max region load imbalance is 1.2. The path to this case is /projects/UnsAdaptCFD_tesp/balin/NSF_DataDrivenTurbMod/DNS/Preliminary/Chef/32k-4k-Chef-Quad/268220.out on Theta. Here are the params: preAdaptBalanceMethod none, prePhastaBalanceMethod parma-gap, partitionMethod rib, LocalPtn 0.

I know we should expect some imbalance with mixed meshes but, for all intents and purposes, the second is monotopology, so I don't think this issue can be blamed on mixed meshes; rather, it seems to be an issue with hexes.

KennethEJansen commented 6 years ago

I asked Riccardo to try zrib. Here is his report:

partitionMethod=zrib and prePhastaBalanceMethod=zrib does worse than partitionMethod=rib and prePhastaBalanceMethod=graph. The former gives:

max vertex load imbalance of partitioned mesh = 1.797214
ratio of sum of all vertices to sum of owned vertices = 1.258720
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.233574

The latter gives:

max vertex load imbalance of partitioned mesh = 1.610510
ratio of sum of all vertices to sum of owned vertices = 1.244223
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.209429

Do you think we should report this to the Zoltan team?

Is there a way within Chef to tell it to ignore matching (i.e., not to worry about preserving matching information) to determine if it is this feature that is causing the imbalance?

Obviously we could regenerate the mesh without matching (I think so at least, not sure since it is generated by extrusion) but this would be a lot of work and I am hoping there is a flag that can be turned on instead.

cwsmith commented 6 years ago

Our Zoltan load balancer interface will create graph edges between matched faces:

https://github.com/SCOREC/core/blob/eccd2c3c87011c9af338dd2209291404f2b09df0/zoltan/apfZoltan.h#L100-L101

The Zoltan and PUMI RIB balancers don't transform coordinates to account for matching though.

Using the ptnParma tool you can balance with Zoltan RCB and experiment with other methods: https://github.com/SCOREC/core/blob/eccd2c3c87011c9af338dd2209291404f2b09df0/test/ptnParma.cc#L56-L68

Note, this tool runs parma after the splitter and may be very slow given the large imbalances involved:

mpirun -n 16 ./test/ptnParma gs_bumDNS.dmg bumDNS-Smaller.smb zrcb_16/ 16 rcb ptn 0
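For orientation, here is a minimal sketch of driving a repartition through core's apfZoltan interface; the makeZoltanSplitter/split signatures and the method/approach enums are assumptions based on apfZoltan.h and apfPartition.h, not verbatim chef code:

#include <apfMesh2.h>
#include <apfZoltan.h>
#include <parma.h>

// sketch: repartition in place (factor 1) with Zoltan RCB;
// the weights tag is what the splitter balances on
void repartitionRCB(apf::Mesh2* m) {
  apf::MeshTag* weights = Parma_WeighByMemory(m); // per-entity weights, as chef uses
  apf::Splitter* s = apf::makeZoltanSplitter(m, apf::RCB, apf::PARTITION);
  apf::Migration* plan = s->split(weights, 1.05, 1); // 5% tolerance, factor 1
  delete s;
  m->migrate(plan); // migrate() consumes the plan
}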

Applying the attached patch (disableZtnMatch.patch.gz) with git am disableZtnMatch.patch disables Zoltan matching support during graph construction.

The matched and unmatched 1->16 part Zoltan (calling parmetis) partitions are identical on the test mesh uploaded to this issue. That eliminates the possibility that matching is the cause of the balancing issues here.

$ diff zgraph.log zgraph_noMatch.log
1c1
< mesh bumDNS-Smaller.smb loaded in 0.671875 seconds
---
> mesh bumDNS-Smaller.smb loaded in 0.639380 seconds
4,6c4,6
< planned Zoltan split factor 16 to target imbalance 1.050000 in 0.847459 seconds
< mesh expanded from 1 to 16 parts in 0.623988 seconds
< mesh migrated from 1 to 16 in 6.783949 seconds
---
> planned Zoltan split factor 16 to target imbalance 1.050000 in 0.869006 seconds
> mesh expanded from 1 to 16 parts in 0.595982 seconds
> mesh migrated from 1 to 16 in 6.863939 seconds
23c23
< mesh zgraph_16/ written in 1.594624 seconds
---
> mesh zgraph_nomatch_16/ written in 1.732649 seconds
KennethEJansen commented 6 years ago

Would it make sense for us to ship this case to Karen Devine? My gut feeling is that, given that Sandia is the land of Cubit and other hex-dominant meshing tools, their tools would be expected to kick butt on a hex-dominant mesh, not come unbolted.

KennethEJansen commented 6 years ago

I have also asked Riccardo to try to push to higher part counts with Simmetrix before converting to mds. I think it will be interesting to see if the Simmetrix partitioner (which is probably vanilla ParMETIS) manages to achieve a better balance than SCOREC/core.

I am also wondering if there is any chance that the choice of model is playing a role. The mesh was generated from a model created using simTranslate (i.e., it is a GeomSim model derived from Parasolid, but this was done before meshing). I can't think of why this would cause problems, but I am just throwing out everything I know in hopes that you who know more of the internals have all the available facts. The model could not be simpler: 5 faces aligned with Cartesian x, y, and z, and a 6th face with a small deformation/bump, but it is topologically a cube model (with one curved surface).

KennethEJansen commented 6 years ago

Riccardo reports that we have been able to balance all-hex meshes with single and double periodicity in the past. These were meshes created using a Parasolid model and never translated. Here are the numbers he observed. First off, the mesh stats:

number of tet 0 hex 331200 prism 0 pyramid 0
mesh entity counts: v 336000 e 1003200 f 998400 r 331200

Then, going from 1 to 32 parts with partitionMethod=graph and prePhastaBalanceMethod=parma-gap:

max vertex load imbalance of partitioned mesh = 1.239238
ratio of sum of all vertices to sum of owned vertices = 1.184869
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.001063

Repeating with partitionMethod=rib and prePhastaBalanceMethod=parma-gap:

max vertex load imbalance of partitioned mesh = 1.158000
ratio of sum of all vertices to sum of owned vertices = 1.155938
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.000000

The numbers reported here don't seem to make much sense. With the same topology, how can there be no region imbalance but vertex imbalance? Maybe I am missing something here.

Repeating with partitionMethod=rib and prePhastaBalanceMethod=none:

max vertex load imbalance of partitioned mesh = 1.158000
ratio of sum of all vertices to sum of owned vertices = 1.155938
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.000000

Repeating with partitionMethod=zrib and prePhastaBalanceMethod=zrib:

max vertex load imbalance of partitioned mesh = 1.156476
ratio of sum of all vertices to sum of owned vertices = 1.155107
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.000000

Not fantastic numbers, but this is manageable... much better than the 1.79 from the case with 0.05% wedges, which itself is way better than the 1500% (factor of 15) imbalance when there were 19% wedges. This reminds me that for FUN3D you are trying to balance vertices instead of elements. Can you remind me how to make that switch? I wonder if directly targeting vertices may work better in these cases (of course we still want the elements to be uniquely partitioned and the vertices on the part boundaries replicated... just changing what we target as the primary balance).
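For reference, a hedged sketch of that switch, assuming parma.h's Parma_MakeVtxBalancer factory behaves as used below:

#include <apfMesh2.h>
#include <parma.h>

// sketch: after splitting, run ParMA's vertex-targeting balancer;
// elements remain uniquely owned and part-boundary vertices replicated
void balanceVertices(apf::Mesh2* m, apf::MeshTag* weights) {
  double stepFactor = 0.1; // fraction of the imbalance migrated per iteration
  apf::Balancer* b = Parma_MakeVtxBalancer(m, stepFactor, /*verbosity=*/1);
  b->balance(weights, 1.05); // stop once vertex imbalance <= 1.05
  delete b;
}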

KennethEJansen commented 6 years ago

Riccardo,

Cameron and I met and discussed this. He agreed that it would be good for you to try to push up the partitioning in phParAdapt or perhaps in a Simmetrix-based partitioning code (they have examples but I don’t know if it is worth the time to build them rather than just run phParAdapt in partition only mode) to as high of a part count as you can and compare to what you are seeing in chef regarding imbalances. I think on Theta it should be easy/quick to get up to around 16k since 128 nodes are usually a pretty quick turnaround time.

Please try the following jobs:

1) your first mesh with wedges outside of the BL
2) your hex dominant mesh
3) your channel mesh
4) if possible, the mesh at this path on portal1: /projects/tools/Models/BoeingBump/CpFromRiccardo/bumDNS-Smaller.sms

Obviously this 4th case is a smaller version of 1) and, if it illustrates the same pathology on Cooley at a proportional elements-per-part count (and thus much smaller partitions), it will be easier for the SCOREC/core team to debug.


rickybalin commented 6 years ago

### All hex channel flow mesh

I am putting here my report on partitioning, first for an all-hex, tensor-product mesh with 355,142 nodes and 345,870 elements.

1) With Simmetrix default partitioning options, the imbalance is worse than using partitionMethod=zrib and prePhastaBalanceMethod=parma-gap in Chef, but comparable to Chef when using partitionMethod=zrib and prePhastaBalanceMethod=graph. With parma-gap:
a) Going from 1 to 2 parts, the vertex load imbalance is 1.033 for Simmetrix and 1.029 for Chef, and the region imbalance is basically 1 for both.
b) Going from 2 to 32 parts, Simmetrix has 1.191 vertex imbalance and 1.023 element imbalance, while Chef reports 1.141 and 1.00005, respectively.
With graph:
c) Going from 2 to 32 parts, Chef reports 1.166 for vertex imbalance and 1.021 for element imbalance. These numbers are comparable to Simmetrix.
I stopped at 32 parts with Simmetrix partitioning, but I can go to higher counts if needed and can also use different partitioners.

2) With Chef only I went from 32 to 512 parts. Here I tried different partitioners to see which would perform better.
a) partitionMethod=rib and prePhastaBalanceMethod=graph gives vertex imbalance of 1.66 and element imbalance of 1.030
b) partitionMethod=zrib and prePhastaBalanceMethod=graph gives vertex imbalance of 1.67 and element imbalance of 1.028
c) partitionMethod=zrib and prePhastaBalanceMethod=parma-gap gives vertex imbalance of 1.698 and element imbalance of 1.0007
The vertex imbalance is fairly similar for the three cases, but parma-gap provides significantly better element balance.

### Mixed Hex and Wedge Bump Mesh

This is a mixed mesh of the bump flow that is smaller than the IDDES mesh currently running, but has the same structure as the original one, i.e. a BL mesh of hexes extending from the bump surface and wedges (due to the triangles on the extruded surface) filling the rest of the domain, with a smooth transition between the two regions. The mesh has 2,352,540 elements, of which 2,147,040 are hexes and 205,500 are wedges (about 10%). With Chef I carried out the following steps, which you can find on ALCF at /projects/TRAssembly_2/balin/NSF_DataDrivenTurbMod/TestChef/Bump/Chef on the Mira file system.

1) Conversion to mds from the sms mesh with ph_convert using a translated model in serial. The output gave some imbalance in the vertices:
max vertex load imbalance of partitioned mesh = 1.033333
ratio of sum of all vertices to sum of owned vertices = 1.033333
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.000000

2) Partitioned from 1 to 2 parts with partitionMethod=graph and prePhastaBalanceMethod=parma-gap. I also had LocalPtn=0 and elementsPerMigration=1000000, both of which I kept constant during these tests. Here is the output:
max vertex load imbalance of partitioned mesh = 2.020841
ratio of sum of all vertices to sum of owned vertices = 1.036555
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.910114

3) Tried the above step again but with partitionMethod=rib and prePhastaBalanceMethod=parma-gap, giving better balance:
max vertex load imbalance of partitioned mesh = 1.067758
ratio of sum of all vertices to sum of owned vertices = 1.037236
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.028280

4) Then, using the output from 3), I partitioned to 8 again with rib and parma-gap. The imbalance grew to:
max vertex load imbalance of partitioned mesh = 1.188875
ratio of sum of all vertices to sum of owned vertices = 1.060580
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.118917

5) Then, using the output from 4) with rib and parma-gap, I went to 256 parts. This is a split factor of 32, which I admit might be ambitious, which is why I got the following pretty serious imbalance:
max vertex load imbalance of partitioned mesh = 11.559141
ratio of sum of all vertices to sum of owned vertices = 1.264742
max region (3D) or face (2D) load imbalance of partitioned mesh = 9.988673

6) I tried a more modest split factor of 4, taking the output of 4) at 8 parts and going to 32, again with rib and parma-gap. Here the output was:
max vertex load imbalance of partitioned mesh = 2.783505
ratio of sum of all vertices to sum of owned vertices = 1.097407
max region (3D) or face (2D) load imbalance of partitioned mesh = 2.516579

Given the imbalance at 32 and 256 parts, this is where I stopped. Again please let me know if you want to see more cases or partition to larger counts.

I repeated those steps with phParAdapt, which uses the default Simmetrix partitioning tools.
1) Partitioned from 1 to 2, giving vertex and element imbalances, as computed by Chef, of 1.050 and 1.000003, respectively.
2) From 1), partitioned to 8 parts, giving vertex and element imbalances of 1.143 and 1.0369.
3) From 2), partitioned to 256 parts, giving vertex and element imbalances of 1.324 and 1.0325.
4) From 3), partitioned to 4Ki parts, giving vertex and element imbalances of 1.732 and 1.050.

KennethEJansen commented 6 years ago

I am most interested in the hex dominant case at 32k since that seems to be where chef falls apart. I do care about the tensor product but just not as much.


rickybalin commented 6 years ago

I added more information to my previous post which includes Chef and Simmetrix results on a mixed hex and wedge bump mesh.

cwsmith commented 6 years ago

With a mixed mesh ParMA will not attempt vertex balancing:

https://github.com/SCOREC/core/blob/c341ecd5bf908369d21cca095b067ce6ef4f4e19/phasta/phPartition.cc#L140-L145

In all the cases the elements are balanced below the requested tolerance by the base partitioner and thus ParMA does nothing (balanced in 0 steps).
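For context, a hypothetical illustration of such a mixed-mesh guard using the apf iteration API (the real check lives at the phPartition.cc link above; this is an illustration, not that code):

#include <set>
#include <apfMesh2.h>

// hypothetical helper: a mesh is "mixed" if its elements span more
// than one topology on this part (e.g. hexes plus wedges)
bool isMixed(apf::Mesh* m) {
  std::set<int> types;
  apf::MeshIterator* it = m->begin(m->getDimension());
  while (apf::MeshEntity* e = m->iterate(it))
    types.insert(m->getType(e));
  m->end(it);
  return types.size() > 1; // a parallel check would also reduce across parts
}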

2-1:

PARMA_STATUS preRefine entity imbalance <v e f r>: 1.95 1.94 1.92 1.91
PARMA_STATUS elements balanced in 0 steps to 1.030000 in 43.630519 seconds
max vertex load imbalance of partitioned mesh = 2.020841
ratio of sum of all vertices to sum of owned vertices = 1.036555
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.910114

8-2:

PARMA_STATUS preRefine entity imbalance <v e f r>: 1.12 1.12 1.12 1.12
PARMA_STATUS elements balanced in 0 steps to 1.030000 in 3.759609 seconds
max vertex load imbalance of partitioned mesh = 1.188875
ratio of sum of all vertices to sum of owned vertices = 1.060580
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.118917

32-8:

PARMA_STATUS preRefine entity imbalance <v e f r>: 2.54 2.53 2.52 2.52
PARMA_STATUS elements balanced in 0 steps to 1.030000 in 2.677819 seconds
max vertex load imbalance of partitioned mesh = 2.783505
ratio of sum of all vertices to sum of owned vertices = 1.097407
max region (3D) or face (2D) load imbalance of partitioned mesh = 2.516579

256-8:

PARMA_STATUS preRefine entity imbalance <v e f r>: 9.14 9.41 9.69 9.99
PARMA_STATUS elements balanced in 0 steps to 1.030000 in 8.244735 seconds
max vertex load imbalance of partitioned mesh = 11.559141
ratio of sum of all vertices to sum of owned vertices = 1.264742
max region (3D) or face (2D) load imbalance of partitioned mesh = 9.988673
cwsmith commented 6 years ago

I ran 'ptnParma' on the 32 part mesh to dig a bit more. The output is below.

The first thing to notice is that PARMA_STATUS initial entity imbalance <v e f r>: 2.54 2.53 2.52 2.52 matches the preRefine stats from @rickybalin's run.

Next, running the parmetis graph balancer in global mode (arguments 1 pmetis ptn 0 to ptnParma) produces a terrible partition with PARMA_STATUS afterSplit entity imbalance <v e f r>: 26.96 26.68 26.41 26.14. Looking at the mesh in ParaView makes the cause obvious: all the hexes are in the same part!

[Image: the mixed mesh with edges shown. The nearly solid blue region is the hex portion of the mesh; the rest of the elements are wedges.]

[Images mixed_part13, mixed_part13_zoom: the mesh with part 13 shaded gray. It contains nearly all the hexes.]

cwsmith@pachisi: /lore/cwsmith/killme/rickyMixed $ mpirun -np 32 ./test/ptnParma '.null' mdsMesh/ foo 1 pmetis ptn 0         
PUMI version 2.1.0 Git hash 64d3fb53b8d57e22274a11fa8144f3f74c0a8f3e
INPUTS model .null mesh mdsMesh/ out foo factor 1 method pmetis approach ptn isLocal 0
model .null loaded in 0.000005 seconds
mesh mdsMesh/ loaded in 5.874271 seconds
number of tet 0 hex 2147040 prism 205500 pyramid 0
mesh entity counts: v 2262690 e 6877890 f 6967740 r 2352540
PARMA_STATUS initial disconnected <max avg> 1 0.062
PARMA_STATUS initial neighbors <max avg> 8 4.750
PARMA_STATUS initial smallest side of max neighbor part 7
PARMA_STATUS initial num parts with max neighbors 1
PARMA_STATUS initial empty parts 0
PARMA_STATUS initial small neighbor counts 1:0 2:0 3:0 4:0 5:0 6:0 7:2 8:0 9:0 10:0 
PARMA_STATUS initial weighted vtx <tot max min avg> 2483092.0 196819.0 4199.0 77596.625
PARMA_STATUS initial weighted edge <tot max min avg> 7316350.0 578437.0 15065.0 228635.938
PARMA_STATUS initial weighted face <tot max min avg> 7185830.0 566630.0 17480.0 224557.188
PARMA_STATUS initial weighted rgn <tot max min avg> 2352540.0 185011.0 6613.0 73516.875
PARMA_STATUS initial owned bdry vtx <tot max min avg> 143213 16464 0 4475.406
PARMA_STATUS initial shared bdry vtx <tot max min avg> 288192 19159 1040 9006.000
PARMA_STATUS initial model bdry vtx <tot max min avg> 185160 17837 276 5786.250
PARMA_STATUS initial sharedSidesToElements <max min avg> 0.352 0.058 0.187
PARMA_STATUS initial entity imbalance <v e f r>: 2.54 2.53 2.52 2.52
planned Zoltan split factor 1 to target imbalance 1.050000 in 60.838516 seconds
mesh expanded from 32 to 32 parts in 2.359758 seconds
mesh migrated from 32 to 32 in 97.855141 seconds
writeVtuFile into buffers: 0.031596 seconds
writeVtuFile buffers to disk: 0.164667 seconds
vtk files foo written in 0.329910 seconds
PARMA_STATUS afterSplit disconnected <max avg> 1 0.125
PARMA_STATUS afterSplit neighbors <max avg> 12 5.688
PARMA_STATUS afterSplit smallest side of max neighbor part 212
PARMA_STATUS afterSplit num parts with max neighbors 1
PARMA_STATUS afterSplit empty parts 0
PARMA_STATUS afterSplit small neighbor counts 1:0 2:0 3:0 4:2 5:0 6:0 7:0 8:0 9:0 10:0 
PARMA_STATUS afterSplit weighted vtx <tot max min avg> 2379189.0 2004360.0 3058.0 74349.656
PARMA_STATUS afterSplit weighted edge <tot max min avg> 7112952.0 5930103.0 10777.0 222279.750
PARMA_STATUS afterSplit weighted face <tot max min avg> 7086339.0 5847699.0 12322.0 221448.094
PARMA_STATUS afterSplit weighted rgn <tot max min avg> 2352540.0 1921955.0 4602.0 73516.875
PARMA_STATUS afterSplit owned bdry vtx <tot max min avg> 38981 6456 0 1218.156
PARMA_STATUS afterSplit shared bdry vtx <tot max min avg> 80057 15153 916 2501.781
PARMA_STATUS afterSplit model bdry vtx <tot max min avg> 178513 150764 171 5578.531
PARMA_STATUS afterSplit sharedSidesToElements <max min avg> 0.346 0.008 0.204
PARMA_STATUS afterSplit entity imbalance <v e f r>: 26.96 26.68 26.41 26.14
KennethEJansen commented 5 years ago

Is there a time we can talk about a work-around for this issue?

Riccardo has partitioned the mesh with Simmetrix to much higher part counts, but that tool has somewhat broken/out-of-date preparation of the phasta files, so we have to decide whether to put effort toward updating that output creation.



cwsmith commented 5 years ago

The problem appears to be the result of weights being applied but stats reported without weights. We'll have to decide if we want non-uniform weights applied to entities for phasta's mixed meshes.

The following call sets entity weights based on their memory consumption for use by the partitioner (in the code path that increases the part count):

https://github.com/SCOREC/core/blob/ec067988558e75229cf0b8a0ab1b1c49f5bec4fa/phasta/phPartition.cc#L40

But the partition statistics printing function that is called after the partitioner and before parma:

https://github.com/SCOREC/core/blob/ec067988558e75229cf0b8a0ab1b1c49f5bec4fa/phasta/phPartition.cc#L103

does not account for the weights.
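To make the mismatch concrete, a minimal sketch of the sequence, assuming the parma.h declarations of Parma_WeighByMemory and Parma_PrintPtnStats:

#include <apfMesh2.h>
#include <parma.h>

// sketch: memory-based weights drive the split, but the stats printed
// afterwards count raw entities and ignore those weights, so a
// weight-balanced mixed-mesh partition can report a large "imbalance"
void splitThenReport(apf::Mesh2* m) {
  apf::MeshTag* w = Parma_WeighByMemory(m); // e.g. a hex weighs more than a tet
  // ... splitter->split(w, tolerance, factor) and migration run here ...
  Parma_PrintPtnStats(m, "preRefine"); // unweighted entity counts
}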

With weights disabled, core's interface to Zoltan (which calls ParMETIS) produces partitions with nearly the same quality as the SimModSuite interface to ParMETIS on a 200k mixed element mesh of a cylinder (prismatic BLs grown on the cylinder walls). Below are the logs from the core zoltan splitter (zsplit) and the SimModSuite ParMETIS partitioning example (exPartition). Attached are the exPartition source code, the modifications to core's zsplit.cc to use unit weights, and the input mesh (core and SimModSuite formats) and model: zsplit.patch.gz exPartition.cc.gz pipe.tar.gz

$ mpirun -np 16 $d/zsplit pipe.smd pipeMixed200k.smb zunit16/ 16
mesh pipeMixed200k.smb loaded in 0.713488 seconds
number of tet 97969 hex 0 prism 99680 pyramid 0
mesh entity counts: v 70685 e 324324 f 451289 r 197649
planned Zoltan split factor 16 to target imbalance 1.020000 in 1.625770 seconds
mesh expanded from 1 to 16 parts in 0.697993 seconds
mesh migrated from 1 to 16 in 7.360952 seconds
PARMA_STATUS  disconnected <max avg> 0 0.000
PARMA_STATUS  neighbors <max avg> 8 5.500
PARMA_STATUS  smallest side of max neighbor part 35
PARMA_STATUS  num parts with max neighbors 1
PARMA_STATUS  empty parts 0
PARMA_STATUS  small neighbor counts 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 
PARMA_STATUS  weighted vtx <tot max min avg> 79161.0 5262.0 4650.0 4947.562
PARMA_STATUS  weighted edge <tot max min avg> 343506.0 22405.0 20659.0 21469.125
PARMA_STATUS  weighted face <tot max min avg> 462010.0 29727.0 28310.0 28875.625
PARMA_STATUS  weighted rgn <tot max min avg> 197649.0 12583.0 12250.0 12353.062
PARMA_STATUS  owned bdry vtx <tot max min avg> 7910 1163 0 494.375
PARMA_STATUS  shared bdry vtx <tot max min avg> 16386 1311 750 1024.125
PARMA_STATUS  model bdry vtx <tot max min avg> 7455 737 288 465.938
PARMA_STATUS  sharedSidesToElements <max min avg> 0.141 0.080 0.108
PARMA_STATUS  entity imbalance <v e f r>: 1.06 1.04 1.03 1.02
MDS: reordering before writing smb files
mesh zunit16/ written in 1.513614 seconds

$ mpirun -np 32 $d/zsplit pipe.smd pipeMixed200k.smb zunit32/ 32
mesh pipeMixed200k.smb loaded in 1.444196 seconds
number of tet 97969 hex 0 prism 99680 pyramid 0
mesh entity counts: v 70685 e 324324 f 451289 r 197649
planned Zoltan split factor 32 to target imbalance 1.020000 in 3.305292 seconds
mesh expanded from 1 to 32 parts in 1.411998 seconds
mesh migrated from 1 to 32 in 16.497007 seconds
PARMA_STATUS  disconnected <max avg> 0 0.000
PARMA_STATUS  neighbors <max avg> 15 7.938
PARMA_STATUS  smallest side of max neighbor part 7
PARMA_STATUS  num parts with max neighbors 1
PARMA_STATUS  empty parts 0
PARMA_STATUS  small neighbor counts 1:2 2:0 3:0 4:0 5:0 6:0 7:6 8:0 9:4 10:4 
PARMA_STATUS  weighted vtx <tot max min avg> 83017.0 2888.0 1995.0 2594.281
PARMA_STATUS  weighted edge <tot max min avg> 351894.0 11780.0 9630.0 10996.688
PARMA_STATUS  weighted face <tot max min avg> 466558.0 15160.0 13850.0 14579.938
PARMA_STATUS  weighted rgn <tot max min avg> 197649.0 6300.0 6055.0 6176.531
PARMA_STATUS  owned bdry vtx <tot max min avg> 11073 823 0 346.031
PARMA_STATUS  shared bdry vtx <tot max min avg> 23405 881 540 731.406
PARMA_STATUS  model bdry vtx <tot max min avg> 7732 461 98 241.625
PARMA_STATUS  sharedSidesToElements <max min avg> 0.211 0.106 0.155
PARMA_STATUS  entity imbalance <v e f r>: 1.11 1.07 1.04 1.02
MDS: reordering before writing smb files
mesh zunit32/ written in 3.013404 seconds

$ mpirun -np 8 ./exPartition pipeMixed200k.sms 16
Info: 10420 edges cut
tot <v e f r>  78820 342773 461618 197649
avg <v e f r> 4926.25 21423.31 28851.12 12353.06
max <v e f r>   5297  22568  29946  12696
imb <v e f r> 1.08 1.05 1.04 1.03

$ mpirun -np 8 ./exPartition pipeMixed200k.sms 32
Info: 15369 edges cut
tot <v e f r>  82993 351889 466577 197649
avg <v e f r> 2593.53 10996.53 14580.53 6176.53
max <v e f r>   2968  11884  15434   6533
imb <v e f r> 1.14 1.08 1.06 1.06
rickybalin commented 5 years ago

Hi Cameron,

Thanks a lot for the update.

I applied the changes that I saw in zsplit.patch.gz to the current master version of core, and I am about to try chef on our bump geometry with a mixed mesh of around 2.5 million elements. The next step will be to test it on the 150 million element mixed mesh that gave us problems in the first place. Just to make sure I use this modification in the right way, which option should I choose for the partitionMethod and the prePhastaBalanceMethod? Are there any other settings I should be aware of in adapt.inp?

Thanks a lot in advance.

Riccardo



rickybalin commented 5 years ago

Just an update:

I used the zsplit executable on the 2.5 million element mesh and saw improved balance:

PARMA_STATUS entity imbalance <v e f r>: 1.08 1.06 1.04 1.03

relative to the previous attempt with chef:

PARMA_STATUS preRefine entity imbalance <v e f r>: 1.12 1.12 1.12 1.12

I then tried it on the 150 million element mesh on Cooley, for a 1-to-2 partitioning step, and I ran into an INT_MAX issue:

mesh ../../MixedMesh/sim2mds1/mdsMesh/ loaded in 278.080879 seconds
number of tet 0 hex 127370880 prism 24669840 pyramid 0
mesh entity counts: v 140078952 e 432198312 f 444160080 r 152040720
planned Zoltan split factor 2 to target imbalance 1.020000 in 652.917634 seconds
mesh expanded from 1 to 2 parts in 195.450973 seconds
ERROR PCU message size exceeds INT_MAX... exiting
[cc042:mpi_rank_0][error_sighandler] Caught error: Aborted (signal 6)

I built the executable with:

-DMDS_SET_MAX=2048 \
-DMDS_ID_TYPE=long long \

But the INT_MAX issue was not seen when I partitioned from 1 to 4, and the parts had a good balance again relative to previous attempts with chef.

To generate the PHASTA input files, I followed the zsplit step with chef, but with "none" for all partitioning and split factor of 1. This worked (no errors reported) in parallel, even with a matched mesh with periodic boundary conditions. To be clear: I performed ph_convert in serial, then zsplit to partition from 1 to 4 parts, then chef to generate the input files for PHASTA on 4 processes.

I will keep partitioning to see if this combination (zsplit+chef) works at larger part counts as well, and also to see how the partition balance evolves.

Thanks again.



cwsmith commented 5 years ago

I'd suggest modifying chef to use unit element weights. If you replace the call to Parma_WeighByMemory here:

https://github.com/SCOREC/core/blob/ec067988558e75229cf0b8a0ab1b1c49f5bec4fa/phasta/phPartition.cc#L40

with setWeights (defined here: https://github.com/SCOREC/core/blob/ec067988558e75229cf0b8a0ab1b1c49f5bec4fa/phasta/phPartition.cc#L78-L83)

you should be good to go.

If you want to run parma on the mixed mesh after partitioning with zoltan (chef option graph, IIRC - double check this) then you'd want to also replace the following call to Parma_WeighByMemory (as done above).

https://github.com/SCOREC/core/blob/ec067988558e75229cf0b8a0ab1b1c49f5bec4fa/phasta/phPartition.cc#L104
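Paraphrasing the suggestion (the real setWeights is at the phPartition.cc link above; treat this as a sketch, not the file's exact contents):

#include <apfMesh2.h>

// paraphrase of phPartition.cc's unit-weight helper: tag every vertex
// and element with weight 1.0 under the "parma_weight" tag, and use the
// returned tag wherever Parma_WeighByMemory's tag was used before
static void setUnitWeight(apf::Mesh* m, apf::MeshTag* tag, int dim) {
  double w = 1.0;
  apf::MeshIterator* it = m->begin(dim);
  while (apf::MeshEntity* e = m->iterate(it))
    m->setDoubleTag(e, tag, &w);
  m->end(it);
}

static apf::MeshTag* setWeights(apf::Mesh* m) {
  apf::MeshTag* tag = m->createDoubleTag("parma_weight", 1);
  setUnitWeight(m, tag, 0);                 // vertices
  setUnitWeight(m, tag, m->getDimension()); // elements
  return tag;
}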

rickybalin commented 5 years ago


This is another update.

I added Cameron's suggestions to chef, and partitioned the 150M element mixed mesh up to 32k parts. Note that I used fairly small split factors, a maximum of 8, but this was done mainly for memory reasons. The statistics at this part count are as follows:

PARMA_STATUS preRefine entity imbalance <v e f r>: 1.31 1.22 1.11 1.00
PARMA_STATUS elements balanced in 0 steps to 1.030000 in 3.849516 seconds
mesh reordered in 0.706076 seconds
max vertex load imbalance of partitioned mesh = 1.695958
ratio of sum of all vertices to sum of owned vertices = 1.292254
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.000018

which is a significant improvement relative to what I was able to achieve previously with chef, which was:

PARMA_STATUS preRefine entity imbalance <v e f r>: 60.45 59.44 58.37 57.67
PARMA_STATUS elements balanced in 0 steps to 1.030000 in 36.427068 seconds
mesh reordered in 27.338621 seconds
max vertex load imbalance of partitioned mesh = 15.286312
ratio of sum of all vertices to sum of owned vertices = 1.142698
max region (3D) or face (2D) load imbalance of partitioned mesh = 14.059526

I believe this new partition for the mixed mesh is more than adequate for PHASTA based on previous cases, although I have not run it yet.

Thanks again for helping us with this issue.

Regards,

Riccardo



cwsmith commented 5 years ago

You're welcome.

Until we have data that indicates which is better for the solver, weighted (with proper stats) vs unit weights, I vote that this change goes into the core/develop branch.

We can likely reduce that 31% imbalance with parma by sacrificing some of the edge cut or element imbalance. If you remove the following lines:

https://github.com/SCOREC/core/blob/ec067988558e75229cf0b8a0ab1b1c49f5bec4fa/phasta/phPartition.cc#L142-L144

Parma vtx>elm will run if the pre-phasta balancer is set to parma or parma-gap.
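For clarity, a brief sketch of what runs once those lines are removed, assuming parma.h's Parma_MakeVtxElmBalancer factory (vtx>elm prioritizes vertex balance ahead of element balance, unlike a vertex-only balancer):

#include <apfMesh2.h>
#include <parma.h>

// sketch: the vtx>elm ParMA balancer trades some element imbalance
// and edge cut for a lower vertex imbalance
void runVtxElm(apf::Mesh2* m, apf::MeshTag* weights) {
  apf::Balancer* b = Parma_MakeVtxElmBalancer(m, 0.1, /*verbosity=*/1);
  b->balance(weights, 1.05); // target 5% imbalance
  delete b;
}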

rickybalin commented 5 years ago

I tried partitioning the last step (8k to 32k) also with your latest suggestion of commenting out lines 142-144 in phPartition.cc, and there is a difference. The vertex imbalance is reduced, and the region imbalance grows relative to what I reported in the previous message. Here are the statistics:

PARMA_STATUS postGap entity imbalance <v e f r>: 1.06 1.10 1.14 1.21
mesh reordered in 0.757935 seconds
max vertex load imbalance of partitioned mesh = 1.465074
ratio of sum of all vertices to sum of owned vertices = 1.381705
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.205626

Using the same scripts and input files to run both cases with PHASTA, the PHASTA files created with the chef executable that does not have lines 142-144 commented out run fine, but the PHASTA files created with the executable that does have lines 142-144 commented out do not run; they cause PHASTA to give NaN at the first flow solve. I am not sure yet what is responsible for the issue with the latter case, but I thought I would report it.

Also, to make sure I understand: weighted partitioning (with proper stats) is the default chef behavior, without any of your suggestions? If that is the case, I can state that PHASTA significantly prefers the partition created by chef with your suggested unit weights.


cwsmith commented 5 years ago

Thank you for the update.

Chef, as it exists currently in the SCOREC/core master branch (https://github.com/SCOREC/core/commit/8fdf6a77afbbe1108ec4b0527d5fe4bce1598942), uses non-unit weights when partitioning and balancing but the partition stats reported by parma do not account for those weights. We could easily modify chef to report stats that account for the non-unit weights.

The question is: does the solver run faster on a mixed mesh with the non-unit weights or with unit weights?

Good to see parma vtx>elm reduced the vertex imbalance on the mixed mesh. It sounds like the partition may not be valid though... at least for the phasta pre-processor. Would you please create a separate issue for that discussion?

rickybalin commented 5 years ago

Thanks for the clarification.

To answer the question regarding the solver: it is faster with the partition created with unit weights. With 32k parts in both cases, I was able to get around 12 time steps in an hour with the unit weights, and not even one step with the non-unit weights.

I will make a new issue for the case that seems to create a non-valid partition for PHASTA.



KennethEJansen commented 3 years ago

Do either of you know the status of the commenting of lines 142-144 in phPartition.cc? I see that Cameron requested that it be in another ticket so I will go looking for that.

That said, as long as matchedNodeElmReader is off of the main branch we will have to track whether it is fixed in develop, fixed in addNfath, or both.

If it has been fixed in develop or master, let me know and I will try to pull those changes in because we appear to need them for my new mixed topology cases.

Note I am seeing evidence that it is matchedNodeElmReader that is creating a partition that Chef struggles to recover from. If I run matchedNodeElmReader in serial, Chef can partition the result fine, but even 4 parts from matchedNodeElmReader creates something that graph struggles with. I will cross-check whether my code has the fixes above (I would have hoped that they were committed, but maybe they were not) to see if that helps.

KennethEJansen commented 3 years ago

Given that Riccardo's PHASTA run NaN-ed out, I thought perhaps I would try to set weights. This code seems to work:

namespace ph {

void setWeight(apf::Mesh* m, apf::MeshTag* tag, int dim) {
  apf::MeshEntity* e;
  apf::MeshIterator* it = m->begin(dim);
  apf::Downward verts;
  while ((e = m->iterate(it))) {
    double w = 1.0; // default weight for vertices, tets, and pyramids
    if (getDimension(m, e) == 3) {
      int nverts = m->getDownward(e, 0, verts);
      if (nverts == 8)
        w = 6.0; // weigh hexes six times heavier
    }
    m->setDoubleTag(e, tag, &w);
  }
  m->end(it);
}

apf::MeshTag* setWeights(apf::Mesh* m) {
  apf::MeshTag* tag = m->createDoubleTag("parma_weight", 1);
  setWeight(m, tag, 0);
  setWeight(m, tag, m->getDimension());
  return tag;
}

}

kjansen@viz003: /projects/tools/Models/BoeingBump/LES_DNS_Meshing/FPSMixTopo/MGEN2/Chef/8-4m-Chef $ mpirun -np 8 /projects/tools/SCOREC-core/buildMT/test/chef 2>&1 | tee -a output.GraphWeightHex6NoParma
PUMI Git hash 2.2.0
PUMI version 2.2.0 Git hash 37f9f9e4c3f356b42dda25f0a81bf23282516724
"../outModel.dmg" and "../LES_Periodic.spj" loaded in 0.001406 seconds
mesh ../../mner/outMesh/ loaded in 14.751190 seconds
number of tet 2682228 hex 449280 prism 0 pyramid 5760
mesh entity counts: v 908160 e 4609449 f 6801856 r 3137268
planned Zoltan split factor 2 to target imbalance 1.010000 in 11.894761 seconds
mesh expanded from 4 to 8 parts in 12.185144 seconds
mesh migrated from 4 to 8 in 119.303178 seconds
mesh reordered in 15.000522 seconds
max vertex load imbalance of partitioned mesh = 1.039241
ratio of sum of all vertices to sum of owned vertices = 1.071798
max region (3D) or face (2D) load imbalance of partitioned mesh = 1.727526
getGrowthCurves: warning! not implemented for MDS mesh
Found tag fathers2D
generated output structs in 21.907134 seconds
solution written in 2.101823 seconds
mesh bz2:mdsMesh_bz2/ written in 16.316866 seconds
geombc file written in 2.393878 seconds

As the log implies, I have parma turned off and am letting graph balance elements, with hexes getting a weight of 6 while tets and pyramids (the else case) keep a weight of 1. From the size of the coords file (which holds the coordinates of each node on a part), this is a max/avg of 1.04.
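For completeness, a hedged sketch of how such a tag would feed a Zoltan graph split (chef's internal wiring differs; makeZoltanSplitter and split are used under the same assumed signatures as in the earlier sketch):

#include <apfMesh2.h>
#include <apfZoltan.h>

// hypothetical wiring: feed ph::setWeights' tag to a Zoltan graph split
void splitWithHexWeights(apf::Mesh2* m, int factor) {
  apf::MeshTag* weights = ph::setWeights(m); // hexes 6.0, everything else 1.0
  apf::Splitter* s = apf::makeZoltanSplitter(m, apf::GRAPH, apf::PARTITION);
  apf::Migration* plan = s->split(weights, 1.01, factor);
  delete s;
  m->migrate(plan);
}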