Closed francoishamon closed 2 years ago
We are using metis as a partitioner? I experienced empty partitions with metis in other codes; we used to fix it by moving one element into each empty domain. We also used Scotch (https://gitlab.inria.fr/faverge/scotch/-/tree/master), which is less prone to creating empty partitions. The last releases were on the Inria gforge, which is not available anymore; I can ask the team members where they put their new releases. There are also tags in the git repo.
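The "move one element to empty domains" workaround can be sketched in a few lines. This is a minimal illustration, not code from any of the projects mentioned; the function name and the cell-to-rank array `part` are hypothetical:

```python
from collections import Counter

def fix_empty_partitions(part, n_parts):
    """Repair a METIS-style partition vector by donating one cell
    to each empty partition (the workaround described above).
    part[c] is the rank assigned to cell c (hypothetical input)."""
    sizes = Counter(part)
    for p in range(n_parts):
        if sizes[p] == 0:
            # donate a cell from the currently largest partition
            donor_part = max(sizes, key=lambda q: sizes[q])
            donor_cell = next(c for c, q in enumerate(part) if q == donor_part)
            part[donor_cell] = p
            sizes[donor_part] -= 1
            sizes[p] += 1
    return part
```

A more careful version would pick the donor cell on the partition boundary to avoid hurting locality, but even this naive repair guarantees every rank gets at least one cell.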
I've been thinking about this myself recently. VTK uses a simple octree aligned with global axis to split the mesh, which should produce reasonably load-balanced partitions for "normal"-looking meshes (e.g. those resembling a box), but can produce somewhat unexpected results for certain configurations (e.g. the "staircase" tetrahedral mesh). Empty partitions seem entirely possible, for example with slanted/dipping domains.
My own problems with it come from the fact that it can sometimes produce partitions that are not contiguous in the TPFA sense, i.e. have cells that are not connected to the rest through a face. This leads to some problems when creating an agglomeration-based coarse mesh for the multiscale method. I can "downgrade" it to using node-based connectivity to define local agglomerates and it works, but the quality of basis functions and convergence of the method take a hit. I'm sticking to PAMELA import for that reason.
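The TPFA-sense contiguity problem above amounts to a partition whose cells do not form a single face-connected component. A small breadth-first check can flag such partitions; the inputs (`part`, `face_neighbors`) are hypothetical stand-ins for whatever the mesh data structure provides:

```python
from collections import defaultdict, deque

def non_contiguous_partitions(part, face_neighbors):
    """Flag partitions that are not contiguous in the TPFA sense,
    i.e. whose cells do not form one face-connected component.
    part[c] is the rank of cell c; face_neighbors[c] lists the
    cells sharing a face with c (both hypothetical inputs)."""
    cells_by_part = defaultdict(list)
    for c, p in enumerate(part):
        cells_by_part[p].append(c)
    bad = []
    for p, cells in cells_by_part.items():
        # BFS within the partition, crossing only face connections
        seen = {cells[0]}
        queue = deque([cells[0]])
        while queue:
            c = queue.popleft()
            for n in face_neighbors[c]:
                if part[n] == p and n not in seen:
                    seen.add(n)
                    queue.append(n)
        if len(seen) < len(cells):  # more than one component
            bad.append(p)
    return bad
```

Running this with node-based instead of face-based neighbor lists reproduces the "downgrade" mentioned above: more partitions pass the check, at the cost of weaker agglomerates.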
It should also be noted that this type of partitioning, while fast and useful for visualization and data processing, is a poor fit for dynamic simulation, since it does not minimize MPI communication (edge-cut).
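Edge-cut here is just the number of faces whose two adjacent cells land on different ranks, which is a proxy for the per-step MPI communication volume. A toy counter makes the metric concrete (same hypothetical `part`/`face_neighbors` inputs as above):

```python
def edge_cut(part, face_neighbors):
    """Count faces whose two adjacent cells live on different
    ranks - a simple proxy for MPI communication volume.
    part[c] is the rank of cell c; face_neighbors[c] lists the
    cells sharing a face with c (hypothetical inputs)."""
    cut = 0
    for c, neighbors in enumerate(face_neighbors):
        for n in neighbors:
            if n > c and part[n] != part[c]:  # count each face once
                cut += 1
    return cut
```

Graph partitioners like METIS/Scotch minimize this quantity directly, while a geometric octree split only controls it indirectly through cell counts.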
I think a possible long-term solution is to use a graph-based algorithm (ParMETIS?) to refine the initial partitioning and redistribute the mesh one more time. This can happen through a contribution to VTK or fully on GEOSX side, with the latter probably being a more realistic scenario. This is not an easy task, but VTK has some built-in tools for redistribution. Internally it uses a library called diy2 for this and provides compatible facilities like data serialization for unstructured grid datasets. We might be able to take advantage of it. The closest example is here.
Ok, it seems I was a bit off-topic. I didn't know how VTK partitioning worked.
> Ok it seems I was a bit off-topic.
You were not :) Your comment might be extremely relevant if we still have problems like empty ranks after repartitioning attempt. At the moment we just need to come up with a plan to even get to that point.
> I think a possible long-term solution is to use a graph-based algorithm (ParMETIS?) to refine the initial partitioning and redistribute the mesh one more time. This can happen through a contribution to VTK or fully on GEOSX side, with the latter probably being a more realistic scenario. This is not an easy task, but VTK has some built-in tools for redistribution. Internally it uses a library called diy2 for this and provides compatible facilities like data serialization for unstructured grid datasets. We might be able to take advantage of it. The closest example is here.
To be more concrete, I think the implementation could be:

1. Split the mesh into a vtkPartitionedDataSet (a collection of sub-meshes) based on the target rank vector - there doesn't seem to be an existing filter for that (or I can't find it), but it can be done using a series of vtkExtractCells applications.
2. Use diy::all_to_all (similar to this) to redistribute the partitions to target ranks.

@AntoineMazuyer if you have any contacts on the VTK team that could take a quick peek and either greenlight this plan or tell us reasons why it won't work, that would be great.
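The two steps above can be mocked up serially to show the data flow. This is only a sketch of the idea, not the vtkExtractCells or diy APIs; all names and data shapes here are illustrative:

```python
def redistribute(local_cells, target_rank, n_ranks):
    """Serial mock-up of the plan above: each rank splits its cells
    into per-destination buckets (the vtkExtractCells role), then
    every rank gathers the buckets addressed to it (the
    diy::all_to_all role). local_cells[r] holds rank r's cells;
    target_rank[r][i] is the destination of the i-th cell on rank r.
    All names are illustrative, not real VTK/diy calls."""
    # step 1: bucket cells by destination rank
    outboxes = [
        {dest: [c for c, d in zip(cells, dests) if d == dest]
         for dest in range(n_ranks)}
        for cells, dests in zip(local_cells, target_rank)
    ]
    # step 2: "all-to-all" exchange - gather what was sent to each rank
    return [
        [c for src in range(n_ranks) for c in outboxes[src][r]]
        for r in range(n_ranks)
    ]
```

In the real implementation, step 1 would produce sub-meshes rather than cell lists, and step 2 would rely on VTK's unstructured-grid serialization to move them between MPI ranks.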
Hi @klevzoff,
Sure thing.
One question: why don't we use ParMETIS at step one?
@AntoineMazuyer sure, we probably could do that. My impression was that in order to use ParMETIS, you need to give it a distributed graph, so we need to split the mesh anyway (unless we read in a pre-partitioned mesh from a .pvtu), although we don't have to physically redistribute the entire dataset. If VTK exposes the tools to octree-partition the mesh without distributing it, that would probably be optimal.
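Computing a geometric partition without moving any data only requires cell centroids. The toy recursive coordinate bisection below stands in for the octree-style split discussed here (it is not VTK's actual algorithm); every name is illustrative:

```python
def bisect_partition(centroids, ids, depth, axis=0):
    """Toy recursive coordinate bisection over cell centroids,
    standing in for the geometric (octree-style) split discussed
    above - not VTK's actual algorithm. Returns one list of cell
    ids per leaf; 2**depth leaves total, cycling the split axis."""
    if depth == 0:
        return [ids]
    # median split along the current axis keeps leaves balanced
    order = sorted(ids, key=lambda i: centroids[i][axis])
    half = len(order) // 2
    next_axis = (axis + 1) % len(centroids[0])
    return (bisect_partition(centroids, order[:half], depth - 1, next_axis)
            + bisect_partition(centroids, order[half:], depth - 1, next_axis))
```

Because the split only reads centroids, each rank could evaluate it on its locally read cells and obtain target ranks without redistributing the dataset first.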
Hello, I need help with a simulation on an unstructured mesh with 1,915,000 cells (prisms) that I load into GEOSX using the VTKMeshGenerator. The mesh looks good and the simulation goes well when I use a small number of MPI ranks, but I am struggling with the load balancing between MPI partitions with develop. With feature/klevzoff/vtk-mesh-update, I can get past the mesh generation step, which is great, but GEOSX does not like empty MPI partitions.

The domain/mesh/cell shapes are not unreasonable, so I am looking for ideas to overcome this issue. Is there a way to control MPI partitioning inside the VTK framework that I can try/implement?
I can send an image of the MPI partitions on Slack to anyone interested in this issue.