MPAS-Dev / MPAS-Tools

MPAS Tools Repository
Other
37 stars 63 forks source link

Add bandwidth-minimising mesh sorting routine #515

Closed dengwirda closed 1 year ago

dengwirda commented 1 year ago

This PR adds a mesh sorting routine mpas_tools.mesh.creation.sort_mesh that can be used to reindex an MPAS mesh file to bring the cells, edges and vertices into a more cache-local sequence. This undoes the somewhat 'random' ordering introduced by the mesh conversion tools, and may be desirable re: efficiency for downstream dycore + coupler applications.

The reordering is implemented via a reverse cuthill-mckee approach --- minimising the bandwidth of the cell-to-cell adjacency graph and then permuting the edges and vertices to match. Sorting both culled and unculled meshes is supported, as well as meshes containing additional variables, which are reindexed according to their underlying nCells, nEdges or nVertices dimensions.

The mesh sorter should be callable as a cmd-line tool:

python sort_mesh.py \
--mesh-file=path+name-to-mpas-file-for-sort \
--sort-file=path+name-to-sorted-output-file

and/or inline:

from mpas_tools.mesh.creation.sort_mesh import sort_mesh
mesh = xarray.open_dataset("mpas-file-name")
sort_mesh(mesh)

Tested on pm-cpu using a QU30km MPAS-O standalone config., which successfully spun-up to 90 days after a sort of the culled mesh. Sorting improves the locality of the various remapping arrays (verticesOnCell, edgesOnVertex, etc, etc) with all showing smooth distributions.

edgesOnCell=unsorted: eoc_unsorted

edgesOnCell=issorted: eoc_issorted

dengwirda commented 1 year ago

Just keeping this in draft for now @xylar --- before going too far I think we should check whether this reindexing will solve the locality issues wrt. the E3SM coupler and I've reached out to the team re: benchmark cases.

xylar commented 1 year ago

@dengwirda. beautiful! Keep me posted.

xylar commented 1 year ago

Picasso, by the way, is turning over in his grave. The unsorted way it clearly more cubist.

xylar commented 1 year ago

One more thing: It would be good to add the entry point (redundantly, I know) to this file: https://github.com/MPAS-Dev/MPAS-Tools/blob/master/conda_package/recipe/meta.yaml along with a test to make sure you can at least get the help message: https://github.com/MPAS-Dev/MPAS-Tools/blob/master/conda_package/recipe/meta.yaml#L140 Or if you want, you could test it properly like one of these: https://github.com/MPAS-Dev/MPAS-Tools/blob/master/conda_package/recipe/meta.yaml#L103

I added the entry point and a test like I'm suggesting to the 0.22.0rc2 recipe on conda-forge: https://github.com/conda-forge/mpas_tools-feedstock/pull/77/files#diff-f3725a55bf339595bf865fec73bda8ac99f283b0810c205442021f29c06eea9a

xylar commented 1 year ago

I tested things out in https://github.com/MPAS-Dev/compass/pull/657 and everything worked well. I need to visualize the edgesOnCell, etc. to make sure I'm seeing something similar to the plots above but the mechanics seem to work well.

@dengwirda would you like to make the changes I mentioned above? If so, that would be my preference but I'm also happy to make them if you don't have time.

xylar commented 1 year ago

@dengwirda, using Paraview, I looked at all the following fields from the ECwISC mesh I tested earlier:

cellsOnCell, edgesOnCell, verticesOnCell, cellsOnVertex, cellsOnEdge, edgesOnVertex, verticesOnEdge, edgesOnEdge

and they look great to me. I won't post a bunch of identical-looking plots here to prove it.

So I think this is good to go as soon as my earlier clean-up comments are addressed.

xylar commented 1 year ago

@dengwirda, I'm seeing this error when sorting the RRS6to18 mesh:

Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.xylar/chrysalis/mambaforge/envs/compass_rrs6to18/bin/compass", line 33, in <module>
    sys.exit(load_entry_point('compass', 'console_scripts', 'compass')())
  File "/gpfs/fs1/home/ac.xylar/compass/add_rrs6to18/compass/__main__.py", line 63, in main
    commands[args.command]()
  File "/gpfs/fs1/home/ac.xylar/compass/add_rrs6to18/compass/run/serial.py", line 206, in main
    run_single_step(args.step_is_subprocess)
  File "/gpfs/fs1/home/ac.xylar/compass/add_rrs6to18/compass/run/serial.py", line 167, in run_single_step
    _run_test(test_case, available_resources)
  File "/gpfs/fs1/home/ac.xylar/compass/add_rrs6to18/compass/run/serial.py", line 417, in _run_test
    _run_step(test_case, step, test_case.new_step_log_file,
  File "/gpfs/fs1/home/ac.xylar/compass/add_rrs6to18/compass/run/serial.py", line 468, in _run_step
    step.run()
  File "/gpfs/fs1/home/ac.xylar/compass/add_rrs6to18/compass/ocean/mesh/cull.py", line 165, in run
    cull_mesh(with_critical_passages=True, logger=logger,
  File "/gpfs/fs1/home/ac.xylar/compass/add_rrs6to18/compass/ocean/mesh/cull.py", line 241, in cull_mesh
    _cull_mesh_with_logging(
  File "/gpfs/fs1/home/ac.xylar/compass/add_rrs6to18/compass/ocean/mesh/cull.py", line 374, in _cull_mesh_with_logging
    dsCulledMesh = sort_mesh(dsCulledMesh)
  File "/gpfs/fs1/home/ac.xylar/chrysalis/mambaforge/envs/compass_rrs6to18/lib/python3.10/site-packages/mpas_tools/mesh/creation/sort_mesh.py", line 44, in sort_mesh
    cell_fwd = reverse_cuthill_mckee(cell_del2(mesh)) + 1
  File "/gpfs/fs1/home/ac.xylar/chrysalis/mambaforge/envs/compass_rrs6to18/lib/python3.10/site-packages/mpas_tools/mesh/creation/sort_mesh.py", line 131, in cell_del2
    for edge in range(np.max(topolOnCell)):
  File "<__array_function__ internals>", line 200, in amax
  File "/gpfs/fs1/home/ac.xylar/chrysalis/mambaforge/envs/compass_rrs6to18/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2820, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/gpfs/fs1/home/ac.xylar/chrysalis/mambaforge/envs/compass_rrs6to18/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

I haven't had time to debug at all but wanted to see if you have any guesses.

I'm rerunning and writing out the file before the crash, so hopefully I can reproduce the crash more easily and point you to the file so you can, too.

Update something unrelated to this PR is going wrong before calling sort_mesh() such that all arrays are empty. So please disregard.

dengwirda commented 1 year ago

Thanks for the review @xylar, I believe I got all of those changes in but let me know if you need anything further.

xylar commented 1 year ago

@dengwirda, it looks excellent! Let me just make another test build with both this and the fixes I have been making in #514. If my testing goes well, I'll approve and merge this.