Closed twsearle closed 1 year ago
Apologies for abandoning this attempt to fix this issue! I have had another go in the last few weeks to address our small MPI issues. I have a much smaller PR that I think will help a bit, so I will close this one. If it goes well I will add another PR to handle the no-halo checkerboard/equal_regions case, and then a final PR for the halo size >1
issues. https://github.com/ecmwf/atlas-orca/pull/15
Description
The ORCA grids have halo cells included in the grid. This makes building halos in the usual way problematic, as the mesh already contains duplicate nodes.
To get around this, I am attempting to construct the halo during the mesh generation, and set "halo_locked" to prevent failures in BuildHalo.cc.
The mesh generator creates a "surrounding rectangle" around all indices that should be on a given partition. I have extended this to include the additional nodes out to the halo width. Later, I include nodes that are within halo width of the current partition as nodes within the mesh.
Testing
When I attempt to plot the nodes used in a halo by other partitions I seem to have a band around the perimeter that is not included. For example, partition 1 with 6 processors and a halo of 2 gives me the following image:
I would expect to have the edge nodes to have a halo of 1, and then the next nodes in to have a halo of 2, instead there is a band of halo 0 around the outside edge. I have compared these images with the halo variable that is included as part of the gmsh info file output, and I am not exactly sure if this is an issue! It is possible everything is working correctly, but if there is a better way to test please let me know.
I have also tested this by reading and writing ORCA data from file into and out of atlas running on 4 and 2 processors with the equal-regions partitioner. I am doing this by converting the
ij
indices back to the native orca indices (including the orca halo points because these are also included in the output files):This reveals an additional discrepancy at some of the grid fold/seam points near the north poles I think. For example for ORCA2 the relevent points are:
These points result in missing data that was not being transferred by the halo swap:
I can resolve these points using the changes prototyped in test_atlas_orca_mesh::build_periodic_boundaries, and turning off the generic buildParallelFields and buildPeriodicBoundaries by setting
mesh.nodes().metadata().set( "parallel", true ); mesh.metadata().set( "periodic", true );
:Remaining issues