Write a function which takes a space and gives you an object which lets you map both:
a. tempest remap index (`tidx`) to the set of (global node index (`gidx`), i, j) which correspond to that `tidx`
b. the (global node index, i, j) to tempest remap index

b. can just be stored in an `Nq*Nq*Nelem` array; a. will need some sort of ragged array structure (or you could initially use a Dict of arrays until you figure it out).
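A minimal sketch of what such an object could look like (the struct and field names below are hypothetical, not an existing ClimaCore type):

```julia
# Hypothetical two-way index map; names are illustrative only.
struct TempestIndexMap
    # (b): (i, j, element) -> tidx, stored densely as an Nq x Nq x Nelem array
    tidx_of_node::Array{Int, 3}
    # (a): tidx -> all (gidx, i, j) triples that correspond to it (ragged, so a
    # Dict of vectors is used until a better structure is chosen)
    nodes_of_tidx::Dict{Int, Vector{Tuple{Int, Int, Int}}}
end

function TempestIndexMap(tidx_of_node::Array{Int, 3})
    nodes_of_tidx = Dict{Int, Vector{Tuple{Int, Int, Int}}}()
    Nq, _, Nelem = size(tidx_of_node)
    for gidx in 1:Nelem, j in 1:Nq, i in 1:Nq
        tidx = tidx_of_node[i, j, gidx]
        push!(get!(nodes_of_tidx, tidx, Tuple{Int, Int, Int}[]), (gidx, i, j))
    end
    return TempestIndexMap(tidx_of_node, nodes_of_tidx)
end
```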
To do the full distributed remap:
Thanks for the revision, Julia! The plan looks great! It may not be a huge leap to include the data layout optimization to reduce the need for communication (i.e., set up the MPI distribution so that points from the same region of the target and source grids are looked after by the same PID), but maybe let's revisit this once we have the local to local map. 🚀
@sriharshakandala I've just changed the title of this to be consistent with the OKRs. Please feel free to modify the content, once you have a chance to take this over.
@Sbozzolo here is the issue that @juliasloan25 started, and @sriharshakandala agreed to take over in this Q. It is a bit out of date, so it might be more efficient to catch up offline. Hopefully we can share the regridding infrastructure when reading files and regridding model fields! It would be great to have your thoughts on this! 🙏
Scattered thoughts:
- One of the differences in Land is that weights might be different for different variables, because different datasets might be defined over different coordinates (but this is not a big deal, and we don't really have to support this, I think).
- Right now, Land is fully using ClimaCoreTempestRemap, from generating weights (`remap_weights`) to applying them (`apply_remap`).
- This is where I am working on reading input maps for Land (the PR doesn't have a description yet, but it is almost ready, with plenty of documentation in the modules). The assumptions are that we want to decouple file processing from remapping (IO can be very expensive, so maybe we will do it threaded/chunked; remapping will probably be on the GPU).
Do you have a sense of what the interface will look like?
> - Right now, Land is fully using ClimaCoreTempestRemap, from generating weights (`remap_weights`) to applying them (`apply_remap`).
>
> Do you have a sense of what the interface will look like?
FWIW, ClimaCoupler also currently uses CCTR to apply the weights (see the `apply_remap` call here).
I didn't make a plan for the interface - most of the work that was done so far was prototyping to try to get something working first.
Note that `hdwrite_regridfile_rll_to_cgll` (which uses TR's `apply_remap`) should only be used for lightweight input data, for example regridding stationary or infrequently updated (e.g. monthly) files, like we are doing in the coupler and used to do in Land. I agree that for ILAMB (or remapping model fields on the fly) this is no longer sufficient.
For remapping fields on the fly (not done in our current AMIP, but we will need this for coupling with ClimaOcean), the plan is:
- `generate_map` depends on TR (this only happens during initialization, so using TR here should not be a performance bottleneck). The reason why we want this is that it generates conservative and consistent weights. If we use linear interpolation, e.g. for fluxes, we will break energy / mass / momentum conservation. This is less of an issue when reading boundary conditions from files (because we can't keep track of the exact conservation in those cases anyway).
- `remap!` (i.e., the map multiplied by the field) is independent of TR, and last time we measured it, it was quite performant. However, it is not currently implemented for MPI or GPU (which is what this issue was addressing). Once we have this, we can use these online regridding functions and it should be fast and conservative.

For ILAMB there are two possible pathways; one is to use the `remap!` function (but this would require addressing the parallel matrix multiplication sooner, rather than later). My impression is that this is not far from being done, what are your thoughts @sriharshakandala? I think that we want to follow the same steps you outlined for remapping on the fly in Land eventually (the non-conservative remapping goes from spectral to rectangular, not the other way around at the moment).
I don't think we need this super soon. Getting to the point where we are reading everything from data while doing interesting global runs is not around the corner (but it is our target).
In this issue, it would be good to have a description of what we envision the capabilities of `remap!` are going to be. What I think I would like is something like `remap(weights, input_data) -> remapped_field`, with MPI/GPU compatibility, and where `input_data` is a rectangular array read from file (e.g., surface albedo). This allows ClimaLand to spawn a different thread to keep reading `input_data` and remap it when needed.
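A rough sketch of what such an interface could look like (hypothetical; this `remap` function is not an existing ClimaCore API, and a real version would need to handle ClimaCore spaces, MPI, and GPU arrays):

```julia
using SparseArrays

# Hypothetical interface sketch: `weights` is a precomputed (conservative)
# regridding matrix and `input_data` is a rectangular lat-lon array read from
# file. Returns the remapped values on the target (model) grid as a vector.
function remap(weights::SparseMatrixCSC, input_data::AbstractMatrix)
    return weights * vec(input_data)   # flatten the lat-lon data, apply weights
end
```

An in-place variant (e.g. `remap!(remapped_field, weights, input_data)`) would avoid allocations when called repeatedly during a simulation.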
Yeah, that's more or less what the plan was. :) And good to know the priorities and more use cases for this. @sriharshakandala I'll let you drive it from here.
Purpose
We want to implement distributed matrix multiplication to enable parallel online regridding in the coupler.
Cost/Benefits/Risks
People and Personnel
Components
There are two major steps involved in regridding:
We can divide the process up into even more granular steps:
a. Use `MPI.scatterv` to send info from the root process to all processes (note that processes may be responsible for varying numbers of weights, so we can't use `scatter`).
b. We can't exchange a sparse matrix object type, so instead exchange 3 arrays: nonzero values, row indices, and column offsets.
c. We may also need the root process to send the number of nonzero weights each process is responsible for.
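A minimal sketch of (a)-(c), assuming MPI.jl's `VBuffer`/`Scatterv!` interface; for simplicity it exchanges (row index, column index, value) triplets rather than compressed column offsets, and the partition of nonzeros across processes is hypothetical:

```julia
using MPI, SparseArrays

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nprocs = MPI.Comm_size(comm)
root = 0

if rank == root
    # Toy weight matrix standing in for the TempestRemap-generated weights.
    W = sparse([1, 2, 3, 3], [1, 2, 1, 3], [0.5, 1.0, 0.5, 1.0], 3, 3)
    rows, cols, vals = findnz(W)
    # Hypothetical partition: split the nonzeros as evenly as possible.
    counts = [div(length(vals), nprocs) + (p <= rem(length(vals), nprocs)) for p in 1:nprocs]
else
    counts = zeros(Int, nprocs)
end

# (c) every process learns how many nonzero weights it is responsible for.
MPI.Bcast!(counts, comm; root = root)
nlocal = counts[rank + 1]

# (a)/(b) scatter the sparse matrix as three separate arrays; counts can differ
# between processes, which is why Scatterv (not Scatter) is needed.
my_rows = zeros(Int, nlocal)
my_cols = zeros(Int, nlocal)
my_vals = zeros(Float64, nlocal)
MPI.Scatterv!(rank == root ? MPI.VBuffer(rows, counts) : nothing, my_rows, comm; root = root)
MPI.Scatterv!(rank == root ? MPI.VBuffer(cols, counts) : nothing, my_cols, comm; root = root)
MPI.Scatterv!(rank == root ? MPI.VBuffer(vals, counts) : nothing, my_vals, comm; root = root)
```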
Implementation phases
1. Initial implementation of distributed regridding - DONE
2. Second implementation - store only local information in each process's `LinearMap`
   - Instead of converting unique indices to local indices in `remap!`, do this in `generate_map` and store them in `LinearMap`.
   - This results in a `LinearMap` on each process containing only the information relevant for that process.
3. Optimized implementation using only the necessary source data
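A highly simplified, hypothetical stand-in for the phase-2 idea (the actual `LinearMap` in ClimaCoreTempestRemap holds more than this), just to show why storing already-converted local indices removes work from the hot loop:

```julia
# Hypothetical, simplified stand-in for the process-local map described in
# phase 2; this is not the actual ClimaCoreTempestRemap LinearMap type.
struct LocalLinearMap{FT}
    weights::Vector{FT}       # nonzero weights owned by this process
    source_idxs::Vector{Int}  # local source indices, converted once in generate_map
    target_idxs::Vector{Int}  # local target indices, converted once in generate_map
end

# Apply the local piece of the map. No global-to-local index conversion happens
# here, because the indices stored above are already local.
function local_remap!(target::Vector{FT}, lmap::LocalLinearMap{FT}, source::Vector{FT}) where {FT}
    fill!(target, zero(FT))
    for k in eachindex(lmap.weights)
        target[lmap.target_idxs[k]] += lmap.weights[k] * source[lmap.source_idxs[k]]
    end
    return target
end
```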
Inputs
Results and deliverables
Functions for distributed matrix multiplication in ClimaCoreTempestRemap, and tests for these functions.
Tests include:
Current status
As of ClimaCore PR #1107 (distributed regridding v1), we are able to regrid from serial spaces to distributed spaces. Note that this regridding only works when the source and target meshes are collocated.
ClimaCore PR #1192 cleans up this implementation a bit by storing local indices in the `LinearMap` object itself (which is constructed only once), rather than computing them in the `remap!` function (which gets called multiple times). Also see https://github.com/CliMA/ClimaCore.jl/issues/1195 for more information.

A concrete example of the distributed regridding is partially implemented in ClimaCore PR #1259. This has been tested on 2 processes when remapping from 2 to 3 elements, and appears correct when compared to serial regridding results. Future work could test this implementation with more than 2 processes, and with more elements than just 2 -> 3.
The next steps are to rework our implementation so that each process uses only its local information and communicated information to perform the remapping. This is different from the distributed regridding v1, which does most of the work on the root process and then broadcasts it. Some of the logic for the distributed approach using MPI can be found in the concrete example, such as performing the weight/source data multiplication on the source side, exchanging these products, and then recombining them on the receive side. This next implementation should allow us to perform regridding from a distributed source space to a distributed target space.
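A schematic (and hypothetical) version of that source-side multiply / receive-side recombine step; it uses an allreduce for brevity, whereas the actual implementation would send each partial product only to the process that owns the corresponding target entry:

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Toy local data: each process owns a few weights, stored as
# (target row, local source column, weight), plus a local slice of the source
# field. Both layouts here are hypothetical.
n_target = 3
rows, cols, weights = [1, 2], [1, 2], [0.5, 0.5]
src_local = Float64[rank + 1, rank + 2]

# Source-side multiplication: multiply local weights by local source values,
# accumulating partial sums for the target rows this process contributes to.
partial = zeros(n_target)
for (r, c, w) in zip(rows, cols, weights)
    partial[r] += w * src_local[c]
end

# Receive-side recombination, done here with an allreduce for brevity; a
# targeted exchange would send each partial product only to the owner of the
# corresponding target entry.
target = MPI.Allreduce(partial, +, comm)
```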
Task Breakdown And Tentative Due Date
[x] Minimum working example of the initial implementation (no information exchange) with tests for correctness [7 Apr] - ClimaCore PR #1107
[x] In `generate_map`, convert unique indices (`target_idxs`) to local indices using `target_global_elem_lidx`. Instead of using `target_global_elem_lidx` to convert global to local indices in `remap!`, do this in `generate_map` and store the local indices in `LinearMap`. [21 Apr] - ClimaCore issue #1191, PR #1192
[x] Write out a simple concrete example (e.g. remapping from 2 to 3 elements with 4 nodes each) using distributed spaces and MPI to exchange information. This will help us understand how to use MPI functions (i.e. `scatter`, `scatterv`) for our case and then generalize this approach. [18 Aug] - ClimaCore PR #1259
[ ] Adapt the concrete example to use a weight matrix generated by TempestRemap. This will likely primarily involve index conversions. [22 Sept]
[ ] Super-halo exchange: Create a mapping from sparse indices to the neighbor pid and the local index of the source data on that process (see the sketch after this task breakdown). Use this with our buffer struct to generate source data on the distributed space and send only the relevant data to each process, as opposed to sending all source data. [6 Oct]
[ ] On each process, store only the local components of `LinearMap` data (`source_idxs`, `target_idxs`, `weights`, `row_indices`). This will allow us to iterate over only the truncated (local) weights in `remap!`. To do this, we need to create send and receive buffers on each process and use MPI's `scatterv` function. [29 Sept]
[ ] Super-halo exchange - implement a buffer struct containing send and receive data to generate source data on the distributed space and send all source data to all processes. Previously we have been generating the source data on a serial space. [13 Oct]
[ ] Implement an example with a Buildkite driver [20 Oct]
[ ] Add thorough documentation - perhaps a tutorial or complete API docs [20 Oct]
Timeline delayed due to Julia OOO March 20-Apr 3
Timeline delayed due to break as of May 18
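As referenced in the super-halo exchange task above, a hypothetical sketch of the sparse-index-to-(pid, local index) bookkeeping could look like this (all names illustrative, not existing ClimaCore types):

```julia
# Hypothetical super-halo bookkeeping: map each global sparse source index
# needed by this process to the owning neighbor's pid and the index of that
# datum in the neighbor's local storage.
struct SuperHaloMap
    owner_and_lidx::Dict{Int, Tuple{Int, Int}}  # global source idx -> (pid, local idx)
end

# Group the needed global indices by owning neighbor, giving per-neighbor lists
# of local indices; this is what sizes the send/receive buffers so that only
# the relevant source data is exchanged.
function needed_by_neighbor(shmap::SuperHaloMap, needed_global_idxs)
    lists = Dict{Int, Vector{Int}}()            # pid -> local indices on that pid
    for gidx in needed_global_idxs
        pid, lidx = shmap.owner_and_lidx[gidx]
        push!(get!(lists, pid, Int[]), lidx)
    end
    return lists
end
```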
Proposed Delivery Date
20 Oct 2023
SDI Revision Log
We originally planned to implement `generate_map` using only distributed spaces, but we realized this would be quite complicated since we need to interface with TempestRemap and keep track of the global unique indices of data and weights. To get around this, we're constructing mappings between indices and will use those to index into the distributed data locally.

Updated the Components section and revised timelines to more accurately reflect our plan after returning to this project.