jhamman opened this issue 8 years ago
@wietsefranssen - I'm finally catching up on where you are at with all of this.
I would do the decomposition based on the outlet grid cell. It probably means you will need to gather from other processes, but that is how it goes.
It may also be possible to use a round-robin decomposition, like what we use in VIC right now, and use an MPI_Reduce to sum the ring arrays.
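A minimal sketch of what I mean (untested; `ring_local`, `ring_global`, and `RING_LENGTH` are just placeholder names), assuming each rank accumulates its own cells' contributions into a local copy of the ring array:

```c
#include <mpi.h>

#define RING_LENGTH 100  /* hypothetical length of the convolution ring */

/* Round-robin idea: every rank convolves only its own grid cells into a
 * local partial ring array; a single MPI_Reduce then sums the partial
 * rings element-wise onto the root rank. */
void sum_ring_arrays(const double *ring_local, double *ring_global,
                     MPI_Comm comm)
{
    MPI_Reduce(ring_local, ring_global, RING_LENGTH, MPI_DOUBLE,
               MPI_SUM, 0, comm);
}
```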
A broader thought: unless you need the parallelization for memory, the RVIC convolution step may be a good candidate for a shared-memory OpenMP application (see #522 for some discussion of this).
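Something like the sketch below is what I have in mind (untested; the array names and layout are hypothetical). Because each outlet owns its own ring array, the outer loop parallelizes cleanly with no reduction or locking:

```c
#include <stddef.h>

/* Hypothetical layout: uh is the unit hydrograph flattened as
 * [outlet][source][timestep]; ring holds one ring array per outlet. */
void convolve(int n_outlets, int n_sources, int n_timesteps,
              const double *uh, const double *runoff, double **ring)
{
    #pragma omp parallel for schedule(dynamic)
    for (int o = 0; o < n_outlets; o++) {
        for (int s = 0; s < n_sources; s++) {
            const double *uh_os =
                uh + ((size_t)o * n_sources + s) * n_timesteps;
            /* Spread this step's runoff from source cell s over the
             * outlet's ring array, weighted by the unit hydrograph. */
            for (int t = 0; t < n_timesteps; t++)
                ring[o][t] += uh_os[t] * runoff[s];
        }
    }
}
```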
Not sure where this stands, but I would separate the domain decomposition for VIC from the decomposition for the routing. That allows more flexibility, and I don't think there are any obvious reasons that the two need to be linked. The routing model only needs access to a very small subset of the VIC variables, so we should not have to distribute all the VIC data structures.
I think this corresponds to option 2) in the original post.
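To make the point concrete: under this separation, the routing rank would only need the per-cell runoff (and baseflow), gathered once per step. A minimal sketch, assuming VIC's decomposition is described by the usual `recvcounts`/`displs` arrays (all names here are placeholders):

```c
#include <mpi.h>

/* Gather only the small runoff field onto the routing rank (0); the
 * full VIC data structures stay distributed on their home processes. */
void gather_runoff_for_routing(const double *runoff_local, int n_local,
                               double *runoff_global,
                               const int *recvcounts, const int *displs,
                               MPI_Comm comm)
{
    MPI_Gatherv(runoff_local, n_local, MPI_DOUBLE,
                runoff_global, recvcounts, displs, MPI_DOUBLE,
                0, comm);
}
```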
@bartnijssen - @wietsefranssen and I spoke a few weeks ago and decided to go with an even simpler approach for now. What we've been seeing in RASM is that RVIC can keep up on a single processor, so that's what we're going to try here. Until we're sure we need this optimization, this seems like the prudent way to proceed, rather than building in a bunch of infrastructure before we know if it is needed.
If we find that we need to parallelize the RVIC convolution for speed, an OpenMP shared-memory reduction may work really well. If it's a memory issue, we can fall back to MPI, and we know how to do that.
@isupit and I have built in the simpler approach @jhamman and I spoke about a few weeks ago. It is working and runs quite fast, but as soon as we go to larger grids (approx. 70,000 active grid cells) the run times get much higher. This is mainly because we sometimes have all grid cells assigned as outlet points. The current approach is rather inefficient when all cells are used as outlets, and @isupit and I are currently looking into an alternative approach.
Because of my leave, I am adding @isupit to the discussion.
Can we do this in stages? There is no reason not to accept a working implementation while we wait for something better and more general. My vote would be to include the single-processor implementation that @wietsefranssen and @isupit currently have and then work on one that can handle very large domains efficiently.
Can you please explain why the pull request has not been accepted yet, and let us know if any changes to the code are needed?
In the meantime, Iwan and I will produce an example dataset and documentation for the RVIC extension.
@jhamman: see the comment/question from Wietse above.
@wietsefranssen - I'll make comments on https://github.com/UW-Hydro/VIC/pull/231.
The core RVIC routing scheme has been implemented in VIC_image. The next step is to implement it with MPI.
There are two options to implement the routing with MPI:
Modified from personal communication with @bartnijssen, @wietsefranssen, and @isupit