UW-Hydro / VIC

The Variable Infiltration Capacity (VIC) Macroscale Hydrologic Model
http://vic.readthedocs.io
MIT License

ENH: RVIC routing and MPI #549

Open jhamman opened 8 years ago

jhamman commented 8 years ago

The core RVIC routing scheme has been implemented in VIC_image. The next step is to implement it with MPI.

There are two options to implement the routing with MPI:

  1. The execution of VIC is organized per basin, meaning basins are distributed among the cores.
    • Consequences:
      • It is assumed that there is no inter-basin information exchange.
      • Currently this is not a problem, because normal routing only takes place within a basin. It is also not a big issue in the near future: other schemes to be implemented, such as the irrigation/dam scheme of Ingjerd Haddeland, also assume no inter-basin exchange.
      • However, we need to take into account that inter-basin exchange experiments (irrigation from one basin into another) will be much more difficult to implement in the future.
      • The model can only run with one core per basin. This may cause a long execution time on large basins and/or on runs with a fine resolution (many grid cells per basin).
      • It can be quite complicated to decide how to distribute the basins over the cores. A suboptimal distribution can cause a large drop in overall execution speed (a minimal load-balancing sketch follows at the end of this post).
  2. The execution of VIC is independent of the mapping of grid cells over the cores, meaning cells are "randomly" distributed over the cores, independent of which basin they belong to.
    • Consequences:
      • Information exchange between grid cells needs to take place at every time step. For cores on different nodes (as is the case in our situation; we use an HPC) this cannot be done via shared memory, so special provisions are needed to exchange information between cores on different nodes. This per-time-step exchange of information between grid cells may thus take extra time (depending also on the hardware).
      • On the other hand, there is no restriction on the number of cores per basin, and there is more flexibility in case we want inter-basin information exchange (in certain experiments).

Modified from personal communication with @bartnijssen, @wietsefranssen, and @isupit
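
For the distribution question in option 1, a minimal load-balancing sketch could look like the following. This is not VIC code; the `basin_ncells`, `nbasins`, and `basin_rank` names are hypothetical and only illustrate the idea of balancing cell counts across ranks.

```c
#include <stdlib.h>

/* Greedy assignment of basins to MPI ranks, balancing the number of
 * grid cells per rank. Sorting basins largest-first before this loop
 * would improve the balance further. */
void assign_basins_greedy(const size_t *basin_ncells, size_t nbasins,
                          int nranks, int *basin_rank)
{
    size_t *load = calloc((size_t) nranks, sizeof(size_t));

    for (size_t b = 0; b < nbasins; b++) {
        /* pick the rank with the smallest cell count so far */
        int best = 0;
        for (int r = 1; r < nranks; r++) {
            if (load[r] < load[best]) {
                best = r;
            }
        }
        basin_rank[b] = best;
        load[best] += basin_ncells[b];
    }
    free(load);
}
```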

jhamman commented 8 years ago

@wietsefranssen - I'm finally catching up on where you are at with all of this.

I would do the decomposition based on the outlet grid cell. It probably means you will need to gather from other processes but that is how it goes.

It may also be possible to use a round-robin decomposition, like what we use in VIC right now, and use an MPI_Reduce to sum the ring arrays.
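
As a rough illustration of that second idea (a sketch only; the array names, the ring layout, and the I/O rank are assumptions, not the actual VIC implementation):

```c
#include <mpi.h>

#define IO_RANK 0

/* Each rank holds a partial ring (convolution) array covering only the
 * source cells it owns; an elementwise MPI_Reduce sums the partial
 * arrays onto the I/O rank, which then writes the streamflow output. */
void reduce_ring(const double *ring_local, double *ring_global,
                 int n_outlets, int ring_length)
{
    MPI_Reduce(ring_local, ring_global, n_outlets * ring_length,
               MPI_DOUBLE, MPI_SUM, IO_RANK, MPI_COMM_WORLD);
}
```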

A broader thought: unless you need the parallelization for memory, the RVIC convolution step may be a good candidate for a shared-memory OpenMP approach (see #522 for some discussion of this).
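
For reference, a minimal sketch of an OpenMP parallel-for over outlets for the convolution step; the array names and the layout of the unit-hydrograph and ring arrays are assumptions for illustration, not RVIC's actual data structures:

```c
#include <omp.h>

/* Each outlet's slice of the ring array is independent, so the outer
 * loop over outlets can be parallelized without a reduction. */
void convolve_step(const double *uh, const double *runoff, double *ring,
                   int n_outlets, int n_sources, int n_lags)
{
#pragma omp parallel for schedule(static)
    for (int o = 0; o < n_outlets; o++) {
        for (int s = 0; s < n_sources; s++) {
            for (int l = 0; l < n_lags; l++) {
                /* accumulate lagged runoff into this outlet's ring */
                ring[o * n_lags + l] +=
                    uh[(o * n_sources + s) * n_lags + l] * runoff[s];
            }
        }
    }
}
```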

bartnijssen commented 8 years ago

Not sure where this stands, but I would separate the domain decomposition for VIC from the decomposition for the routing. That allows more flexibility, and I don't think there are any obvious reasons that the two need to be linked. The routing model only needs access to a very small subset of the VIC variables, so we should not have to distribute all the VIC data structures.
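
As a sketch of what that subset might look like (hypothetical struct and field names, not an existing VIC data structure), the per-cell fields the routing needs each time step are roughly:

```c
#include <stddef.h>

/* Minimal per-cell record that would have to be passed from the VIC
 * decomposition to the routing decomposition at every time step. */
typedef struct {
    size_t cell_id;   /* global grid-cell index */
    double runoff;    /* surface runoff for this time step [mm] */
    double baseflow;  /* baseflow for this time step [mm] */
} rout_forcing_t;
```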

I think this corresponds to option 2 in the original post.

jhamman commented 8 years ago

@bartnijssen - @wietsefranssen and I spoke a few weeks ago and decided to go with an even simpler approach for now. What we've been seeing in RASM is that RVIC can keep up on a single processor, so that's what we're going to try here. Until we're sure we need this optimization, this seems like the prudent way to proceed, rather than building in a bunch of infrastructure before we know if it is needed.

If we find that we need to parallelize the RVIC convolution for speed, an OpenMP shared-memory reduction may work really well. If it's a memory issue, MPI can be done and we know how to do it.

wietsefranssen commented 8 years ago

@isupit and I have built in the simpler approach @jhamman and I spoke about a few weeks ago. It works and is quite fast, but as soon as we go to larger grids (approx. 70,000 active grid cells) the run times get much higher. This is mainly because we sometimes have all grid cells assigned as outlet points. The current approach is rather inefficient when all cells are used as outlets, and @isupit and I are currently looking into an alternative approach.

Because of my leave, I am adding @isupit to the discussion.

bartnijssen commented 8 years ago

Can we do this in stages? There is no reason not to accept a working implementation while we wait for something better and more general. My vote would be to include the single-processor implementation that @wietsefranssen and @isupit currently have and then work on one that can handle very large domains efficiently.

wietsefranssen commented 7 years ago

Can you please explain why the pull request has not been accepted yet, and please let us know if any changes to the code are needed.

In the meantime, Iwan and I will produce an example dataset and documentation for the RVIC extension.

bartnijssen commented 7 years ago

@jhamman : See comment / question from Wietse above

jhamman commented 7 years ago

@wietsefranssen - I'll make comments on https://github.com/UW-Hydro/VIC/pull/231.