Closed worleyph closed 8 years ago
I have tested variants of this in low and high resolution cases of the ACME water cycle, and will now verify that this particular version works in these examples. Note that I "kind of" modified the code imported from PIO1 to match the coding style of MCT, but not really. Someone who knows what is important should go through and clean this up.
Currently the target logic is that the swapm routines will be enabled when the AlltoAll optional parameter is both present and set to .true. and when any of the swapm optional parameters (HandShake, ISend, MaxReq) are present. I am not a fan of this personally, but the alternatives were either to add another optional parameter (altAlltoAll? SwapM?) or to change AlltoAll to an integer with three legal values (0: point-to-point; 1: MPIAlltoallV; 2: swapm). The latter is probably my personal choice, but this affects all calls to rearrange, so someone else needs to make this decision.
Note that the above describes the target logic. Right now rearrange_ has swapm and specific swapm settings hard-coded, since there is no support for setting these in the calling routines.
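To make the target selection rule concrete, here is a small sketch of how the option resolution could work. This is illustrative Python, not the actual Fortran in rearrange_; the function name and return values are hypothetical, and the parameter names mirror the optional arguments discussed above (AlltoAll, HandShake, ISend, MaxReq).

```python
def resolve_rearrange_method(alltoall=None, handshake=None,
                             isend=None, maxreq=None):
    """Hypothetical resolution of the target logic: swapm is selected
    only when AlltoAll is present and true AND at least one swapm
    tuning parameter (HandShake, ISend, MaxReq) is also present."""
    use_alltoall = bool(alltoall)
    swapm_param_present = any(p is not None
                              for p in (handshake, isend, maxreq))
    if use_alltoall and swapm_param_present:
        return "swapm"
    if use_alltoall:
        return "mpi_alltoallv"
    return "point_to_point"
```

Under this rule, passing only AlltoAll=.true. still gives MPIAlltoallV; the swapm path requires at least one tuning parameter as well, which is the coupling the comment above objects to.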
For background, in at least one high res. case, swapm decreases CPL:RUN cost by a factor of 5 at high process counts. This is critical for near term production runs.
@rljacob , I'm running more experiments to see whether it makes sense to apply swapm globally or not. Unfortunately, my 60000 core benchmark job runs around once a day. There is an existing mct_usealltoall namelist parameter that I also want to see if I can use as part of this - would this be set in user_nl_cpl?
Yes mct_usealltoall is set in user_nl_cpl and the default is false.
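For reference, enabling it would look something like this in user_nl_cpl (a sketch; the flag takes a Fortran logical value):

```
! user_nl_cpl
mct_usealltoall = .true.
```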
This has passed the system test. Want me to wait before integrating?
This is what we need for ACME for the water cycle case, for the short term at least. You'll have to decide if it is suitable for a public release.
I thought you wanted to add some more code that uses the usealltoall flag. If not, then this is ok by me to merge to the MCT master.
No - I just wanted to check that mct_usealltoall worked, and to see if it did anything useful (since it is used only in certain places in MCT). In my experiments it made performance worse, so whatever the original motivation was, it does not hold in our experiments, at least not on Titan. Please go ahead and merge then.
For high resolution runs of ACME and CESM using large MPI process counts, the existing MPI algorithms in the rearrange_ routine can be inefficient. The swapm variant of the MPIAlltoallV operator was ported from PIO1 and modified to work in the MCT environment. The option to call this was then added to rearrange_. For the short term, rearrange_ only calls swapm (and not MPIAlltoallV or the existing MPI-1 point-to-point communication algorithm) and uses fixed swapm parameter options that are anticipated to be reasonable choices for most situations. Long term, the routines calling rearrange_ should be modified to allow the user to specify an MPI algorithm and communication protocol.
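The core idea behind a swapm-style exchange, as opposed to a single MPIAlltoallV call, is that each rank walks a rotated schedule of partners one step at a time (optionally with handshaking and a cap on outstanding requests), rather than posting all messages at once. The sketch below is plain Python with no MPI, the function names are invented for illustration, and it models only the partner ordering, not the flow-control details of the actual PIO1/MCT implementation.

```python
def swapm_schedule(rank, nprocs):
    """Partner order for `rank`: a rotation so that at step k every
    rank is paired with a distinct partner, spreading communication
    load across steps instead of concentrating it."""
    return [(rank + k) % nprocs for k in range(nprocs)]

def simulate_exchange(send_bufs):
    """Simulate the full exchange across all ranks, where
    send_bufs[i][j] is the data rank i sends to rank j.
    Returns recv_bufs with recv_bufs[j][i] == send_bufs[i][j],
    the same result an all-to-all would produce."""
    nprocs = len(send_bufs)
    recv_bufs = [[None] * nprocs for _ in range(nprocs)]
    for rank in range(nprocs):
        for partner in swapm_schedule(rank, nprocs):
            # In the real code this step would be a (possibly
            # handshaked) send/recv pair; here we just deliver it.
            recv_bufs[partner][rank] = send_bufs[rank][partner]
    return recv_bufs
```

The end result is identical to an all-to-all; the performance difference at scale comes from the ordering and from throttling how many messages are in flight, which this sketch deliberately omits.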
Update: Based on performance experiments on Titan, using the swapm option with handshaking enabled for all calls to rearrange_ is not as efficient as using this option only in calls to rearrange_ from sMatAvMultSMPlus. The logic was changed to implement this restriction. While swapm may be useful in other locations as well, those locations are still to be determined.
Fixes #30
[BFB]