MPAS-Dev / MPAS

Repository for private MPAS development prior to the MPAS v6.0 release.
Other
4 stars 0 forks source link

Framework support for halo exchange data structure reuse #1453

Closed Larofeticus closed 6 years ago

Larofeticus commented 6 years ago

Add 3 subroutines to support reusable halo exchange group data structures:

mpas_dmpar_exch_group_build_reusable_buffers:

mpas_dmpar_exch_group_reuse_halo_exch:

mpas_dmpar_exch_group_destroy_reusable_buffers:

No impact on pre-existing functionality. By itself this also has no impact on performance.

Larofeticus commented 6 years ago

I have a fork branch with the ocean/develop changes to use this feature ready to go.

By manually splicing this PR and the core changes together I was able to run and get time integration performance numbers.

Before: avg 13.782122s -- (13.75133s, 13.75635s, 13.82115s, 13.71621s, 13.86557s) After: avg 12.45950s -- (12.33658s, 12.57770s, 12.41332s, 12.48657s, 12.45473s)

Gain: 9.6%

Larofeticus commented 6 years ago

I keep the ocean core change here: https://github.com/Larofeticus/MPAS/tree/exchange_reuse_ocean_core

Larofeticus commented 6 years ago

If I change the configuration from 33x2 to 33x4 then it gets a few more % faster. avg 12.06371s -- (12.36817s, 12.05414s, 11.95627s, 11.94428s, 11.99571s)

Gain 12.5%

mark-petersen commented 6 years ago

I tested this PR plus ocean changes that test them, i.e. I merged in https://github.com/Larofeticus/MPAS/tree/exchange_reuse_ocean_core. That is here: https://github.com/mark-petersen/MPAS/tree/merge_bill_exchange_reuse_ocean_core

That passed the MPAS-Ocean nightly regression suite with intel and gnu, including bit-for-bit with previous, bit-for-bit restart, partition, and threading tests.

mark-petersen commented 6 years ago

@ndkeen I added this PR to the framework on all three cores, as well as the ocean changes to use them, here: https://github.com/ACME-Climate/ACME/tree/mark-petersen/mpas/reuse_halo_exchange

This compiles and completes:

./create_test SMS_Ln9.T62_oQU240.GMPAS-IAF.cori-knl_gnu
./create_test SMS_Ln9.T62_oQU240.GMPAS-IAF.cori-knl_intel

Please test for performance when convenient. Note the timer to watch for comparison is the ocean timer: se btr vel for split explicit barotropic velocity. The sub-timers may also be helpful, like btr se subcycle loop and se halo subcycle. If those portions of the code show a speed-up, it would be worth making the changes elsewhere.

ndkeen commented 6 years ago

Mark: Are you saying there is a different branch I should try? Can you paste the branch? mark-petersen/mpas/reuse_halo_exchange ?

mark-petersen commented 6 years ago

@ndkeen , it’s a bit confusing, so here is a summary. You’ve been testing 1. Now there is also 2. Bill, does 2 include the changes in 1?

IMPROVEMENT 1: Reduce thread barriers in mpas_dmpar halo packing routines Mpas PR: https://github.com/MPAS-Dev/MPAS/pull/1449 ACME branch: https://github.com/ACME-Climate/ACME/tree/mark-petersen/mpas/reduce_thread_barriers

IMPROVEMENT 2: reuse halo exchange data structure MPAS framework PR (this one) https://github.com/MPAS-Dev/MPAS/pull/1453 then we will need an MPAS-Ocean PR to use those. These are both on an ACME branch here: https://github.com/ACME-Climate/ACME/tree/mark-petersen/mpas/reuse_halo_exchange

Yes, 2 is a new one to test in ACME.

ndkeen commented 6 years ago

OK, I will refer to them as the "reduce" and "reuse" branch. It's not intuitive how to get the actual branch name from the links above. mark-petersen/mpas/reuse_halo_exchange is the "reuse" branch. I'm trying it now.

Note, looking at other runs from last night, I am still not seeing any overall improvement using the "reduce" branch. As soon as I have time, will look at the raw timer files (or can point you to them?).

Larofeticus commented 6 years ago

This PR also includes the packing/thread changes from my earlier PR

ndkeen commented 6 years ago

fyi, posting some results here.
https://acme-climate.atlassian.net/wiki/spaces/PERF/pages/316309799/Performance+results+after+changes+to+MPAS+framework+for+hires+G+cases+on+cori-knl

mark-petersen commented 6 years ago

@Larofeticus: The atmosphere folks (@mgduda, @climbfuji) would like to test this. Could you comment some instructions here on how to take a single halo exchange call and convert it to the reusable buffer version? I.e., change in subroutine name, arguments, and additional init/finalize routine.

If you could push your latest ocean and sea ice branches that include the framework commits in this PR plus the example changes for that core, and point us to those branches, that would also be a helpful template to follow.

climbfuji commented 6 years ago

I won't have time until at least the weekend to test this, I think - if @mgduda can do that instead for the atmosphere core, that would be great ... apologies!

mark-petersen commented 6 years ago

@climbfuji no problem on the timing. I added your name as a notification only. Thanks for letting us know.

Larofeticus commented 6 years ago

@mark-petersen refer to PR 1458 I just put up for the modifications to ocean that use this code. Unfortunately it's mixed with some other cleanup I did, so also see the diff on the CICE prototype I sent to noel: `diff components/mpas-cice/model/src/core_cice/shared/mpas_cice_velocity_solver.F cori05% diff components/mpas-cice/model/src/core_cice/shared/mpas_cice_velocity_solver.F mpas_cice_velocity_solver_BILL.F 37a38,39

. > character (len=*), parameter :: subcycleGroupName = 'subcycleFields' . > . 1950a1953,1957 . > call mpas_dmpar_exch_group_create(domain, subcycleGroupName) . > call mpas_dmpar_exch_group_add_field(domain, subcycleGroupName, 'uVelocity') . > call mpas_dmpar_exch_group_add_field(domain, subcycleGroupName, 'vVelocity') . > call mpas_dmpar_exch_group_build_reusable_buffers(domain, subcycleGroupName) . > . 1959a1967,1968 . > call mpas_dmpar_exch_group_destroy_reusable_buffers(domain, subcycleGroupName) . > . 2044,2045c2053,2055 . < call MPAS_dmpar_field_halo_exch(domain, 'uVelocity') . < call MPAS_dmpar_field_halo_exch(domain, 'vVelocity') . --- . > !call MPAS_dmpar_field_halo_exch(domain, 'uVelocity') . > !call MPAS_dmpar_field_halo_exch(domain, 'vVelocity') . > call mpas_dmpar_exch_group_reuse_halo_exch(domain, subcycleGroupName)`

Larofeticus commented 6 years ago

Ok, I don't know why that didn't work. The line break change to the pack barrier reduction is why it said there is a conflict. I used the resolve conflicts button, changed it, pushed a green button and it acted like it was fixed. That commit looks like it should fix it, but github says the conflicts are still present?

mgduda commented 6 years ago

@Larofeticus or @mark-petersen Would either of you be willing to clean up the commit history of this PR and to rebase it on the latest develop branch? Also, I think it would be worth including a "hello, world" example of using the reusable buffer routines in the merge commit message (i.e., the first comment in this PR), and to perhaps remove the remarks about needing to place these changes in ocean/develop to support future changes. In my opinion, only information that is of general interest should be included in the actual commit (and merge commit) messages.

mark-petersen commented 6 years ago

@Larofeticus I rebased your changes onto the develop branch and squashed to a single commit. This is identical to your previous code, except for a few white characters clean-up.

mgduda commented 6 years ago

@Larofeticus Thanks for the commit to remove the unused mpi_ierr variable -- I'll squash this into 85f2541 to keep the commit history tidy.

Regarding the changes in 80974b1 , I'd propose to move these to a separate PR, pending input from @mark-petersen who last modified the affected code in PR #1464 , since those changes are not related to the addition of the three new routines in this PR. Not having actually tested the changes in 80974b1 , though, I'm a little suspicious: it looks like commListPtr may not actually be associated in the lines like commListSize = commListPtr % commListSize after your commit.

Larofeticus commented 6 years ago

Yeah I should have done it somewhere else earlier.

As for the correctness, there are no entries to the set of interface calls that don't go through the pack_buffer_field. It strikes me as silly that anywhere in any code is permitted or able to call a halo exchange with unassociated comm lists; i'm suspicious that the real issue is a bug in the ocean init stream that caused the segfault in the first place.

mark-petersen commented 6 years ago

@Larofeticus I agree that a halo exchange should not be called with an unassociated comm list. But that behavior was there before #1459, and we have to deal with one issue at a time. I just made a new issue #1477 to record it. Let's remove 80974b1 from this PR to keep it clean.