MPAS-Dev / MPAS-Model

Repository for MPAS models and shared framework releases.
231 stars 307 forks source link

Fix issue in destroying halo groups when there are multiple domain instances #1185

Closed mgduda closed 3 weeks ago

mgduda commented 4 weeks ago

This PR fixes a potential issue in destroying halo exchange groups when there are multiple domain instances.

When there exist multiple domain instances, and atm_destroy_halo_groups and mpas_deallocate_domain are called in a particular order for the domains, the config_halo_exch_method module variable in the mpas_atm_halos module may be invalid, leading to an error when destroying halo groups. This results in as the error message

  ERROR: Failed to destroy halo exchange groups.

in the MPAS-Atmosphere log file.

Consider a situation in which we have two domain instances, domain_0 and domain_1. If the atm_build_halo_groups routine and the atm_destroy_halo_groups plus mpas_deallocate_domain routines are called in "last-in, first-out" order, i.e.,

  call atm_build_halo_groups(domain_0, ierr)

  call atm_build_halo_groups(domain_1, ierr)

  ...

  call atm_destroy_halo_groups(domain_1, ierr)
  call mpas_deallocate_domain(domain_1)

  call atm_destroy_halo_groups(domain_0, ierr)
  call mpas_deallocate_domain(domain_0)

the module variable config_halo_exch_method would be first set to point to memory owned by domain_0 and then re-set to point to memory owned by domain_1 in the calls to atm_build_halo_groups. Upon calling atm_destroy_halo_groups for domain_1 followed by a call to mpas_deallocate_domain, the config_halo_exch_method pointer would become invalid, leading to an error when calling mpas_destroy_halo_groups for domain_0.

Since the domain instance is passed as an argument to atm_destroy_halo_groups routine, rather than using a module variable set when halo exchange groups were built, we can simply retrieve config_halo_exch_method from the domain argument, thereby avoiding the original problem.

amstokely commented 3 weeks ago

I rebuilt MPAS bundle and ran the MPAS-JEDI CTests using these changes. They fixed the problem we were having with config_halo_exch_method being deallocated too early.

mgduda commented 3 weeks ago

@amstokely Thanks for testing this PR with MPAS-JEDI. We'll call that good enough for an approval!

mgduda commented 3 weeks ago

I just pushed a fix to the commit message: "first-in, first-out" should be "last-in, first-out".