MPAS-Dev / MPAS-Model

Repository for MPAS models and shared framework releases.

Improve check on deallocation of hash table in mpas_geotile_mgr_finalize #1189

Closed by mgduda 3 weeks ago

mgduda commented 3 weeks ago

This PR improves the check on the deallocation of the geotile manager hash table in mpas_geotile_mgr_finalize, resolving apparent issues with this deallocation under some conditions.

In some cases (typically with the Intel oneAPI compilers), parallel remapping of static fields in the init_atmosphere core will fail with the message

  ERROR: Problem deallocating the geotile hash table

for some MPI ranks. There is apparently a problem in deallocating the hash member of mpas_geotile_mgr_type instances in mpas_geotile_mgr_finalize.

This PR improves the checks on the deallocation of mgr % hash in the mpas_geotile_mgr_finalize routine, making them more stringent. With the modifications to the deallocation checks, the deallocation errors no longer occur, suggesting that they were entirely spurious.
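
For illustration only, here is a minimal, self-contained Fortran sketch of one way a deallocation check of this kind can be made more robust: attempt the deallocation only when the pointer is associated, use a dedicated local status variable for the deallocate statement, and nullify the pointer afterward. The type tile_mgr_sketch_type, its integer hash component, and the function mgr_finalize_sketch are hypothetical stand-ins; this is not the actual MPAS type definition or the diff in this PR.

```fortran
program dealloc_check_sketch

   implicit none

   ! Hypothetical stand-in for mpas_geotile_mgr_type; the real type differs.
   type :: tile_mgr_sketch_type
      integer, dimension(:,:), pointer :: hash => null()
   end type tile_mgr_sketch_type

   type (tile_mgr_sketch_type) :: mgr
   integer :: ierr

   allocate(mgr % hash(4,4))
   mgr % hash(:,:) = 0

   ierr = mgr_finalize_sketch(mgr)
   if (ierr /= 0) then
      write(0,*) 'ERROR: Problem deallocating the hash table'
      stop 1
   end if

   ! A second call is harmless because the pointer was nullified.
   ierr = mgr_finalize_sketch(mgr)

contains

   ! Hypothetical finalize routine: returns 0 on success, 1 on failure.
   function mgr_finalize_sketch(mgr) result(ierr)

      type (tile_mgr_sketch_type), intent(inout) :: mgr
      integer :: ierr
      integer :: dealloc_stat   ! status used only for the deallocate statement

      ierr = 0

      ! Only attempt the deallocation if there is something to deallocate.
      if (associated(mgr % hash)) then
         dealloc_stat = 0
         deallocate(mgr % hash, stat=dealloc_stat)
         if (dealloc_stat /= 0) then
            ierr = 1
            return
         end if
         nullify(mgr % hash)
      end if

   end function mgr_finalize_sketch

end program dealloc_check_sketch
```

Keeping the deallocation status in its own explicitly initialized local variable, separate from the routine's return code, means a reported failure can only come from the deallocate statement itself.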

mgduda commented 3 weeks ago

@weiwangncar I think you'll be able to reproduce the issue on Derecho by starting with the following commands:

  module reset
  module load intel/2024.0.2
  module load parallel-netcdf
  export INTEL_COMPILER_TYPE=ONEAPI

(changing the export to a setenv if you use csh or tcsh) before compiling the master branch with

  make intel CORE=init_atmosphere

Then, you can try to run the static interpolation stage with 16 MPI ranks using the files in /glade/derecho/scratch/duda/pr1189/.

weiwangncar commented 3 weeks ago

@mgduda It looks like this is a problem with Intel oneAPI. I didn't encounter it before because I usually just use the regular Intel/ifort compiler. With Intel oneAPI, the same error appears with the default intel/2023.2.1 as well as with intel/2024.0.2, and the fix works for both.

mgduda commented 3 weeks ago

@weiwangncar If the fix in this PR looks good to you, could you approve this PR?