GEOS-ESM / GEOSctm

Fixture for chemical transport scenarios
Apache License 2.0

Running GEOS CTM #15

Open JulesKouatchou opened 5 years ago

JulesKouatchou commented 5 years ago

I cloned GEOS CTM and was able to compile it. The ctm_setup script did not create the experiment directory properly because it still referred to the old configuration (Linux/ instead of install/), so I fixed the ctm_setup file. The code now crashes during initialization because it cannot create the grid. It fails on line 9193 of MAPL_Generic.F90:

call ESMF_ConfigGetAttribute(state%cf,gridname,label=trim(comp_name)//CF_COMPONENT_SEPARATOR//'GRIDNAME:',rc=status)
VERIFY_(status)

I can see why this fails: the label should be just 'GRIDNAME:', without the component-name prefix.
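To make the label mismatch concrete, here is a minimal standalone sketch (no ESMF or MAPL required). It assumes the separator is '.', uses a hypothetical component name, and assumes the resource file defines only an unprefixed 'GRIDNAME:' entry. None of these values are confirmed by the report. The in-memory label list stands in for the resource file and is not how ESMF_ConfigGetAttribute is implemented.

```fortran
program gridname_label_sketch
  implicit none

  ! Assumption: separator value chosen for illustration only.
  character(len=*), parameter :: CF_COMPONENT_SEPARATOR = '.'
  ! Hypothetical component name, blank-padded to show the effect of trim().
  character(len=*), parameter :: comp_name = 'CTM     '
  ! Assumption: stand-in for the labels present in the resource file.
  character(len=*), parameter :: rc_labels(2) = &
       [character(len=16) :: 'GRIDNAME:', 'NX:']

  character(len=:), allocatable :: prefixed_label, plain_label

  ! Label as built by the call quoted in the report.
  prefixed_label = trim(comp_name)//CF_COMPONENT_SEPARATOR//'GRIDNAME:'
  ! Label as the report says it should be.
  plain_label = 'GRIDNAME:'

  print '(3a,l1)', 'label "', prefixed_label, '" found: ', has_label(prefixed_label)
  print '(3a,l1)', 'label "', plain_label, '" found: ', has_label(plain_label)

contains

  ! True if the label matches an entry in the stand-in resource list.
  ! Fortran pads the shorter string with blanks when comparing.
  logical function has_label(label)
    character(len=*), intent(in) :: label
    has_label = any(rc_labels == label)
  end function has_label

end program gridname_label_sketch
```

Under these assumptions the prefixed label is not found while the plain one is, which is consistent with the lookup returning a non-zero status and VERIFY_ aborting.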

I checked a couple of CVS tags I have and could not locate any MAPL version similar to the one in the git repository. I am wondering if MAPL has to be updated before GEOS CTM can run.

JulesKouatchou commented 5 years ago

I want to add that when I run the standalone DynAdvCores, I do not see the issue. My guess is that the fillz subroutine is not called.

kgerheiser commented 5 years ago

Runs fine in non-debug mode with Gfortran + OpenMPI. Maybe it's a compiler bug.

So, we have:

ifort + Intel MPI (debug): works (with a memory leak somewhere)

ifort + Intel MPI (non-debug): divide by zero error in tp_core

ifort + MPT: some sort of memory access issue in fillz

Gfortran + OpenMPI: Works

kgerheiser commented 4 years ago

I've found that removing the entries associated with hord_* from fvcore_layout.rc moves the crash to tp_core, though I suspect this isn't related to the actual bug.

I have been using TotalView and its memory debugger to catch the problem, but it yields nothing useful. I can see that memory is corrupted (at least according to TotalView), but if you add a print in the code the values are fine.

And because the problem only appears when optimization is enabled, the debugger jumps around when you step through the code, so it's hard to see what's happening.

kgerheiser commented 4 years ago

I have found that both crashes occur at the AVX instruction vdivpd, and that turning off vectorization by compiling with the -no-vec flag allows it to run.

mathomp4 commented 4 years ago

@kgerheiser I'm slowly getting caught up now on missed stuff. Is it only one file that needs -no-vec? I'd prefer not to do all of FV without it since, well, vectorization gave us some speed up.
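For reference, one narrower option than building a whole file with -no-vec is Intel's per-loop `!DIR$ NOVECTOR` directive, which asks ifort not to vectorize the DO loop that immediately follows it. The sketch below is a hypothetical standalone loop, not code from tp_core or fillz. Which loops in FV would actually need the directive is not established in this thread. Other compilers treat the directive line as an ordinary comment.

```fortran
program novector_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none

  integer, parameter :: n = 8
  real(real64) :: num(n), den(n), q(n)
  integer :: i

  ! Hypothetical data; the denominators are all non-zero here.
  num = 1.0_real64
  den = [(real(i, real64), i = 1, n)]

  ! Intel-specific: do not vectorize the next DO loop, so the division
  ! is not compiled to a packed instruction such as vdivpd.
  !DIR$ NOVECTOR
  do i = 1, n
     q(i) = num(i) / den(i)
  end do

  print '(8f8.4)', q

end program novector_sketch
```

This keeps vectorization enabled everywhere else, at the cost of having to identify the affected loops first.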