geoschem / gchp_legacy

Repository for GEOS-Chem High Performance: software that enables running GEOS-Chem on a cubed-sphere grid with MPI parallelization.
http://wiki.geos-chem.org/GEOS-Chem_HP

[BUG/ISSUE] Run crashes in MAPL when running full chemistry simulation at c360 #59

Closed hongjianweng closed 4 years ago

hongjianweng commented 4 years ago

@lizziel @yantosca When I use 900 or more cores for a c360 simulation, I get the following error:

ERROR: cannot create ESMF regridder for var in file. This may be because source grid is too coarse for number of cores. Try reducing number of cores for the simulation. ./MainDataDir/IODINE/v2017-09/CH3I_monthly_emissions_Ordonez_2012_COARDS.nc

I also tried reducing the number of cores, for example to 720, but then the run gets cut off, which I think may be a memory issue. Is there a recommended number of cores for running c360? Alternatively, could finer-resolution source files be provided so that the input resolution does not limit the number of cores for the simulation?

hongjianweng commented 4 years ago

I am using version 12.6.0.

sdeastham commented 4 years ago

I think you're running into a known issue where - for the version of MAPL which is currently in GCHP - there's a minimum input resolution for simulations running with a certain number of cores. I think the technical reason is that ESMF doesn't want to regrid data where the domain covered by one core is less than the size of a single grid cell. @lizziel is better acquainted with this issue but I believe that the best solution we currently have is to manually regrid low-res input files to a higher resolution (this can be easily accomplished using e.g. GCPy). As you noted, going to a lower number of cores isn't necessarily a workable solution due to memory/resource/time constraints.

lizziel commented 4 years ago

See also https://github.com/geoschem/gchp/issues/33. I just added some additional information to clarify the source of the problem.

I would like to add the caveat that in very recent testing I ran into this issue and suspected that the file printed by the error message may not have been the file that needs regridding to higher resolution. Look at ExtData.rc at the line for the file that is printed to the log. Also look above and below and check the resolutions of all of them. If one is obviously low res relative to the others, that is the one to regrid. This issue will be fixed in GCHP 13.0.0, but if there is also an issue with the current error handling print we will fix it in a 12.7.z version. Please do report back on what you find if you pursue regridding to higher res as a fix.
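
One way to compare the resolutions of neighboring entries is to look at the spacing of each file's latitude and longitude coordinates. The sketch below is a minimal illustration, not part of GCHP or GCPy: `grid_spacing` is a hypothetical helper name, and the coordinate lists are made-up examples standing in for values you would read from each file's `lat` or `lon` variable with a netCDF library of your choice.

```python
from statistics import median


def grid_spacing(coords):
    """Return the median spacing between consecutive coordinate values.

    `coords` is a sequence of latitude or longitude centers in degrees,
    for example the values of a netCDF file's `lat` or `lon` variable.
    """
    if len(coords) < 2:
        raise ValueError("need at least two coordinate values")
    diffs = [abs(b - a) for a, b in zip(coords, coords[1:])]
    return median(diffs)


# Illustrative coordinates only (not read from any real input file).
coarse_lat = [-90.0 + 4.0 * i for i in range(46)]   # 4-degree spacing
fine_lat = [-89.5 + 1.0 * i for i in range(180)]    # 1-degree spacing

print(grid_spacing(coarse_lat))
print(grid_spacing(fine_lat))
```

A file whose spacing is much larger than that of the entries around it would be the first candidate for regridding.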

lizziel commented 4 years ago

@hongjianweng Do you have an update on this issue? Were you able to get c360 running on your system, and if yes, what did you do?

lizziel commented 4 years ago

I am testing GCHP with the newest version of MAPL available and am able to reproduce this regridding failure at c360. The problematic variable is definitely in CH3I_monthly_emissions_Ordonez_2012_COARDS.nc which is surprising since it is 1x1, higher resolution than other data arrays regridded before it. This appears to be a new problem and not the one previously discussed (the older issue does appear to be fixed in newer MAPL).

I am in touch with GMAO about figuring out a fix for this.

lizziel commented 4 years ago

The issue with the Ordonez files at high resolution is resolved with new files that will be default in GCHP 12.7.2. The original files had latitude range [-90,89] that likely caused them to be treated as regional files in MAPL rather than global files, although it is not entirely understood why that would be a problem at any resolution. The new files have latitude shifted by 0.5 degrees to have range [-89.5,89.5] and this appears to resolve the issue.
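
The geometry of the latitude shift can be sketched as follows. This is only an illustration under stated assumptions: `spans_globe` is a hypothetical helper, it assumes evenly spaced cell centers with edges half a spacing beyond the first and last center, and it is not the test MAPL itself uses to decide whether a file is regional or global.

```python
def spans_globe(lat_centers, tol=1e-9):
    """Check whether evenly spaced latitude centers cover -90 to 90 exactly.

    Cell edges are assumed to lie half a spacing beyond the first and
    last centers (an assumption of this sketch, not MAPL's own logic).
    """
    step = lat_centers[1] - lat_centers[0]
    south_edge = lat_centers[0] - step / 2
    north_edge = lat_centers[-1] + step / 2
    return abs(south_edge + 90.0) < tol and abs(north_edge - 90.0) < tol


original = [-90.0 + i for i in range(180)]   # centers span [-90, 89]
shifted = [lat + 0.5 for lat in original]    # centers span [-89.5, 89.5]

print(original[0], original[-1], spans_globe(original))
print(shifted[0], shifted[-1], spans_globe(shifted))
```

With the original centers the implied cell edges run from -90.5 to 89.5, which does not match the globe; after the 0.5 degree shift they run from -90 to 90.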

To apply the fix with older versions of GCHP you can either turn off the Ordonez inventory in HEMCO_Config.rc, or edit the ExtData.rc config file to use the new files, located at ExtData/HEMCO/IODINE/v2020-02. These files are not yet synced on ComputeCanada but will be by the time of the 12.7.2 release. They will be available on the Harvard ftp server today.
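
For the ExtData.rc route, the edit amounts to pointing the IODINE entries at the new directory. The sketch below is a hedged illustration only: `point_to_new_files` is a hypothetical helper, the sample string is the path from the error message rather than a complete ExtData.rc entry, and it assumes the file names are unchanged in the new directory, which this thread does not confirm.

```python
OLD_DIR = "IODINE/v2017-09"
NEW_DIR = "IODINE/v2020-02"


def point_to_new_files(rc_text):
    """Replace the old IODINE data directory with the new one in rc text.

    Assumes file names are identical in both directories (unverified).
    """
    return rc_text.replace(OLD_DIR, NEW_DIR)


# Path taken from the error message; real ExtData.rc lines carry more fields.
sample = "./MainDataDir/IODINE/v2017-09/CH3I_monthly_emissions_Ordonez_2012_COARDS.nc"
print(point_to_new_files(sample))
```

Keep a backup copy of ExtData.rc before applying a change like this to the whole file, and check the contents of the new directory first in case any file names differ.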