geoschem / GCHP

The "superproject" wrapper repository for GCHP, the high-performance instance of the GEOS-Chem chemical-transport model.
https://gchp.readthedocs.io
Other
23 stars 25 forks source link

Starting stretch-grid simulation still fails with 'Factories not equal' error #318

Closed Twize closed 1 year ago

Twize commented 1 year ago

Name and Institution (Required)

Name: Tyler Wizenberg Institution: University of Toronto

Confirm you have reviewed the following documentation

Description of your issue or question

Hi! My colleagues and I are trying to configure a C48 2x stretch-grid simulation over Mexico City (so target res of C96), but we are unable to start the simulation without encountering the following error:

pe=00047 FAIL at line=06123 MAPL_Generic.F90 <Factories not equal>

I have seen #287 and I have applied the new re-gridding approach from GCPy 13.3 mentioned here https://github.com/geoschem/gcpy/pull/190 and following the guide here: https://gcpy.readthedocs.io/en/stable/Regridding.html. In this case, I have re-gridded a C48 restart file from my other GCHP simulation, however, no matter what I seem to do, I will still encounter the 'Factories not equal' error immediately at run-time. I have noticed that the target longitude in GCPy is East-West longitude (-180°,+180°), while GCHP requires a longitude in terms of [0°,360°), so I tried using a target longitude for the regridding of both -99°E, and also 261°, but both still yield the same error. The re-gridded restart appears to have the correct attributes:

// global attributes: :STRETCH_FACTOR = 2.f ; :TARGET_LAT = 19.2f ; :TARGET_LON = -99.f ;

To further rule out the choice of latitude and longitude as the source of the issue, we also tried using the exact same coordinates for Bermuda as was used in the tutorial, but it still leads to the error. Additionally, I have also applied the most recent fix (https://github.com/geoschem/MAPL/pull/28) that is going into GCHP v14.2 to my GCHP v14.1.1 core codes and re-compiled successfully, but it did not solve the issue.

I'm not sure what else to try, any help would be appreciated!

Software versions: GCHP v14.1.1 GCPy (for regridding): v13.3 Gridspec: v0.1.0 Sparselt: v0.1.3 Compiler: GCC v10.3.0 OpenMPI: v4.1.1 NetCDF Fortran: v4.6.0 Cmake: v3.17.3 ESMF: v8.3.1

Relevant files: Run settings: setCommonRunSettings.sh.zip GCHP.rc: GCHP.rc.zip Slurm output log: slurm_output_9527534.txt GCHP.log: gchp.log

The restart file is a bit too large to attach (5GB), but I can put it in a drop box if needed.

lizziel commented 1 year ago

Hmm, I'll take a look at this today.

lizziel commented 1 year ago

Hi @Twize, I am able to reproduce this issue with the full chemistry simulation. Interestingly the simulation is fine with the transport tracer simulation, which is what I used for my testing. I'll work on a MAPL fix for this.

Twize commented 1 year ago

Hi @lizziel, thanks for investigating this!

lizziel commented 1 year ago

Actually, I take my last comment back. I had forgotten to update cap_restart and so was starting on hour 1 rather than hour 0, and therefore not using the restart file I just generated using GCPy. It was using an older restart that had been generated using the pre-fix code a couple months ago.

Switching to use the restart file I just generated with GCPy fixes the issue in the benchmark simulation. This makes me wonder about your code. Do you have it on GitHub for me to look at?

lizziel commented 1 year ago

Also please provide an ncdump of just the header of your restart file. ncdump restart_file -h > restart.ncdump.txt

Twize commented 1 year ago

Switching to use the restart file I just generated with GCPy fixes the issue in the benchmark simulation.

Dang, ok... I guess it must be on my end then.

This makes me wonder about your code. Do you have it on GitHub for me to look at?

Which codes exactly? I don't have any GCHP-related codes on Github at the moment. The ones I use for re-gridding with GCPy? For that I was just using the individual functions from gridspec and GCPy following the tutorial: gridspec-create gcs 48 + gridspec-create sgcs 96 -s 2.0 -t 19.2 -99.0 then: ESMF_RegridWeightGen --source c48_gridspec.nc --destination c96_s2d00_txgrvpzrbpgrv_gridspec.nc --method conserve --weight c48_to_c96_stretched_weights.nc and finally: python -m gcpy.regrid_restart_file --stretched-grid --stretch-factor 2.0 --target-latitude 19.2 --target-longitude -99.0 GEOSChem.Restart.20130101_0000z.c48.nc4 c48_to_c96_stretched_weights.nc4 GEOSChem.Restart.20130101_0000z.c48.nc4

Attached is the ncdump of the restart file header: restart.ncdump.txt

I can provide the files from grid-spec and ESMF_RegridWeightGen as well if you'd like.

Twize commented 1 year ago

I should mention, one thing that did confuse me about the GCPy re-gridding tutorial was the fact that they have a base resolution of C48, choose a stretch-factor of 4.0, but then have a target resolution as C120 when they do the re-gridding. Would the target resolution not be approximately 4xC48 ~ C192?

lizziel commented 1 year ago

I think the problem you are having is that you do not have the correct base resolution in your regridding. Your grid resolution in setCommonRunSettings.sh is c48. Therefore you should use gridspec-create sgcs 48 -s 2.0 -t 19.2 -99.0 and make a stretched weights file called c48_to_c48_stretched_weights.nc for use in the call to gcpy.regrid_restart_file. As is, you are creating a restart file for c96. The resolution over your target lat/lon would be c192.

I think the docs are incorrect. I recently updated them to be more user-friendly, including changing the example, but must not have gotten all the places I needed to change. I'll fix that in the next version.

Twize commented 1 year ago

I think the problem you are having is that you do not have the correct base resolution in your regridding. Your grid resolution in setCommonRunSettings.sh is c48. Therefore you should use gridspec-create sgcs 48 -s 2.0 -t 19.2 -99.0 and make a stretched weights file called c48_to_c48_stretched_weights.nc for use in the call to gcpy.regrid_restart_file. As is, you are creating a restart file for c96. The resolution over your target lat/lon would be c192.

Ah this makes much more sense! Yeah, in that case, the GCPy re-gridding example was a bit misleading.. I will give this a try and report back.

Twize commented 1 year ago

Hi @lizziel, okay so I first tried the approach exactly as you outlined using a longitude of -99.0°, and still encountered the Factories not equal error. However, I went back and then re-did the regridding again a second time using a longitude in [0°,360°) coordinates, so TARGET_LON = 261°, and it worked!

So I think it was a two-step issue. The first time around, I was accidentally re-gridding to an incorrect base resolution of C96, and then using an incorrect longitude value in [-180°,180] notation instead of [0°,360°). I assume that GCPy used to use longitude in [-180°,180] notation but it's now been changed? I did find it a bit odd that the tutorial was using [-180°,180°] longitude notation while the setCommonRunSettings.rc file requires it in terms of 0-360°. Is this correct? or do negative longitudes work for you when you re-grid?

All in all, yeah the re-gridding tutorial on this page: https://gcpy.readthedocs.io/en/stable/Regridding.html#regridding-files-gchp needs an overhaul because both issues were pretty much a result of the tutorial being a bit confusing.

lizziel commented 1 year ago

Hi @Twize, something still isn't adding up since I used exactly the same target lat/lon as your original post. I did not have to change -99.0 to 261. Could you write out all the steps that you took?

Twize commented 1 year ago

Hi @lizziel hmm thats odd. Sure, the steps I used (which worked) are as follows:

Generate grid specifications using gridspec: gridspec-create gcs 48 + gridspec-create sgcs 48 -s 2.0 -t 19.2 261.0 then generate regridding weights: ESMF_RegridWeightGen --source c48_gridspec.nc --destination c48_s2d00_txgrvpzrbpgrv_gridspec.nc --method conserve --weight c48_to_c48_stretched_weights.nc

and finally do the restart regridding: python -m gcpy.regrid_restart_file --stretched-grid --stretch-factor 2.0 --target-latitude 19.2 --target-longitude 261.0 GEOSChem.Restart.20130101_0000z.c48.nc4 c48_to_c48_stretched_weights.nc4 GEOSChem.Restart.20130101_0000z.c48.nc4

If a negative longitude is included in the Restart file, is there an automatic conversion done from E-W longitude to 360 degree longitude? To me the issue seems like there was an internal comparison between the TARGET_LON provided in setCommonRunSettings.py (which = 261.0) and the one in my Restart before (=-99.0), which was failing somehow. But if the Restart contains TARGET_LON = 261.0, it passes. Just a hunch though, not sure why it wouldn't work for me but it does for you.

lizziel commented 1 year ago

Did you try using your -99.0 restart with a setCommonRunSettings.sh that also had -99.0? If those don't match it would not work. I know setCommonRunSettings.sh says to use [0,360) but I have not found that to be necessary. If the results are identical I will remove that requirement.

Twize commented 1 year ago

Did you try using your -99.0 restart with a setCommonRunSettings.sh that also had -99.0? If those don't match it would not work. I know setCommonRunSettings.sh says to use [0,360) but I have not found that to be necessary. If the results are identical I will remove that requirement.

I did not try a longitude of -99.0 in the past in the setCommonRunSettings.sh file because I assumed the 0-360 degree longitude was a hard requirement for GCHP, but we will test this and let you know if it works!

Twize commented 1 year ago

Hi @lizziel, it appears to work the same with both TARGET_LON = 261.0 and TARGET_LON = -99.0 and the result looks identical.

Total column ozone: image (1)

So maybe the [0,360) longitude requirement in setCommonRunSettings.sh could be removed?

lizziel commented 1 year ago

Hi @Twize, thanks for giving that a try. Yes, I will update the documentation accordingly.

Twize commented 1 year ago

@lizziel as an aside, I went to try and plot the re-gridded restart files just to check that they were consistent when using the -180°,180° longitude vs. 360°, but we realized that the lon and lat inside the GCHP restarts are labelled as 'degrees East' and 'degrees North', but are not actually true latitude and longitude values. Are they indexes of some sort? If so, how would one plot a GCHP restart?

For example, the latitude field from a GC Classic restart file:

variables:
    double lat(lat) ;
        lat:long_name = "Latitude" ;
        lat:units = "degrees_north" ;
        lat:axis = "Y" ;
        lat:bounds = "lat_bnds" ;
    double lon(lon) ;
        lon:long_name = "Longitude" ;
        lon:units = "degrees_east" ;
        lon:axis = "X" ;
        lon:bounds = "lon_bnds" ;
data:

 lat = -89.5, -88, -86, -84, -82, -80, -78, -76, -74, -72, -70, -68, -66, 
    -64, -62, -60, -58, -56, -54, -52, -50, -48, -46, -44, -42, -40, -38, 
    -36, -34, -32, -30, -28, -26, -24, -22, -20, -18, -16, -14, -12, -10, -8, 
    -6, -4, -2, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 
    32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 
    68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 89.5 ;

and a GCHP restart file:

variables:
    double lon(lon) ;
        lon:long_name = "Longitude" ;
        lon:units = "degrees_east" ;
    double lat(lat) ;
        lat:long_name = "Latitude" ;
        lat:units = "degrees_north" ;
data:

 lat = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 
    21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 
    39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 
    57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 
    75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 
    93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 
    109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 
    123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 
    137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 
    151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 
    165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 
    179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 
    193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 
    207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 
    221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 
    235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 
    249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 
    263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 
    277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288 ;

Using the single-panel plotting from GCPy doesn't work properly with a GCHP restart file it seems, but also trying to plot with Xarray also yields a very weird looking result because the values are not true latitude and longitude (and don't span the full range).

Edit: nevermind, I just saw the other recent issue #319 which mentions this!

Twize commented 1 year ago

@lizziel I am going to go ahead and close this issue because I think we have gotten to the bottom of it. Thanks again for your help!