GEOS-ESM / GOCART

GOCART Aerosol model including process library and framework interfaces (MAPL, NUOPC, and CCPP)
Apache License 2.0

MAPL 2.34.0 failed in ufs weather model #207

Open junwang-noaa opened 1 year ago

junwang-noaa commented 1 year ago

We are trying to update the MAPL library from 2.23.1 to 2.34.0 in ufs-weather-model. However, we got the following error message in GOCART:

pe=00098 FAIL at line=03053 Base_Base_implementation.F90
pe=00098 FAIL at line=00685 SU2G_GridCompMod.F90
pe=00098 FAIL at line=01817 MAPL_Generic.F90
pe=00098 FAIL at line=00193 BaseProfiler.F90 <Timer does not match start timer >
pe=00098 FAIL at line=01838 MAPL_Generic.F90
pe=00098 FAIL at line=00161 Aerosol_GridComp.F90
pe=00098 FAIL at line=01817 MAPL_Generic.F90

I tried the latest GOCART version, acc574ff8 in the develop branch (https://github.com/GEOS-ESM/GOCART/tree/develop), and got the same error. The code ran fine when switching back to MAPL 2.23.1. The ESMF library is 8.4.1b07.

May I ask if anything needs to be updated to use MAPL 2.34.0? Thanks

mathomp4 commented 1 year ago

@junwang-noaa I think your libraries are fine. The line it is dying on is:

    call ESMF_AttributeGet(grid, name='GridType', value=grid_type, _RC)
    if(trim(grid_type) == "Cubed-Sphere") then

This came in between 2.23.1 and 2.34.0 in changes from @aoloso. Before it was:

    if (im_world*6==jm_world) then

So it's like it doesn't know your grid is a cubed-sphere grid. We might need to ping @weiyuan-jiang and @bena-nasa to see if maybe you need to set something in a file somewhere?

bena-nasa commented 1 year ago

@junwang-noaa As Matt said, the issue is that we changed the logic in that library. It used to rely on a silly hack (that the global size of the 2nd dimension of the grid was 6 times the 1st dimension) to detect the presence of a cubed-sphere grid. We now require that you add an attribute to the grid that explicitly tells it what type of grid this is, so the procedure can take the appropriate action. The hack was no longer tenable with other changes to the GEOS model.

The grid that the UFS application passes to GOCART, however it is created, clearly does not have this attribute set. The solution is to add the appropriate attribute ('GridType' as the key, 'Cubed-Sphere' as the value) to the grid used in GOCART so that this routine knows what the grid type is.
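
To make the difference between the two detection strategies concrete, here is a minimal Python sketch. It is only an illustration and not MAPL code: the `Grid` class, both function names, and the grid sizes are hypothetical. Only the `GridType` key, the `Cubed-Sphere` value, and the `im_world*6 == jm_world` test come from the snippets in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    """Hypothetical stand-in for an ESMF grid: global sizes plus attributes."""
    im_world: int
    jm_world: int
    attributes: dict = field(default_factory=dict)


def is_cubed_sphere_by_shape(grid: Grid) -> bool:
    # Old heuristic: the second global dimension is six times the first
    # (six cube faces stacked along the second dimension).
    return grid.im_world * 6 == grid.jm_world


def is_cubed_sphere_by_attribute(grid: Grid) -> bool:
    # New logic: the grid must carry an explicit 'GridType' attribute.
    # In this sketch a missing attribute is an error, not a silent "no".
    if "GridType" not in grid.attributes:
        raise KeyError("grid has no 'GridType' attribute")
    return grid.attributes["GridType"].strip() == "Cubed-Sphere"


# A 96 x 576 global grid without the attribute: the shape test passes,
# but the attribute test raises.
untagged = Grid(im_world=96, jm_world=576)
# The same grid with the attribute set explicitly.
tagged = Grid(im_world=96, jm_world=576, attributes={"GridType": "Cubed-Sphere"})
# A single tile (made-up sizes) fails the shape test but can still be
# tagged explicitly.
regional = Grid(im_world=200, jm_world=150, attributes={"GridType": "Cubed-Sphere"})
```

With the explicit attribute, the decision no longer depends on the global dimensions, so a grid that is not a full six-face cube can still opt in to the cubed-sphere code path.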

@weiyuan-jiang since you know how to build UFS can you take it from here? I've no idea where the grid comes from within UFS so can't really be of further advice.

bbakernoaa commented 1 year ago

I think that we need to come up with a more robust way to do this. For instance, what if we run this on a regional grid (a single tile)? We should not expect that the only solution is the global cubed sphere.

junwang-noaa commented 1 year ago

Thank you all for looking into this issue. @weiyuan-jiang I transferred the run directory to Orion at:

/work/noaa/stmp/junwang/gocart/rt_189024/cpld_control_p8_mixedmode

It has all the configure .rc files. Thanks

weiyuan-jiang commented 1 year ago

@junwang-noaa I don't have permission to access that folder.

junwang-noaa commented 1 year ago

@weiyuan-jiang Please try it again. Thanks

weiyuan-jiang commented 1 year ago

@junwang-noaa I still cannot access the gocart directory. To build mapl_v2.34.0, where can I load ESMF 8.4.0? I got this error:

CMake Error at /apps/cmake-3.22.1/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find ESMF: Found unsuitable version "8.3.0", but required is at least "8.4.0" (found /work/noaa/epic-ps/hpc-stack/libs/intel/2022.1.2/intel-2022.1.2/impi-2022.1.2/esmf/8.3.0b09/lib, )

junwang-noaa commented 1 year ago

@weiyuan-jiang Sorry, please try again. The library team installed the library on acorn. UFS failed with ESMF 8.4.0 due to a bug in that release. ESMF 8.4.1b07 works in UFS, and GOCART runs with MAPL v2.23.1. Only when we try MAPL v2.34.0 do we get the error message. We don't have the library installed on Orion yet. Please let me know if you need to run tests on Orion, and I can see if the EPIC team can install them.

bena-nasa commented 1 year ago

> I think that we need to come up with a more robust way to do this. For instance, what if we run this on a regional grid (a single tile). We should not expect that the only solution is the global cube sphere.

The whole reason we changed the logic to what is there now is for MORE robustness. We were doing some open work where the component had a "grid" that was still a cubed-sphere in the sense that it had a copy of the local cubed-sphere domain. In that case we still want to go through this code path, since the cells are still bounded by great circles on all 4 sides and the same search algorithm should be used.

A fully robust, efficient, grid-agnostic implementation of this (additive point binning to an arbitrary grid), where the cell boundaries may not be great circles (a tripolar grid, for example), is beyond the scope of what MAPL can do, but I fully agree that is what we need. I believe ESMF does have an action item to someday implement this: a generic, additive point binning to a grid given a set of points, as an extension of existing regridding methods.

weiyuan-jiang commented 1 year ago

@junwang-noaa I think the problem is this line https://github.com/GEOS-ESM/GOCART/blob/984fc494abfcd95ff3c2ced49dc0ab11176b8510/ESMF/UFS/Aerosol_Cap.F90#L342

The temporary fix should be

call cap % cap_gc % set_grid(grid, lm=nlev, grid_type ="Cubed-Sphere", _RC)

I am wondering how to pass the type info into the cap options.

weiyuan-jiang commented 1 year ago

Oh, sorry, I spoke too soon. Let me verify again. Yes, the change should work. Does UFS always work with a cubed-sphere grid? @junwang-noaa

junwang-noaa commented 1 year ago

@weiyuan-jiang Currently UFS uses the cubed-sphere grid atmosphere model fv3atm. It has 6 tiles for the global domain and one or more domains for regional runs. I am not sure UFS will always use a cubed-sphere grid, though, as people might use other grids if they want to integrate other dycores.

bbakernoaa commented 1 year ago

We also have the regional application. We have to think more than just operational when designing these.

On Sat, Feb 25, 2023 at 5:59 PM Jun Wang wrote:

> @weiyuan-jiang Currently UFS is using cubed sphere grid atmosphere model fv3atm, it has 6 tiles for global domain and 1 or multiple domains for regional domains. I am not sure if UFS will always use cubed sphere grid though as people might use other grids if they want to integrate other dycores.

--

Barry Baker

National Oceanic and Atmospheric Administration Air Resources Laboratory Physical Research Scientist Chemical Modeling and Emissions Group Leader NCWCP, R/ARL, Rm. 4204 5830 University Research Court College Park, Maryland 20740 Phone: ‪(301) 683-1395‬

junwang-noaa commented 1 year ago

@bbakernoaa I want to confirm that the regional application in UFS also runs on the native cubed-sphere grid, as I explained in my message, and that UFS does not currently have a regional fv3atm coupled with GOCART. Please clarify whether your group is working on a different grid, or whether you know what other grid it will be. Also, I believe coupling GOCART in a regional configuration requires additional work on boundaries, which is beyond the scope of this issue. Thanks

junwang-noaa commented 1 year ago

@weiyuan-jiang There is an issue on our wcoss2 test platform, so it may take some time to verify your fix in UFS. Thanks

bbakernoaa commented 1 year ago

You are correct that there is currently no regional application for GOCART, but this doesn’t mean that we shouldn’t make it a possibility. Right now (I’ve tested this before and brought it up in our joint meetings with NASA) GOCART cannot accept the regional grid because of the way the grid is defined within the cap. We should find a way to define the grid more generally.

On Sat, Feb 25, 2023 at 10:01 PM Jun Wang wrote:

> bbakernoaa I want to confirm that the regional application in UFS also runs on native cubed sphere grid as I explained in the message, and currently UFS does not have a regional fv3atm coupled with GOCART yet, please clarify if your group is working on a different grid or on GOCART coupling in regional domain. Thanks

junwang-noaa commented 1 year ago

@weiyuan-jiang The fix you provided resolved the issue, and the test now runs successfully in UFS WM. Please let us know whether you will release a new MAPL library or update GOCART with the fix. If you are going to update MAPL, we will stop installing MAPL 2.34.0. Thanks

mathomp4 commented 1 year ago

@junwang-noaa If I can get my tests done, I hope to put out a MAPL 2.35 today that should have this fix.

mathomp4 commented 1 year ago

@junwang-noaa MAPL 2.35.0 has been released:

https://github.com/GEOS-ESM/MAPL/releases/tag/v2.35.0

I think @weiyuan-jiang can tell you how to use this release for this issue.

weiyuan-jiang commented 1 year ago

@junwang-noaa, adding this line to AERO.rc, without changing UFS, should work:

GridType: Cubed-Sphere

junwang-noaa commented 1 year ago

@mathomp4 @weiyuan-jiang Thank you very much for fixing the issue! I will ask our library team to install MAPL 2.35.0 and test in UFS with the change Weiyuan suggested. Will let you know how it goes.

mathomp4 commented 1 year ago

> @mathomp4 @weiyuan-jiang Thank you very much for fixing the issue! I will ask our library team to install MAPL 2.35.0 and test in UFS with the change Weiyuan suggested. Will let you know how it goes.

@junwang-noaa You might actually want to wait for 2.35.1, which I'll issue soon. 2.35.0 had a bug in handling monthly history output. My guess is you don't use that, but you might as well not have a buggy version!

junwang-noaa commented 1 year ago

Sure, we will wait for 2.35.1. Thanks

mathomp4 commented 1 year ago

> Sure, we will wait for 2.35.1. Thanks

Whoops. Forgot to update this. 2.35.1 is out now! My guess is 2.35.0 is fine for your runs, but now the bug isn't there. 😄

junwang-noaa commented 1 year ago

@mathomp4 Thanks a lot! @weiyuan-jiang With MAPL 2.35.1, I updated the AERO.rc, please see below:

NX: 4
NY: 24            

# Atmospheric Model Configuration Parameters
# ------------------------------------------
IOSERVER_NODES: 0

DYCORE: NONE

NUM_BANDS: 30

GridType: Cubed-Sphere

Now I get this error message:

pe=00086 FAIL at line=01464    MAPL_CapGridComp.F90                     <status=51>
pe=00086 FAIL at line=00342    Aerosol_Cap.F90                          <status=51>

I see the line 342 in Aerosol_Cap.F90 is:

    call cap % cap_gc % set_grid(grid, lm=nlev, _RC)

Do I need to make any change in this line? Thanks

junwang-noaa commented 1 year ago

@weiyuan-jiang I tried the following in AERO.rc and got the same error.

GridType: "Cubed-Sphere"

weiyuan-jiang commented 1 year ago

@junwang-noaa I know what happens here. The cf_root is not yet created when we call set_grid. It seems that we need to swap these two lines: https://github.com/GEOS-ESM/GOCART/blob/df0e5a80865e73a473f80bef1d198fa55f9baa40/ESMF/UFS/Aerosol_Cap.F90#L345-L348

However, the grid should be set at this point. That creates a circular dependence: https://github.com/GEOS-ESM/MAPL/blob/7da78c3664acbe39f328543cb7427502a1a1a9fc/gridcomps/Cap/MAPL_CapGridComp.F90#L625

Maybe the best solution is to just change this line 342 of Aerosol_Cap.F90 and get back to the old set_grid of MAPL?

call cap % cap_gc % set_grid(grid, lm=nlev, grid_type ="Cubed-Sphere", _RC)

@bena-nasa @tclune

junwang-noaa commented 1 year ago

@weiyuan-jiang Thanks for looking into this. I confirm that with the change above in Aerosol_Cap.F90 and MAPL 2.34.0, the UFS WM tests ran successfully. Thanks

weiyuan-jiang commented 1 year ago

@junwang-noaa Could you please try this branch? https://github.com/GEOS-ESM/MAPL/tree/fix/wjiang/set_grid_fix The cf_root is not created when setting the grid, so you would need to move this line to CAP.rc:

GridType: Cubed-Sphere

junwang-noaa commented 1 year ago

@weiyuan-jiang I tried your branch and I got this error:

pe=00053 FAIL at line=03053    Base_Base_implementation.F90             <status=57>
pe=00053 FAIL at line=00685    SU2G_GridCompMod.F90                     <status=57>
pe=00053 FAIL at line=01818    MAPL_Generic.F90                         <status=57>
pe=00053 FAIL at line=00193    BaseProfiler.F90                         <Timer <GOCART2G> does not match start timer <SU>>
pe=00053 FAIL at line=01839    MAPL_Generic.F90                         <status=1>
pe=00053 FAIL at line=00161    Aerosol_GridComp.F90                     <Failed to run child component>

weiyuan-jiang commented 1 year ago

@junwang-noaa I have no problem running the new MAPL branch with unchanged UFS. There may be two reasons for the crash:

1) The line GridType: Cubed-Sphere was not added to CAP.rc (not AERO.rc).
2) The new MAPL branch is not really linked (a fresh build is needed).
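
A quick way to rule out the first cause is to check which resource file actually carries the key. The sketch below is a minimal, hypothetical helper: the function name and the simplified `key: value` parsing are assumptions, not the ESMF/MAPL resource-file reader, and the two example texts are modelled on the snippets in this thread rather than copied from real files. It only reports whether a `GridType` entry is present.

```python
def find_rc_value(rc_text: str, key: str):
    """Return the value of `key` in simple 'key: value' rc text, or None.

    Assumption: one entry per line and '#' starts a comment. This is a
    simplified reader for illustration, not the ESMF/MAPL config parser.
    """
    for raw_line in rc_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        if name.strip() == key:
            return value.strip().strip('"')  # tolerate a quoted value
    return None


# Hypothetical file contents for illustration.
aero_rc = """
NX: 4
NY: 24
# Atmospheric Model Configuration Parameters
DYCORE: NONE
NUM_BANDS: 30
"""
cap_rc = """
GridType: Cubed-Sphere
"""

print("AERO.rc GridType:", find_rc_value(aero_rc, "GridType"))
print("CAP.rc  GridType:", find_rc_value(cap_rc, "GridType"))
```

To check a real run directory, the same function could be applied to the text read from the CAP.rc and AERO.rc files there.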

junwang-noaa commented 1 year ago

@weiyuan-jiang Thanks for looking into the issue. After I added GridType to CAP.rc instead of AERO.rc, the UFS WM test finished successfully. Please let us know when a MAPL release version is available.

mathomp4 commented 1 year ago

@junwang-noaa MAPL 2.35.2 is now out:

https://github.com/GEOS-ESM/MAPL/releases/tag/v2.35.2

@weiyuan-jiang Can you inform us how to use your new fixes?

junwang-noaa commented 1 year ago

@mathomp4 I want to confirm that the MAPL 2.35.2 has the fixes in Weiyuan's MAPL branch: https://github.com/GEOS-ESM/MAPL/tree/fix/wjiang/set_grid_fix, right? Thanks.

mathomp4 commented 1 year ago

> @mathomp4 I want to confirm that the MAPL 2.35.2 has the fixes in Weiyuan's MAPL branch: https://github.com/GEOS-ESM/MAPL/tree/fix/wjiang/set_grid_fix, right? Thanks.

Yes. It has https://github.com/GEOS-ESM/MAPL/pull/2003 inside

ETA: For Git Flow reasons, @weiyuan-jiang made a new branch against main so we actually used a different branch, but should be the same code.

weiyuan-jiang commented 1 year ago

Yes. That is right

junwang-noaa commented 1 year ago

@weiyuan-jiang I want to confirm with you, when using the new MAPL library, our gocart history files (gocart.inst_aod.20130401_0600z.nc4) now have one additional dimension: lev with value 1 to 4 as shown below:

netcdf gocart.inst_aod.20210323_0600z {
dimensions:
        lat = 361 ;
        lev = 4 ;
        lon = 720 ;
        time = UNLIMITED ; // (1 currently)
variables:
        double lon(lon) ;
                lon:long_name = "longitude" ;
                lon:units = "degrees_east" ;
        double lat(lat) ;
                lat:long_name = "latitude" ;
                lat:units = "degrees_north" ;
        double lev(lev) ;
                lev:coordinate = "N/A" ;
                lev:standard_name = "N/A" ;
                lev:units = "level" ;
        float time(time) ;
                time:begin_date = 20210323 ;
                time:begin_time = 60000 ;
                time:long_name = "time" ;
                time:time_increment = 60000 ;
                time:units = "minutes since 2021-03-23 06:00:00" ;
        float AOD(time, lev, lat, lon) ;
                AOD:_FillValue = 1.e+15f ;
                AOD:add_offset = 0.f ;
                AOD:fmissing_value = 1.e+15f ;
                AOD:long_name = "Total Aerosol Extinction AOT [550 nm]" ;
                AOD:missing_value = 1.e+15f ;
                AOD:regrid_method = "bilinear" ;
                AOD:scale_factor = 1.f ;
                AOD:standard_name = "Total Aerosol Extinction AOT [550 nm]" ;
                AOD:units = "1" ;
                AOD:valid_range = -1.e+15f, 1.e+15f ;
                AOD:vmax = 1.e+15f ;
                AOD:vmin = -1.e+15f ;
...
 lev = 1, 2, 3, 4 ;

Is this what we expect? May I ask what "lev" means? Thanks

weiyuan-jiang commented 1 year ago

It should be the number of wavelengths. But I am not sure why it uses the confusing name "lev", which usually represents vertical levels. @bena-nasa, I think we should change this dimension name.

mathomp4 commented 1 year ago

@weiyuan-jiang I'll let @bena-nasa chime in, but I think one reason we did that (at least in the past) was for plotting packages like GrADS that could only handle 3rd dimensions that were layer or level. Without some post processing, the variables were unviewable.

I suppose we should do whatever CF says is "right" for these sorts of things, but I want to say these might be part of the "discrete axis" part of the Conventions which is confusing to read.

junwang-noaa commented 1 year ago

@mathomp4 @weiyuan-jiang Thanks for the information. I think we are OK to use the "lev" for 3rd dimension. Is it possible that we have attributes to specify what wavelengths the lev values represent?

@bbakernoaa @rmontuoro @lipan-noaa FYI.

bbakernoaa commented 1 year ago

@junwang-noaa I believe that is specified in the GOCART2G_GridComp.rc file

In the global-workflow it is here: https://github.com/NOAA-EMC/global-workflow/blob/develop/parm/chem/GOCART2G_GridComp.rc#L41
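
For downstream jobs that need to know which wavelength each `lev` index stands for, a small lookup built from the resource file would be enough. The sketch below is only an illustration under stated assumptions: the helper name is hypothetical, the four wavelength values are placeholders that should be replaced with the list actually configured in GOCART2G_GridComp.rc, and it assumes the `lev` order matches the order of that list.

```python
def label_levels(lev_values, wavelengths_nm):
    """Pair each 1-based 'lev' index with a wavelength in nm.

    Assumption: lev index i corresponds to the i-th configured wavelength.
    """
    if len(lev_values) != len(wavelengths_nm):
        raise ValueError("number of lev values and wavelengths must match")
    return {int(lev): float(wl) for lev, wl in zip(lev_values, wavelengths_nm)}


lev = [1, 2, 3, 4]  # as in the ncdump output in this thread
# Placeholder wavelengths for illustration only; take the real list from
# GOCART2G_GridComp.rc.
wavelengths_nm = [470.0, 550.0, 670.0, 870.0]

lev_to_wavelength = label_levels(lev, wavelengths_nm)
print(lev_to_wavelength)
```

The length check guards against a history file whose `lev` size no longer matches the configured wavelength list.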

junwang-noaa commented 1 year ago

@bbakernoaa Thanks for the info. So do we want to have this information in the gocart files as attributes, or is it OK to leave lev with values 1-4? I am thinking of downstream jobs (post or verification jobs).

bbakernoaa commented 1 year ago

@junwang-noaa I think it is ok. In production we may send it through the UPP anyway. We currently do not pass the AOD back to the physics radiation.

junwang-noaa commented 1 year ago

@bbakernoaa Thanks for confirming. I will ask the library team to install the MAPL 2.35.2 for UFS.

junwang-noaa commented 1 year ago

I have run some tests with threads using the MAPL 2.35.2 and GOCART develop branch e2245c2. It looks to me that the threading does not work in UFS. We may still need some work to enable threads in UFS WM:

single thread:

      [CHM] RunPhase1                                                      144    144    120      67.9798     55.2096     142     91.1209     39
        [CAP] Run 1                                                        144    144    120      66.5963     53.6441     142     89.8835     39
          [EXTDATA] Run 1                                                  144    144    120      39.2418     26.1711     142     64.7887     39
            [EXTDATA] Run 11                                               144    144    120      39.2375     26.1678     142     64.7829     39
          [AERO] Run 1                                                     144    144    120      25.3808     23.0391     20      28.2873     54
            [AERO] Run 11                                                  144    144    120      25.3781     23.0370     20      28.2846     54
                [GOCART2G] Run 12                                          144    144    120      24.0121     21.6550     133     26.9398     54

4 threads:

      [CHM] RunPhase1                                                      36     144    120      131.4783    124.9583    12      139.0562    36
        [CAP] Run 1                                                        36     144    120      129.3455    123.0336    12      136.7338    36
          [AERO] Run 1                                                     36     144    120      88.4206     81.5668     36      95.0078     88
            [AERO] Run 11                                                  36     144    120      88.4186     81.5650     36      95.0057     88
              [GOCART2G] Run 2                                             36     144    120      83.9012     77.0662     36      90.4515     88
...
          [EXTDATA] Run 1                                                  36     144    120      40.1352     33.6841     128     54.1526     36
            [EXTDATA] Run 11                                               36     144    120      40.1316     33.6808     128     54.1485     36

mathomp4 commented 1 year ago

@junwang-noaa I think @weiyuan-jiang and @aoloso are looking at this...

junwang-noaa commented 1 year ago

@weiyuan-jiang @aoloso I have two test cases ready on Orion for you to run some tests.

Case 1: single thread (total PETs 192 for atm forecast and chem, layout 4x8, 1 thread)

/work/noaa/stmp/junwang/stmp/junwang/FV3_RT/rt_72194/atmaero_control_p8

Timing profile is in ESMF_Profile.summary

        [fv3_fcst] RunPhase1               192    192    120      130.4369    113.6451    52      143.5964    107
              [GOCART2G] Run 2             192    192    120      22.0089     17.5834     56      28.3196     72
              [GOCART2G] Run 1             192    192    120      1.0615      1.0312      186     1.0915      1

Case 2: 4 threads (total PETs 192 for atm forecast and chem, layout 1x8, 4 threads)

/work/noaa/stmp/junwang/stmp/junwang/FV3_RT/rt_72194/atmaero_control_p8_thrd4

Timing profile is in ESMF_Profile.summary

        [fv3_fcst] RunPhase1               48     192    120      140.5584    131.2598    52      146.3891    12
              [GOCART2G] Run 2             48     192    120      84.5028     73.9572     20      93.3594     68
              [GOCART2G] Run 1             48     192    120      3.6855      3.5674      168     3.7565      0
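
The slowdown can be quantified directly from the mean times in the two profiles above. The short sketch below just does that arithmetic for the GOCART2G Run 2 lines. The helper name is hypothetical, and the column interpretation (the last eight fields being PETs, PEs, Count, Mean, Min, Min PET, Max, Max PET) is an assumption about the ESMF_Profile.summary layout.

```python
def mean_seconds(profile_line: str) -> float:
    """Extract the mean time from one profile summary line.

    Assumption: the last eight whitespace-separated fields are
    PETs, PEs, Count, Mean, Min, Min PET, Max, Max PET.
    """
    fields = profile_line.split()
    return float(fields[-5])


# GOCART2G Run 2 lines copied from the two cases above.
single = "[GOCART2G] Run 2   192    192    120      22.0089     17.5834     56      28.3196     72"
threaded = "[GOCART2G] Run 2   48     192    120      84.5028     73.9572     20      93.3594     68"

slowdown = mean_seconds(threaded) / mean_seconds(single)
print(f"GOCART2G Run 2 mean time ratio (4 threads / 1 thread): {slowdown:.2f}")
```

Both cases use 192 PEs in total, so a ratio near 1 would be expected if threading scaled well. A ratio of about 3.8, close to the thread count, is consistent with only one thread per PET doing work in GOCART2G.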

My branch is https://github.com/junwang-noaa/ufs-weather-model/blob/newmapl and the code is on Orion at:

/work/noaa/nems/junwang/ufs-weather/20230306/mapl/ufs-weather-model

You can compile the code by:

cd ufs-weather-model/tests
./compile.sh orion.intel "-DAPP=ATMAERO -DCCPP_SUITES=FV3_GFS_v17_p8 -D32BIT=ON" 001

This produces an executable, fv3_001.exe. Copy it to fv3.exe in the run directory, then submit the job from the run directory:

sbatch job_card

Please let me know if you have any questions, Thanks.