GEOS-ESM / MAPL

MAPL is a foundation layer of the GEOS architecture, whose original purpose is to supplement the Earth System Modeling Framework (ESMF)
https://geos-esm.github.io/MAPL/
Apache License 2.0

Issue using REGRID_METHOD_CONSERVE_HFLUX reading c180 GEOS-IT data #2118

Open lizziel opened 1 year ago

lizziel commented 1 year ago

I am using MAPL 2.26.0 to run GCHP at C24 with GEOS-IT meteorology. I am encountering an error in MAPL that occurs on a C180 input file during MAPL_ExtDataPrefetch. MAPL is searching for a prototype to make a new regridder for the file and is not able to find one.

pe=00009 FAIL at line=00147    NewRegridderManager.F90                  <no such property>
pe=00009 FAIL at line=00092    NewRegridderManager.F90                  <status=1>
pe=00009 FAIL at line=01011    GriddedIO.F90                            <status=1>
pe=00009 FAIL at line=04705    ExtDataGridCompMod.F90                   <status=1>
pe=00009 FAIL at line=01490    ExtDataGridCompMod.F90                   <status=1>
pe=00009 FAIL at line=01807    MAPL_Generic.F90                         <status=1>
pe=00009 FAIL at line=01337    MAPL_CapGridComp.F90                     <status=1>
pe=00009 FAIL at line=01300    MAPL_CapGridComp.F90                     <status=1>
pe=00009 FAIL at line=01260    MAPL_CapGridComp.F90                     <status=1>
pe=00009 FAIL at line=00837    MAPL_CapGridComp.F90                     <status=1>
pe=00009 FAIL at line=00977    MAPL_CapGridComp.F90                     <status=1>
pe=00009 FAIL at line=00301    MAPL_Cap.F90                             <status=1>
pe=00009 FAIL at line=00258    MAPL_Cap.F90                             <status=1>
pe=00009 FAIL at line=00192    MAPL_Cap.F90                             <status=1>
pe=00009 FAIL at line=00169    MAPL_Cap.F90                             <status=1>
pe=00009 FAIL at line=00031    GCHPctm.F90                              <status=1>

The file is a GEOS-IT C180 file: /home/dao_ops/d5294_geosit_jan18/run/.../archive/diag/Y2019/M07/d5294_geosit_jan18.ctm_tavg_1hr_glo_C180x180x6_v72.2019-07-01T0030Z.nc4

The regrid method is 'H', which corresponds to REGRID_METHOD_CONSERVE_HFLUX.

I am using ExtData and not ExtData2G.

Any ideas on what the problem is? I am digging through the traceback now but welcome any thoughts.

mathomp4 commented 1 year ago

I've assigned @bena-nasa because he is the expert here!

bena-nasa commented 1 year ago

@lizziel I'm not sure what's going on. I'll try to reproduce with my standalone tester for ExtData/History.

lizziel commented 1 year ago

Thanks @bena-nasa. My ExtData.rc entry is this:

MFXC;MFYC Pa_m+2_s-1    N H F0;003000 none  0.6666666 MFXC;MFYC  ./MetDir/Y%y4/M%m2/d5294_geosit_jan18.ctm_tavg_1hr_glo_C180x180x6_v72.%y4-%m2-%d2T%h2%n2Z.nc4 2017-01-01T00:30:00P01:00
CXC;CYC   1             N H F0;003000 none  none      CX;CY      ./MetDir/Y%y4/M%m2/d5294_geosit_jan18.ctm_tavg_1hr_glo_C180x180x6_v72.%y4-%m2-%d2T%h2%n2Z.nc4 2017-01-01T00:30:00P01:00

bena-nasa commented 1 year ago

@lizziel I just pulled and built v2.26.0 of MAPL, and ingested those same GEOS-IT files via this ExtData.rc file:

Ext_AllowExtrap: .true.
Prefetch: .true.

PrimaryExports%%
MFXC;MFYC NA N H F0;003000 none none MFXC;MFYC d5294_geosit_jan18.ctm_tavg_1hr_glo_C180x180x6_v72.%y4-%m2-%d2T%h2%n2Z.nc4 2017-01-01T00:30:00P01:00
%%

If I run my driver at a resolution that divides 180 evenly, like c90, it works, but if I run at c24, c48, etc., it fails. I think the problem is that 180 is not divisible by 24, so it can't generate the flux regridder for that case.
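
As a rough illustration of that divisibility check (plain Python, not MAPL code; the function name `is_integer_coarsening` is made up for this sketch):

```python
def is_integer_coarsening(n_in: int, n_out: int) -> bool:
    """Return True if a C{n_in} grid can be coarsened to C{n_out} by a whole-number factor.

    Hypothetical helper for illustration only; it is not part of MAPL.
    """
    if n_in <= 0 or n_out <= 0:
        raise ValueError("cubed-sphere resolutions must be positive")
    # The output grid must be no finer than the input, and the input
    # resolution must be an exact multiple of the output resolution.
    return n_out <= n_in and n_in % n_out == 0


# C180 input (GEOS-IT) against the run resolutions mentioned in this thread.
for n_out in (90, 48, 24):
    print(f"C180 -> C{n_out}: {is_integer_coarsening(180, n_out)}")
```

Under this reading, C90 passes (factor 2), while C48 (factor 3.75) and C24 (factor 7.5) do not.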

@tclune wrote all this so he might have something more to say. Probably could use a more descriptive error.

lizziel commented 1 year ago

Aha! Now that you mention it I recall @LiamBindle mentioning this limitation when he ran using GEOS-FP mass fluxes, or maybe I am confusing that with stretched grid limitations. Regardless, thanks!

@tclune, this isn't a huge problem for us, but it is good to know. Maybe it should be added as a comment somewhere?

lizziel commented 1 year ago

And maybe the error handling could be expanded to give a message about why it is failing.

tclune commented 1 year ago

  1. The HorizontalFluxRegridder is known to be incorrect at this time. There is a branch awaiting testing by Seb (been there for a while now.)

  2. We can add a somewhat better error message, but it will still be rather vague, as there is no way to blame any particular regridder for failing to work. All that we can do is state more clearly with:

    _FAIL('No regridder prototypes support the requested spec')

    (Should go just before the _RETURN(...), which will then be redundant and could be removed.)

  3. The Flux regridder should have satisfied the use case if I understand correctly. The only requirement is that the output grid be coarser by an integer factor: https://github.com/GEOS-ESM/MAPL/blob/a78c1ce521f94a33c77df67da37cd33db1b4c895/base/HorizontalFluxRegridder.F90#L56

lizziel commented 1 year ago

@sdeastham, is the branch that needs testing on your radar? I wonder if I should stick to winds until the flux regridder issues are sorted.

I'll update the GCHP docs to warn users about the limitation and what error message (or traceback) to look out for.

It failed for my use cases because I ran at c24 and c48, and c180 is not an integer multiple of either one. Those resolutions are fine with GEOS-FP mass fluxes, which are c720, but GEOS-IT is more limiting at c180.
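
To make the C180 versus C720 difference concrete, here is a small Python sketch (illustrative arithmetic only; `valid_targets` and the candidate list are assumptions for this example, not anything from MAPL or GCHP):

```python
def valid_targets(n_in: int, candidates):
    """Return the candidate resolutions that are coarser than C{n_in} by an integer factor."""
    return [n for n in candidates if n <= n_in and n_in % n == 0]


# A hypothetical set of run resolutions to compare against each input product.
candidates = [24, 30, 48, 60, 90]

print("GEOS-IT C180:", valid_targets(180, candidates))  # [30, 60, 90]
print("GEOS-FP C720:", valid_targets(720, candidates))  # [24, 30, 48, 60, 90]
```

C24 and C48 drop out for C180 input but remain valid for C720 input, which matches the behavior described above.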

sdeastham commented 1 year ago

@lizziel - it's on my radar. I wouldn't wait, to be honest; the tests I've been performing have so far been with the "buggier" flux regridder, but it's still a vast improvement over using winds (errors in absolute surface pressure change at each time step fall by a factor of five at C30).

tclune commented 1 year ago

@lizziel The branch in question does not change the divisibility requirement. Seb and I were unable to come up with a generalization, and you're probably better off using an ordinary regridding method for non-divisible cases.

lizziel commented 1 year ago

Okay, sounds good. Strangely, I am still getting the same error in a c90 run. I am briefly switching to winds just so I can diagnose and fix the other ExtData data issues (missing files on discover, etc.) and then will swing back to getting mass flux regridding working.

lizziel commented 1 year ago

Regridding mass fluxes now works with the following two adjustments: (1) switching from C24 to C90, and (2) switching from 96 processors to 216 processors.
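
The thread does not state what the processor constraint actually is. One possible reading, offered purely as an assumption, is that the cores are split evenly over the six cube faces and that each face's process layout must divide the face evenly in both directions. The Python sketch below (the helper `even_layouts` is hypothetical, not MAPL code) shows that this guess is at least consistent with 96 cores failing and 216 cores working at C90:

```python
def even_layouts(n_cells: int, total_cores: int):
    """List per-face (nx, ny) layouts that split an n_cells x n_cells face evenly.

    Assumption (not confirmed in this thread): cores are divided equally among
    the 6 cube faces, and both nx and ny must divide n_cells exactly.
    """
    if total_cores % 6 != 0:
        return []
    per_face = total_cores // 6
    layouts = []
    for nx in range(1, per_face + 1):
        if per_face % nx != 0:
            continue  # nx must be a factor of the per-face core count
        ny = per_face // nx
        if n_cells % nx == 0 and n_cells % ny == 0:
            layouts.append((nx, ny))
    return layouts


print(even_layouts(90, 96))   # []
print(even_layouts(90, 216))  # [(2, 18), (6, 6), (18, 2)]
```

With 96 cores (16 per face) no layout divides a 90 x 90 face evenly, whereas 216 cores (36 per face) allows several, including 6 x 6. Whether this is the real cause of the failure would need to be confirmed against the regridder source.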

It would be great if you could incorporate the grid resolution and processor constraints into the error handling somehow, even if just an expanded message that points people to comments in the code detailing what the constraints are. That message could be triggered only if the regrid type is 9 (which is HFLUX regridding).

@tclune

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

mathomp4 commented 1 year ago

I'll long term this until @tclune can look at the last message from @lizziel

lizziel commented 1 year ago

I think this issue can be closed by https://github.com/GEOS-ESM/MAPL/pull/2056 once it is merged. @tclune mentioned in the PR that he added logic to do a better job giving a helpful message for this issue. However, I don't see the update for it there so I am not sure if it is pushed yet.

tclune commented 1 year ago

> @tclune mentioned in the PR that he added logic to do a better job giving a helpful message for this issue. However, I don't see the update for it there so I am not sure if it is pushed yet.

I remember saying that. But don't remember doing it. If I said it in the past tense, then presumably I did ...

tclune commented 1 year ago

This and the previous branch have this:

https://github.com/GEOS-ESM/MAPL/blob/7dc923462e76a4f595f46286ab4b27ba5e6c3588/base/MAPL_RegridderManager.F90#L159-L160

It would be better as an _ASSERT but appears to capture the essence of what I was saying. So the question becomes: what error message were you seeing, @lizziel?

tclune commented 1 year ago

I wonder if I put the fix in the wrong branch. "2nd try" branch has this:

https://github.com/GEOS-ESM/MAPL/blob/144f46f199a8fa6028b08587b8cee0447de0469c/base/HorizontalFluxRegridder.F90#L62-L63

While 3rd try just returns at that point. I'll copy the line over.

lizziel commented 1 year ago

Perfect, that's what I was looking for!

mathomp4 commented 1 year ago

@lizziel I'm hoping to test #2056 tomorrow. It should be (trivially) zero-diff since I don't even know how to trigger this :)