LiamBindle commented 3 years ago

Hi everyone, this is a discussion/documentation thread related to running GCHP (and more broadly GEOS-Chem) with native MERRA2 metfields. Currently, I'm working on getting GCHP running with native MERRA2 metfields as a part of AIST:

so we can run GEOS-Chem with native metfields, and so at some point in the future we could eliminate the metfields preprocessing step
so we can evaluate the cubed-sphere metfield archive (since the preprocessing step only works for lat-lon inputs)

The first section (below) is documentation of the things I learned from setting GCHP up to run off the MERRA2 C180 archive. The second section ("Questions") is a condensed description of some open questions I have.

Documentation

This section contains documentation of what I learned from (1) setting GCHP up to run with the MERRA2 C180 archive, and (2) structural/definition differences I found between our current MERRA2 metfields and this new MERRA2 C180 archive. This section is partly a sticky for documenting this info, because it might be useful in the future/to others.

Here are the relavant parts of my ExtData.rc for running off this C180 MERRA2 archive (note: ./CSMetDir/ is a directory with metfields that I flipped vertically offline): ExtData.txt

Required inputs

We use the following variables from the following MERRA2 collections (these are for Lizzie's C180 MERRA2 run):

inst1_3d_asm_CSv (granule size: 115M)
- PS, QV, T (subgranule size: 43M)
tavg1_3d_asm_CSv (granule size: 587M)
- CLOUD, OMEGA, QI, QL, RH, U, V (subgranule size: 129M)
- Note: We want U and V from tavg1_3d_asm_CSv, not tavg1_3d_ctm_CSv (it looks like it has different basis vectors)
tavg1_2d_rad_CSx (granule size: 13M)
- ALBEDO, CLDTOT, LWGNT, SWGDN (subgranule size: 2M)
tavg1_2d_flx_CSx (granule size: 20M)
- EFLUX, EVAP, FRSEAICE, HFLUX, PBLH, PRECANV, PRECCON, PRECLSC, PRECSNO, PRECTOT, USTAR, Z0M (subgranule size: 4.7M)
tavg1_2d_lnd_CSx (granule size: 15M)
- FRSNO, GRN, GWETROOT, GWETTOP, LAI, PARDF, PARDR, SNODP, SNOMAS (subgranule size: 2.7M)
tavg1_2d_chm_CSx (granule size: 7.8M)
- LWI (subgranule size: 0.9M)
tavg1_2d_slv_CSx (granule size: 18M)
- SLP, TO3, TROPPT, TS, T2M, U10M, V10M (subgranule size: 3.1M)
tavg1_3d_cld_CSv (granule size: 543M)
- DTRAIN, TAUCLI, TAUCLW (subgranule size: 8.8M)
tavg1_3d_mst_CSv (granule size: 329M)
- DQRCU, DQRLSAN, REEVAPCN, REEVAPLSAN (subgranule size: 23M)
tavg1_3d_mst_CSe (granule size: 333M)
- CMFMC, PFICU, PFILSAN, PFLCU, PFLLSAN (subgranule size: 20M)

The total size of the raw granules is ~2 GB/hour (or ~17.5 TB/year). The total size of subsetted and compressed granules is 0.24 GB/hour (or ~2 TB/year). These are both for MERRA2 at C180.

Problems

tavg1_2d_rad_CSx:ALBEDO

The native field is 1-hour time average surface albedo; at night ALBEDO is _FillValue. We preprocess to a 24-hour time average, which only includes daytime values; we assign albedo 0.85 for polar night.

Below is an a comparison of StateMet%ALBD for January 2, 2017.

Who it affects

Standard simulation

Albedo is used in GeosCore/calc_met_mod.F90 to determine the surface type, specifically the IsLand, IsWater, IsIce and IsSnow flags for a grid-box. Note that the LWI metfield is a flag for land (1), water (0), and ice (2).

       ! Land: LWI=1 and ALBEDO less than 69.5%
       State_Met%IsLand(I,J) = ( NINT( State_Met%LWI(I,J) ) == 1 .and. &
                               State_Met%ALBD(I,J)  <  0.695e+0_fp )

       ! Water: LWI=0 and ALBEDO less than 69.5%
       State_Met%IsWater(I,J) = ( NINT( State_Met%LWI(I,J) ) == 0 .and. &
                                State_Met%ALBD(I,J)  <  0.695e+0_fp )

       ! Ice: LWI=2 or ALBEDO > 69.5%
       State_Met%IsIce(I,J) = ( NINT( State_Met%LWI(I,J) ) == 2 .or. &
                              State_Met%ALBD(I,J)  >= 0.695e+0_fp )

       ! Snow covered: ALBEDO > 40%
       State_Met%IsSnow(I,J) = ( State_Met%ALBD(I,J)  > 0.40e+0_fp )

If State_Met%ALBD is NaN (night-time), these comparisons evaluate to false. Related: http://acmg.seas.harvard.edu/geos/wiki_docs/merra/MERRA_LWI_flags.pdf

Albedo is also used in HEMCO as a proxy for land type. It is reference:

in HEMCO/src/Extensions/hcox_soilnox_mod.F90 to check if the surface is snow or ice
in HEMCO/src/Extensions/hcox_gc_POPs_mod.F90 to check if surface is land or ice
in HEMCO/src/Extensions/hcox_seaflux_mod.F90 to check if surface is land or ice
in HEMCO/src/Core/hco_geotools_mod.F90 geotools to determine LWI. This land type subroutine is used in
- hcox_iodine_mod.F90, hcox_dustdead_mod.F, hcox_lightnox_mod.F90, hcox_tomas_jeagle_mod.F90, hcox_seaflux_mod.F90, hcox_seasalt_mod.F90, hcox_custom_mod.F90

Specialty simulations

Albedo is used in RRTMG and APM simulations.

Proposed Solution

Proposed solution 1

Update GeosCore/calc_met_mod.F90 so it determines IsLand, IsWater, IsIce and IsSnow based on fractional area:

       real :: FRLAND_NOSNO_NOICE, FRWATER, FRICE, FRSNO

       ! Land without snow or land ice
       FRLAND_NOSNO_NOICE = State_Met%FRLAND(I,J) - State_Met%FRSNO(I,J) - State_Met%FRLANDIC(I,J)

       ! Water without sea ice
       FRWATER = State_Met%FRLAKE(I,J) + State_Met%FROCEAN(I,J) - State_Met%FRSEAICE(I,J)

       ! Land and sea ice 
       FRICE = State_Met%FRLANDIC(I,J) + State_Met%FRSEAICE(I,J)

       ! Land snow
       FRSNO = State_Met%FRSNO(I,J)

       ! Set IsLand, IsWater, IsIce, IsSnow based on max fractional area
       State_Met%IsLand(I,J)  = (FRLAND_NOSNO_NOICE > MAX(FRWATER, FRICE, FRSNO))
       State_Met%IsWater(I,J) = (FRWATER > MAX(FRLAND_NOSNO_NOICE, FRICE, FRSNO))
       State_Met%IsIce(I,J)   = (FRICE > MAX(FRLAND_NOSNO_NOICE, FRWATER, FRSNO))
       State_Met%IsSnow(I,J)  = (FRSNO > MAX(FRLAND_NOSNO_NOICE, FRWATER, FRICE))

Update HEMCO to accept explicit LWIS flags
Consult RRTMG and APM groups about handling undefined albedos

Note: Technically, a more accurate approach would be using derived exports for surface areas (FRXXXX*AREAM2) since the grid-boxes are quasiuniform. Using FRXXXX will slightly favor the grid-boxes towards the edges and corners of the face. The difference in weighting, however, it probably quite small.

Proposed solution 2

We set albedo to its previous value if it's undefined.

tavg1_2d_lnd_CSx:PARDF,PARDR

These fields are the downwelling photosynthetically active radiation (PAR) diffuse (DF) and direct (DR) flux at the surface. Previously we used the "PARDFLAND" and "PARDRLAND" fields from from the tavg1_2d_lnd collection; the oceans are masked in these fields. The current metfields that I'm looking at have "PARDF" and "PARDR" which do not mask the oceans.

Below is a comparison of PARDF and PARDR for January 2, 2017. Note that these are different versions of MERRA2. The point of these comparisons is to identify structural/definition differences in the fields (like masking).

PARDF:

PARDR:

The difference worth noting is the current GC metfields mask PARD[RF] over the oceans, whereas the new native metfields don't.

Who it affects

MEGAN emission (negligible)

These fields are used for biogenic emissions, specifically, MEGAN. Here is the 1-day benchmark results for InvMEGAN_*: MEGAN_emis_native_par.pdf

It looks like MEGAN emissions are explicitly masked for land-only. Therefore, the differences from this incoming change to PARD[RF] look like they're negligible.

tavg1_2d_lnd_CSx:SNOMAS

In old versions of GEOS, SNOMAS for ice-covered areas was arbitrarily set to 4 m. The current GEOS-Chem metfield preprocessing step converts the native SNOMAS values to the older definition (it sets it to 4 m for ice-covered areas).

Who it affect (no one?)

I don't see SNOMAS used anywhere in GEOS-Chem or HEMCO. Therefore, I don't think this affects anyone.

Notes

Our current metfield preprocessing has a step where it fills missing TROPPT values with the zonal average prior to regridding. There were no missing values in any of the 24 tavg1_2d_slv_CSx granules for 2017-01-02. Therefore, I suspect we can simply ingest TROPPT directly without preprocessing.

Questions

[x] Could someone review my description of the incoming PARDR and PARDF changes and confirm that the effects on MEGAN should be negligible?
[x] What do you think is the best approach for handling the native ALBEDO (which is _FillValue at nighttime and for polar night)?
- A) [SELECTED] In GEOS-Chem, we could determine land (L), water (W), ice (I), and snow (S) flags based on fractional surface areas (or native LWI flags). In HEMCO, we would need to add the ability to explicitly specify LWIS flags. In HEMCO, we would also need to update a few extensions that use ALBEDO as a proxy for LWIS (they can use HEMCO LWIS flags instead).
- B) (Kludgy) Only update ALBEDO if it's defined. We would have to initialize all the grid-boxes to some pre-defined values.
- C) Do you have another idea?

sdeastham commented 3 years ago

Great analysis! My two cents:

The use of albedo as a proxy for ice cover is an old kludge, and one we should kill. You may have notice that it gets worse, because it's actually used inconsistently; in some HEMCO extensions we use one threshold (0.4), which is really way too low - that sometimes identifies large chunks of the Sahara as being "ice"! Elsewhere we're using 0.695, which is better but still dumb. Any move we can make towards using the actual reported ice cover is a big improvement in my eyes. I would suggest that we take route A).
On PARDR and PARDF, I'd much rather use the unmasked values - I'm very worried that the regridded, masked values are going to be causing some issues on the coasts, so I don't really get why we'd use that. However there must be some reason we started doing this - maybe @yantosca can comment?

yantosca commented 3 years ago

Thanks @sdeastham and @LiamBindle for looking into this. My thoughts:

I agree, the use of albedo for ice cover is a kludge. It probably was used to implement a "tie-breaker" because some of the earlier met fields (GEOS-3, GEOS-4) had different land/water/ice flags. But we know that MERRA-2 and GEOS-FP now use the same convention for land/water/ice, so we should get rid of the ALBEDO dependency.
I wasn't aware that PARDF, PARDR were actually PARDFLAND, PARDRLAND. This might have been a change in the GMAO file specification document. Agreed, we should use PARDF, PARDR over land and oceans, since there will be photosynthesis in the oceans that generate biogenic emissions.

LiamBindle commented 3 years ago

Thanks @yantosca and @sdeastham for your thoughts! Since you both agree the PARDR, PARDF update is desireable, and that we can stop using ALBEDO as a proxy for surface type, I'll go ahead and mark both questions as resolved.

Thanks again for your input!

LiamBindle commented 3 years ago

Just a quick update. There are in fact some missing values in TROPPT. They are pretty infrequent, but they are there every once in a while, and they need to be filled (GCHP crashes otherwise).

sdeastham commented 3 years ago

Dang. Ben Auer may have some ideas on a way to fill NaNs before regridding (using a "typical" value would probably be OK, if these are very infrequent)?

yantosca commented 3 years ago

Or we could use one of the other tropopause fields (based on momentum). Not sure how much that would change the base run though.

LiamBindle commented 3 years ago

It actually crashes here believe it or not. Any idea why that might be? It also only crashs at the native resolution (if it's regridded to say C24 online it works OK).

sdeastham commented 3 years ago

I'd be worried about what the regridding is doing though - if it's treating the NaN as some fill value, it would be nice to know what that fill value is. I'm guessing that if it's not regridded then MAPL is just passing a NaN indicator through which is upsetting the code at the point @LiamBindle indicated, but that if it IS regridded then the value becomes "good" after the NaN is treated through some unknown process (ideally ignored, but I doubt it)

LiamBindle commented 3 years ago

Good point—I'm not sure. I'll ask about that in our MAPL telecon next Thursday. Anyways, I thought this was worth noting for when we move to native/minimally-pre-processed metfields in the future.

yantosca commented 3 years ago

@LiamBindle if it crashes at that point, it could be an e.g. out of bounds error in the pointer field. or something like that.

LiamBindle commented 3 years ago

@yantosca It says it's a floating point exception there. Do you why if/why that could trigger an FPE?

yantosca commented 3 years ago

It could be that the pointer is still pointing to NULL() (and thus has a denormal value). Or the pointer hasn't been allocated.

LiamBindle commented 3 years ago

Re: https://github.com/GEOS-ESM/MAPL/issues/284#issuecomment-784353502 (here to avoid a tangent in the MAPL discussion)

I think we should work towards eliminating metifeld preprocessing as it is now. The programs haven't seen much love in a few years (there are incompatibilities with modern compilers), and most of the preprocessing is trivial/redundant with modern GC-Classic and GCHP capabilities.

In the near-term, the preprocessing could be rewritten as NCO and CDO scripts. This would be maintainable and pretty clean. Rewriting as NCO/CDO scripts should be pretty easy, and it can be started any time (this thread demonstrates our preprocessing programs aren't doing anything we couldn't do with NCO/CDO).

Direct ingest of fields from GEOS would be a great capability, but some level of preprocessing is likely still required at least for the official metfields (subsetting at least).

sdeastham commented 3 years ago

Agreed that the current processors should be eliminated. My goal is to at least make it possible to ingest the met fields directly - or at least remove as many barriers as possible. Vertical flipping strikes me as something that falls into the "avoidable oversight" capacity such that I'd like to fix it (much like finding a reliable fix for albedo and tropopause issues), but I agree that the subsetting issue is likely to make many (most?) users still rely on some level of preprocessing.

One caveat: I'm not sure what the long term status is of NCO and CDO in terms of support or development - it would be good to check that with someone at NCAR. Question would be whether we use NCO/CDO or Python 3.

LiamBindle commented 3 years ago

Agreed.

I'd suggest NCO and CDO though because it restricts the sciprts to well-established and "standard" operations—it would be more transparent. NCO and CDO are also compliant with common conventions like appending operations to the "history" attribute, and it would be ideal if we could avoid deviating from conventions unless it's warranted.

I'm a big Python advocate in general (personally I like Python more than NCO/CDO), but it's easy to do wonky custom things in Python, and compliance with conventions isn't automatic.

lizziel commented 3 years ago

Hi all, I'm late to the party on this issue. All looks good, but I'm wondering about the cloud fraction calculation that I think is still part of the pre-processing. I'm referring to this set of equations. I think the solution we talked about was to do the calculation at run-time, prior to regridding the inputs needed for the calculation. Is this still on the radar?

yantosca commented 3 months ago

Closing out this issue

geoschem / geos-chem

[DISCUSSION] Running GEOS-Chem with native C180 MERRA2 data #1415

Documentation

Required inputs

Problems

tavg1_2d_rad_CSx:ALBEDO

Who it affects

Standard simulation

Specialty simulations

Proposed Solution

Proposed solution 1

Proposed solution 2

tavg1_2d_lnd_CSx:PARDF,PARDR

Who it affects

MEGAN emission (negligible)

tavg1_2d_lnd_CSx:SNOMAS

Who it affect (no one?)

Notes

Questions