GEOS-ESM / MAPL

MAPL is a foundation layer of the GEOS architecture, whose original purpose is to supplement the Earth System Modeling Framework (ESMF)
https://geos-esm.github.io/MAPL/
Apache License 2.0

Change in bit shave behavior? #2655

Open tclune opened 3 months ago

tclune commented 3 months ago

The modeling team is seeing some odd behavior in some diagnostics that appears to be due to a change in bit-shave behavior. I don't recall any changes there except a couple(?) of years back, when we identified that the MAPL implementation was not layout reproducible. But even then we wouldn't expect significantly different results from that change. (I think.)

@wmputman, @narnold1, and @sdrabenh should be updated if there is an update. (And they should update this ticket with additional details, esp. MAPL versions ...)

atrayano commented 3 months ago

If I remember correctly, there was a change in the bit-shaving to produce more correct round-off behavior. Hard to believe that this would have a measurable impact. Could anybody provide more details (and pictures that illustrate the odd behavior)?

narnold1 commented 3 months ago

This is surface pressure, PS, output with (black) and without (red) bit shaving and deflation, based on collection int_inst_1hr_glo_C90x90x6_slv in the R21C HISTORY.rc (https://github.com/GEOS-ESM/GEOSgcm_App/blob/c0ecf07a1b8a30e985f0e3837b0eb7124834690e/HISTORY_R21C.rc.tmpl). This was run with the GEOSadas "R21C" branch. MAPL branch is also "R21C."

[Image: plot of PS with (black) and without (red) bit shaving and deflation]

mathomp4 commented 3 months ago

The last nbits change I can see is #1947 which was a fix for #1941. Per @bena-nasa:

This is obviously non-zero diff for History diagnostic output when using the nbits option in History unless your application is only being run on a single MPI process, but also obviously has no effect on the module checkpoints and MODEL non-zero diff.

This change came in with MAPL 2.35.

A while back I added netCDF quantize support, but that does not key off of nbits. It offers more complex quantization, including a less-biased "bit shaving" as well as other algorithms.

mathomp4 commented 3 months ago

NOTE: MAPL branch R21C is "essentially" MAPL 2.35.4 (was never tagged).

bena-nasa commented 3 months ago

What are people saying is wrong? I'm confused.

I fixed a bug in v2.35 because the bit shaving was being done in a non-layout-reproducible way and was just wrong before. The bit-shaving algorithm is something Arlindo/Max/someone else? coded up eons ago. It takes into account the mean of the array being bit shaved. Before v2.35 I was passing in the local pointers, so this mean was the local mean for the domain, not the mean of the full field, so you would get a different answer depending on decomposition. Starting in MAPL v2.35, for each 2D slice, I compute the mean across all processors.

Note this is EXACTLY how History worked before we moved away from the old CFIO layer (v2.0.0 of MAPL, I believe), since we were passing full 2D slices to the routine that did the bit shaving.

So from CVS to v2.0.0 of MAPL, the bit shaving was unchanged; from v2.0.0 to v2.35 the bit shaving was not quite being done right and was not layout reproducible. This was then fixed in v2.35 and became identical to what was being done in the CVS and pre-v2.0.0 MAPL days.

Bit-shaved and non-bit-shaved output are different. If you make a plot comparing the two, they won't be the same. Is it something about the pattern in that plot?
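To make the layout dependence described above concrete, here is a minimal Python sketch. It is not MAPL's routine: the helper names (`to_f32`, `shave`, `shave_about_mean`), the synthetic field, and the exact recipe (subtract a mean, zero the trailing mantissa bits of the float32 anomaly, add the mean back) are assumptions made purely for illustration. It treats `nbits` as the number of explicit mantissa bits kept, which may not match MAPL's convention.

```python
import random
import struct


def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def shave(x, nbits):
    """Zero all but the leading `nbits` explicit mantissa bits of float32 x."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    mask = (0xFFFFFFFF << (23 - nbits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def shave_about_mean(values, nbits, mean):
    """Shave the anomaly (value - mean), then add the mean back (assumed recipe)."""
    return [to_f32(shave(v - mean, nbits) + mean) for v in values]


# Synthetic "surface pressure" row with a gradient, split into two tiles
# to mimic a two-process decomposition (hypothetical data).
random.seed(0)
field = [to_f32(98000.0 + 40.0 * i + random.uniform(-50.0, 50.0)) for i in range(64)]
tiles = [field[:32], field[32:]]

global_mean = sum(field) / len(field)

# Reference: shave the whole field about its global mean.
whole = shave_about_mean(field, 10, global_mean)

# Each tile shaved with the global mean: independent of the decomposition.
tiled_global = [y for t in tiles for y in shave_about_mean(t, 10, global_mean)]

# Each tile shaved with its own local mean: depends on the decomposition.
tiled_local = [y for t in tiles for y in shave_about_mean(t, 10, sum(t) / len(t))]

print("global mean reproduces whole-field result:", whole == tiled_global)
print("local means reproduce whole-field result: ", whole == tiled_local)
```

Because the shaving is applied element by element, the only way the decomposition can leak into the result is through the offset, which is why using one mean for the full 2D slice restores layout reproducibility in this sketch.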

atrayano commented 3 months ago

If we use the typical value nbits=12, that would translate into a difference of at most 14 Pascals for sea level pressure (bit-shaved vs not-shaved). This is in agreement with Nathan's plot.

mathomp4 commented 3 months ago

> If we use the typical value nbits=12, that would translate into a difference of at most 14 Pascals for sea level pressure (bit-shaved vs not-shaved). This is in agreement with Nathan's plot

R21C is using nbits: 10, it looks like. I tried running this with current GEOS at C360 and I'm only seeing at most a 4 Pa difference, but I only ran for a day and was probably using different physics, etc.
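For a rough sense of scale, the sketch below computes the spacing between representable values when only `nbits` explicit mantissa bits of a number are kept. It is a back-of-the-envelope estimate under stated assumptions, not MAPL's algorithm: the function name `shave_spacing` is hypothetical, `nbits` is assumed to count explicit mantissa bits, and the value passed in is whatever actually gets truncated (the full pressure for plain truncation, or the much smaller anomaly if a mean is removed first).

```python
import math


def shave_spacing(magnitude, nbits):
    """Spacing between representable values near `magnitude` when only
    `nbits` explicit mantissa bits are kept (plain truncation, assumed)."""
    # frexp returns (m, e) with magnitude = m * 2**e and 0.5 <= m < 1,
    # so the binary exponent floor(log2(magnitude)) is e - 1.
    _, e = math.frexp(abs(magnitude))
    return 2.0 ** ((e - 1) - nbits)


# Truncating a ~1000 hPa pressure directly:
print(shave_spacing(1.0e5, 12))  # 16.0 Pa
print(shave_spacing(1.0e5, 10))  # 64.0 Pa

# Truncating a 500 Pa anomaly about a mean instead:
print(shave_spacing(500.0, 10))  # 0.25 Pa
```

The truncation error is bounded by this spacing, so the error seen in practice depends strongly on how large the shaved quantity is relative to the mean that was subtracted, which may be one reason the differences reported in this thread vary between runs.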

narnold1 commented 3 months ago

I think the issue is not a change or problem in the bit-shaving itself, but simply that the bit-shaving is too aggressive in the R21C HISTORY to close the various budgets. A 4 Pa difference is still too coarse for some budget calculations. We'll just have to reduce or remove the shaving for some collections.

atrayano commented 3 months ago

I agree with Nathan. Bit-shaving is a technique to minimize the file size, but it might negatively impact any post-run budget calculations. My suggestion would be to run without bit-shaving.