E3SM-Project / e3sm_diags

E3SM Diagnostics package
https://e3sm-project.github.io/e3sm_diags
BSD 3-Clause "New" or "Revised" License
39 stars 32 forks source link

[Refactor]: CDAT Migration: `cdms2.open` slice flag adds additional time coordinate which affects downstream time series operations and diffs with Xarray code #759

Closed tomvothecoder closed 3 months ago

tomvothecoder commented 9 months ago

Is your feature request related to a problem?

In the CDAT version of the Dataset class, cdms2.open is being called with a slice flag (either "co"/"ccb"). This can cause result in major differences between the refactored Xarray code and the CDAT code

Related lines of code on main:

In #754, "co" is being set because time coordinates start at the beginning of the month. This adds an additional time coordinate point to the end of the time series file, which affects the subsequent climatology and spatial averaging calculations. I found omitting the extra time coordinate point before calculating the climatology and spatial averaging results in identical results to the xCDAT version.

Describe the solution you'd like

Should we consider a feature on the slice flag? I found almost no documentation on how to use this slice flag.

TODO

Describe alternatives you've considered

No response

Additional context

chengzhuzhang commented 9 months ago

Nice finding...I tried a few times to find docs on those slice options, and only found them through a xcdat discussion by @jypeter. The link to the documentation is here:https://cdms.readthedocs.io/en/latest/manual/cdms_2.html#axis-methods-additional-to-coordinateaxis-cont-d

I thought there was discussion in xcdat to consider this slicing feature, but not sure if there was a consensus.

tomvothecoder commented 9 months ago

Thanks @chengzhuzhang! I should have documented this when I ran into it while implementing the new Xarray climatology function in April. It came back to haunt me.

The good news is that I confirmed the slice flag is what is producing the large differences for the specific variables in #658 (specifically NET_FLUX_SRF and RESTOM spatial averages). Here is my validation notebook with those findings: https://github.com/E3SM-Project/e3sm_diags/blob/cdat-migration-fy24/auxiliary_tools/cdat_regression_testing/671-lat-lon/12-4-23-qa-no-cdms-slice.ipynb.

I re-ran main without the slice flag and found results are basically identical! So there is no bug that I'm aware of in the cdat-migration-fy24 branch so far.

tomvothecoder commented 9 months ago

I thought there was discussion in xcdat to consider this slicing feature, but not sure if there was a consensus.

Yeah we had a discussion but it fell off the radar. If we want to produce nearly 100% identical results with the CDAT version of e3sm_diags, we should consider revisiting this topic.

jypeter commented 9 months ago

@chengzhuzhang the document you probably want about the slices, is page 9 of Unlocking_CDATs_Best_Kept_Secrets.pdf, available through https://github.com/xCDAT/xcdat/discussions/170. The option is probably also described in cdms5.pdf, but I think the figure in the Secrets document is better

tomvothecoder commented 7 months ago

At our meeting on 1/16, we thought of three options:

  1. Implement this feature in e3sm_diags
    • Optionally, port over this feature in xCDAT in the future (if there is user desire, demand)
  2. Implement this feature in xCDAT
    • Seems like a low priority feature request and xCDAT follows its own development cycle
  3. Update code on main to remove slice flag operation
    • I did this temporarily in #754 and confirmed removing the slice flag makes results nearly identical.
    • However, it is not recommended because it will affect all past production results.

For this GitHub issue, we decided to go with 1. as a near-term solution.

tomvothecoder commented 3 months ago

Closed by #750 and #815