E3SM-Project / e3sm_diags

E3SM Diagnostics package
https://e3sm-project.github.io/e3sm_diags
BSD 3-Clause "New" or "Revised" License
39 stars 32 forks source link

CDAT Migration Phase 2: Refactor core utilities and `lat_lon` set #677

Closed tomvothecoder closed 1 year ago

tomvothecoder commented 1 year ago

Description

This PR refactors most core utilities for e3sm_diags to use Xarray/xCDAT since CDAT is deprecated at the end of the year. The lat_lon set is also refactored to use these core utilities.

How to Review

This is a large PR that involves many changes. I am interested in code review for areas of improvement, optimization, potential bugs, etc. The functionality itself is unchanged and working (refer to viewer outputs at the bottom of the description).

To help streamline the review process, there is section below for the main focus areas. If you are interested and have time, feel free to review other files and functions that are not listed.

I anticipate review will take the next few weeks (targeting end of Sep/2023). I'll try my best to respond to any comments, suggestions, or questions you might have. Thanks for the help!

Main Focus Areas

Each focus area includes the most important files and functions to review.

(1) Dataset class

(2) Computation, Regridding, and I/O Utilities

(3) Lat Lon Driver

(4) Lat Lon Plotter

(5) Derived Variables

(6) Misc.

Checklist

If applicable:

Viewer Outputs

Future Items

tomvothecoder commented 1 year ago

Hi @chengzhuzhang, I'm spending more time managing the effort and progress for this PR.

Here are a few notes:

  1. This is a large and high priority task so we might need to pull another person in to help refactor other components if we want it merged within the next 1-2 months. xCDAT and other projects are currently taking up a lot of my bandwidth.
    • Workaround -- I refactor the core driver and utilities, and somebody else refactors the plotter and viewer
  2. I'm sinking too much time on investigating the diffs between xCDAT's climatology vs. E3SM diags climatology
    • Workaround -- Refactor E3SM diags climatology to operate on xr.DataArray instead of TransientVariable. Open up a new issue to integrate xCDAT's climatology later, and attach my investigation Jupyter Notebook
  3. I don't have a clear timeline on when Jason can work on incorporating a version of cdutil.reconstructPressureFromHybrid into cf-xarray.
    • Temporary Workaround -- Create a function in e3sm_diags that is a copy of this code and convert it to xarray directly
chengzhuzhang commented 1 year ago

Hi @chengzhuzhang, I'm spending more time managing the effort and progress for this PR.

Here are a few notes:

1. This is a large and high priority task so we might need to pull another person in to help refactor other components if we want it merged within the next 1-2 months. xCDAT and other projects are currently taking up a lot of my bandwidth.

   * **Workaround -- I refactor the core driver and utilities, and somebody else refactors the plotter and viewer**

2. I'm sinking too much time on investigating the diffs between xCDAT's climatology vs. E3SM diags climatology

   * **Workaround -- Refactor E3SM diags climatology to operate on xr.DataArray instead of TransientVariable. Open up a new issue to integrate xCDAT's climatology later, and attach my investigation Jupyter Notebook**

3. I don't have a clear timeline on when Jason can work on incorporating a version of [cdutil.reconstructPressureFromHybrid](https://github.com/CDAT/cdutil/blob/b823b69db46bb76536db7d435e72075fc3975c65/cdutil/vertical.py#L8) into `cf-xarray`.

   * **Temporary Workaround -- Create a function in e3sm_diags that is a copy of this code and convert it to xarray directly**

Hey @tomvothecoder thank you for the thoughtful suggestions. This is a high priority PR, since we have a crunching time line for the CDAT support. The workaround for climatology and and pressure level construction sounds good to me! Both @forsyth2 and I are familiar with the plotter and viewer code, so I think distribute the refactor work to either would make sense. We can talk about how to get start splitting the work.

tomvothecoder commented 1 year ago

Hey @chengzhuzhang and @forsyth2, this PR is ready for code review. The PR description includes all of the information needed to help make the review process smooth since it's a large PR.

I have a few items left including validating the metrics for closeness compared to main, fixing the unit tests, and making sure integration tests pass. After everything checks out, I will merge this PR into a cdat-refactor-fy23 development branch, which we will merge into main periodically until all sets are refactored.

chengzhuzhang commented 1 year ago

Hey @chengzhuzhang and @forsyth2, this PR is ready for code review. The PR description includes all of the information needed to help make the review process smooth since it's a large PR.

I have a few items left including validating the metrics for closeness compared to main, fixing the unit tests, and making sure integration tests pass. After everything checks out, I will merge this PR into a cdat-refactor-fy23 development branch, which we will merge into main periodically until all sets are refactored.

Super! Thank you for the notes. I will start review soon!