Closed tomvothecoder closed 1 year ago
Moved the section below out of the description to make the description shorter.
I've noticed many Python anti-patterns and poor coding practices in this repository, which have created a lot of technical debt for new developers of this repo.
handle_variables()/handle_simple()/default_handler()
**kwargs to pass around function arguments (example) -- really fragile and hard to debug
Hi @chengzhuzhang, this PR is finally ready for review! I performed thorough regression testing and everything checks out.
The PR description includes the core changes that you might want to focus on. Thanks!
Thanks @chengzhuzhang! I will merge this now.
Overview
This PR was originally supposed to address only #115, but I found the codebase difficult to work with, so I decided to refactor it.
Regression Testing
TLDR: All of the datasets produced by this branch align with the master branch. The max relative differences in 5/109 datasets are insignificant and can be attributed to floating point rounding differences between Xarray and CDAT, which is fine.
Setup:
scripts/branch-regression-testing/115-cdat-refactor-test/115-end-to-end-script.sh to compare against master branch datasets.
scripts/branch-regression/115-comparison-notebook.ipynb
Results:
rtol=1e-7 (cl, clw, cli, pfull)
1/109 is not close (mrso)
mrso is fine (max relative difference of 6.726486e-07)
Not equal to tolerance rtol=1e-07, atol=1e-07
Mismatched elements: 25556 / 777600 (3.29%)
Max absolute difference: 0.00073242
Max relative difference: 6.726486e-07
x: array([[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan],...
y: array([[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan],...
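The comparison reported above can be illustrated with a small standard-library sketch. It assumes the same acceptance rule that numpy's `assert_allclose` documents (`|actual - desired| <= atol + rtol * |desired|`, with NaNs matching only NaNs). The function name `compare_values` and the sample numbers are hypothetical and are not taken from the comparison notebook.

```python
import math
from typing import Sequence, Tuple


def compare_values(
    actual: Sequence[float],
    desired: Sequence[float],
    rtol: float = 1e-7,
    atol: float = 0.0,
) -> Tuple[int, float, float]:
    """Return (mismatched count, max absolute diff, max relative diff)."""
    if len(actual) != len(desired):
        raise ValueError("Inputs must have the same length")

    mismatched = 0
    max_abs = 0.0
    max_rel = 0.0
    for a, d in zip(actual, desired):
        a_nan, d_nan = math.isnan(a), math.isnan(d)
        if a_nan or d_nan:
            # A NaN only matches another NaN at the same position.
            if a_nan != d_nan:
                mismatched += 1
            continue
        diff = abs(a - d)
        max_abs = max(max_abs, diff)
        if d != 0:
            # Relative difference is measured against the reference value.
            max_rel = max(max_rel, diff / abs(d))
        if diff > atol + rtol * abs(d):
            mismatched += 1
    return mismatched, max_abs, max_rel


# Hypothetical values standing in for one variable on each branch.
master = [float("nan"), 1000.0, 2000.0]
branch = [float("nan"), 1000.0005, 2000.0]
mismatched, max_abs, max_rel = compare_values(branch, master)
```

A value can fail a strict `rtol=1e-7` check while its relative difference is still on the order of 1e-7, which is the situation described for `mrso`.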
Summary of Changes
Core Changes
lib.py (now deleted)
run_parallel() and run_serial() with __main__.py -> E3SMtoCMIP._run_parallel() and E3SMtoCMIP._run_serial()
filepaths to vars_to_filepaths for clarity (it's a dict mapping var key to list of string paths)
try and except statement for submitting pool jobs to maintain compatibility with MPAS variable handlers, which use different handler method arguments
handle_variables(), get_dimension_data() and load_axis() with VarHandler.cmorize()
handle_simple() -- will be re-implemented from scratch in #130
handler.py
VarHandler.cmorize()
data dictionary storing xr.DataArrays with ds (xr.Dataset object)
pfull, phalf
VarHandler._cmor_write()
cmor.write() instead of looping over each time value index and CMORizing each slice -- this should improve performance and removes the tqdm progress bar.
handlers.yaml
phalf and pfull entries
phalf.py and pfull.py
clcalipso entry
clcalipso.py
_formulas.py
np.ndarray to xr.DataArray
pfull and phalf
convert_units() function, which handles 1-to-1 unit conversions -- replaces default.default_handlers.write_data()
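As a rough illustration of what a 1-to-1 unit conversion helper can look like, the sketch below maps a conversion key to a scale factor and applies it to every value. The function name `convert_units_sketch`, the conversion keys, and the factors are assumptions made for illustration. This is not the `convert_units()` function from this PR, which works on `xr.DataArray` objects rather than plain lists.

```python
from typing import Dict, List

# Hypothetical conversion table: conversion key -> multiplicative factor.
_UNIT_FACTORS: Dict[str, float] = {
    "g-to-kg": 1e-3,
    "negate": -1.0,
    "fraction-to-percent": 100.0,
}


def convert_units_sketch(values: List[float], conversion: str) -> List[float]:
    """Apply a single scale factor to every value (a 1-to-1 conversion)."""
    if conversion not in _UNIT_FACTORS:
        # Fail loudly instead of silently writing unconverted data.
        raise ValueError(f"Unsupported unit conversion: {conversion!r}")
    factor = _UNIT_FACTORS[conversion]
    return [value * factor for value in values]


percent = convert_units_sketch([0.25, 0.5], "fraction-to-percent")
```

Keeping the table in one place means a new conversion is a one-line addition, and an unknown key raises an error before any output is written.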
default.py (now deleted)
VarHandler.cmorize() and _formulas.py.convert_units()
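To make the `vars_to_filepaths` name from the lib.py changes concrete, here is a minimal sketch of a mapping from variable key to a list of string paths, passed as an explicit, typed argument instead of through `**kwargs`. The function name `count_input_files` and the file paths are hypothetical and do not come from this PR.

```python
from typing import Dict, List

# Hypothetical example data: variable key -> list of input file paths.
vars_to_filepaths: Dict[str, List[str]] = {
    "PS": ["/path/to/PS_185001_185412.nc", "/path/to/PS_185501_185912.nc"],
    "T": ["/path/to/T_185001_185412.nc"],
}


def count_input_files(vars_to_filepaths: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the number of input files found for each variable key."""
    return {var: len(paths) for var, paths in vars_to_filepaths.items()}


counts = count_input_files(vars_to_filepaths)
```

With an explicit parameter, a misspelled argument name fails immediately with a `TypeError` at the call site, whereas a misspelled key inside `**kwargs` can go unnoticed until much later.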
Clean Up Changes
cdms2 and cdutil from dependencies in dev.yml and ci.yml
7clisccp.py
Makefile for easy access to commonly used commands (e.g., building and installing the package)