E3SM-Project / zppy

E3SM post-processing toolchain
BSD 3-Clause "New" or "Revised" License

Global time series plots look different after CDAT replacement #600

Closed forsyth2 closed 4 months ago

forsyth2 commented 5 months ago

Issue description

It appears the merging of #519 altered the appearance of the global time series plots. I ran the complete-run test on main and it showed differences for global time series.

Before merging (these plots also match the expected plots in the complete-run test): https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_test_complete_run_www/test-2.5.0/v2.LR.historical_0201/global_time_series/global_time_series_1850-1860_results/

After merging: https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_test_complete_run_www/test-main-20240611/v2.LR.historical_0201/global_time_series/global_time_series_1850-1860_results/

Original plots: the plots seem similar but definitely not identical. The large jump at the end of the post-519 graphs makes me think #519 broke something rather than fixed something.

Land plots: There must have been some mistake in updating the expected land plots prior to releasing 2.5.0 -- the expected plots show nothing plotted. The post-519 plots actually have results. That said, those results are clearly incorrect, considering they're for the most part just horizontal lines.

forsyth2 commented 5 months ago

Looking at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_test_complete_run_www/zppy_pre_v2.5.0rc2/v2.LR.historical_0201/global_time_series/global_time_series_1850-1860_results/ (the results from before releasing v2.5.0, i.e., shortly after merging the land plot inclusion, #584), it appears the land plots were always blank. That wasn't caught during the testing phase.

forsyth2 commented 5 months ago

Interestingly, it appears exp["annual"]["year"] includes 1860 in the post-519 code, but not in the pre-519 code.

Pre-519:

                time = v.getTime()
                exp["annual"]["year"] = [x.year for x in time.asComponentTime()]

Post-519:

                years: np.ndarray[cftime.DatetimeNoLeap] = v.coords["time"].values
                exp["annual"]["year"] = [x.year for x in years]

Even after applying `[:-1]` to `year` and `var` for plotting (thus removing the 1860 jumps), the plots don't match the original ones.

forsyth2 commented 5 months ago

@tomvothecoder Is there anything about my code changes in #519 that would cause these plot differences? In particular, does xCDAT/xarray handle end-year boundaries (e.g., the last year to be included) differently than cdscan?

chengzhuzhang commented 5 months ago

@forsyth2 I think there are two things we can check at this point.

Also, I'm not sure why the land plots were not showing up pre-#519.

forsyth2 commented 5 months ago

Comparison of print-statement outputs

Notes:

Code block 1

Second, check whether the time averaging algorithm behaves consistently between `v = cdutil.YEAR(v)` and `temporal.group_average(var, "year")`.

The answer is no, it does not.

Notes:

In run, we choose variables based on:

    if "net_toa_flux_restom" or "net_atm_energy_imbalance" in plots_original:
        vars_original.append("RESTOM") # Derived from `FSNTOA, FLUT`
    if "net_atm_energy_imbalance" in plots_original:
        vars_original.append("RESSURF") # Derived from `FSNS, FLNS, SHFLX, LHFLX` where `LHFLX` is derived from `QFLX, PRECC, PRECL, PRECSC, PRECSL`
    if "global_surface_air_temperature" in plots_original:
        vars_original.append("TREFHT")
    if "toa_radiation" in plots_original:
        vars_original.append("FSNTOA")
        vars_original.append("FLUT")
    if "net_atm_water_imbalance" in plots_original:
        vars_original.append("PRECC")
        vars_original.append("PRECL")
        vars_original.append("QFLX")
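As an aside, the first condition quoted above contains a Python truthiness pitfall: `"net_toa_flux_restom" or "net_atm_energy_imbalance" in plots_original` parses as `("net_toa_flux_restom") or ("net_atm_energy_imbalance" in plots_original)`, and a non-empty string literal is always truthy, so `RESTOM` is appended unconditionally. A minimal demonstration (hypothetical `plots_original` value, not from a real run):

```python
plots_original = ["toa_radiation"]  # hypothetical config with neither plot requested

# As written, the condition is always True because the bare string is truthy:
buggy = bool("net_toa_flux_restom" or "net_atm_energy_imbalance" in plots_original)

# The presumably intended membership test:
intended = (
    "net_toa_flux_restom" in plots_original
    or "net_atm_energy_imbalance" in plots_original
)

print(buggy, intended)  # True False
```

This doesn't affect the 1860 question, but it does mean `RESTOM` is computed even when neither plot is configured.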

readTS.py globalAnnual

Pre-519:

```python
# Non-derived variables
# Read variable
v = self.f(var)
units = v.units
# Annual average
v = cdutil.YEAR(v)
print(f"v={v}")
```

Output: `v=` a 10x3 array-like object; 20 instances of `v=` in total.

Post-519:

```python
# Non-derived variables
annual_average_dataset_for_var: xarray.core.dataset.Dataset = (
    self.f.temporal.group_average(var, "year")
)
v = annual_average_dataset_for_var.data_vars[var]
print(f"v={v}")
units = v.units
```

Output: 20 instances of `v=`:

```
# Deriving RESTOM
# From `if "net_toa_flux_restom" or "net_atm_energy_imbalance" in plots_original:` in `run`
v=
v=
# Deriving RESSURF
# From `if "net_atm_energy_imbalance" in plots_original:` in `run`
v=
v=
v=
# Deriving LHFLX, so we can derive RESSURF
v=
v=
v=
v=
v=
# From `if "global_surface_air_temperature" in plots_original:` in `run`
v=
# And for plot 3 (toa_radiation)
# From `if "toa_radiation" in plots_original:` in `run`
v=
v=
# From `if "net_atm_water_imbalance" in plots_original:` in `run`
v=
v=
v=
# `set_var` for land
v=
v=
# Direct calls to `globalAnnual` from `run`
v=
v=
```

So, it looks like, aside from ohc and volume, post-519 is creating 11x3 matrices (11 time points, 3 regions) rather than the pre-519 10x3 matrices (10 time points, 3 regions).
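For reference, a calendar-aligned annual mean of a 120-month, 3-region series yields exactly a 10x3 result. The sketch below uses hypothetical data and an unweighted mean (both `cdutil.YEAR` and `temporal.group_average` additionally weight by days per month), so it only illustrates the expected shape:

```python
import numpy as np

# Hypothetical (time, region) monthly series covering 1850-1859
monthly = np.random.rand(120, 3)

# Reshape to (year, month, region) and average over the month axis
annual = monthly.reshape(10, 12, 3).mean(axis=1)

assert annual.shape == (10, 3)  # the pre-519 shape; post-519 is producing (11, 3)
```

An 11x3 result therefore means the grouping produced an extra annual bin, not that there is more input data.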

Code block 2

First, check whether the time points used are identical pre- and post-#519. For instance, print out `time = v.getTime()` and `v.coords["time"].values` to compare.

The answer is no, they are not. Post-519 adds 1860 (as mentioned in https://github.com/E3SM-Project/zppy/issues/600#issuecomment-2163852212 above too).

coupled_global.py set_var

Pre-519:

```python
if "year" not in exp["annual"]:
    time = v.getTime()
    exp["annual"]["year"] = [x.year for x in time.asComponentTime()]
    print(f"time={time} year={exp['annual']['year']}")
```

Output: for each region (`glb`, `n`, `s`) this code block gets called once -- when we have to establish the years list for the first time.

```
time= id: time
Designated a time axis.
units: days since 1850-01-01 00:00:00
Length: 10
[...]
year=[1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859]
```

Post-519:

```python
if "year" not in exp["annual"]:
    years: np.ndarray[cftime.DatetimeNoLeap] = v.coords["time"].values
    exp["annual"]["year"] = [x.year for x in years]
    print(f"years={years}, year={exp['annual']['year']}")
```

Output: for each region (`glb`, `n`, `s`) this code block gets called once -- when we have to establish the years list for the first time.

```
years=[cftime.DatetimeNoLeap(1850, 1, 1, 0, 0, 0, 0, has_year_zero=True)
 cftime.DatetimeNoLeap(1851, 1, 1, 0, 0, 0, 0, has_year_zero=True)
 cftime.DatetimeNoLeap(1852, 1, 1, 0, 0, 0, 0, has_year_zero=True)
 cftime.DatetimeNoLeap(1853, 1, 1, 0, 0, 0, 0, has_year_zero=True)
 cftime.DatetimeNoLeap(1854, 1, 1, 0, 0, 0, 0, has_year_zero=True)
 cftime.DatetimeNoLeap(1855, 1, 1, 0, 0, 0, 0, has_year_zero=True)
 cftime.DatetimeNoLeap(1856, 1, 1, 0, 0, 0, 0, has_year_zero=True)
 cftime.DatetimeNoLeap(1857, 1, 1, 0, 0, 0, 0, has_year_zero=True)
 cftime.DatetimeNoLeap(1858, 1, 1, 0, 0, 0, 0, has_year_zero=True)
 cftime.DatetimeNoLeap(1859, 1, 1, 0, 0, 0, 0, has_year_zero=True)
 cftime.DatetimeNoLeap(1860, 1, 1, 0, 0, 0, 0, has_year_zero=True)],
year=[1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860]
```

Here, we see that 1860 has been added to the time list.
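One hypothesis for where the extra 1860 label could come from (not confirmed against the actual xCDAT behavior): if the annual groups are keyed off time coordinates that sit at the *end* of each monthly interval rather than its midpoint, December 1859 carries a 1860-01-01 coordinate and lands alone in a spurious "1860" group, which would also explain the anomalous final values. A pure-Python sketch of that labeling effect, using a hypothetical `month_end_coord` helper:

```python
from datetime import date

def month_end_coord(year: int, month: int) -> date:
    # Hypothetical helper: coordinate at the *end* of a monthly interval,
    # i.e., the first day of the following month.
    return date(year + month // 12, month % 12 + 1, 1)

# 120 months of data, 1850-01 through 1859-12, labeled by interval end:
coords = [month_end_coord(1850, m) for m in range(1, 121)]
group_years = sorted({c.year for c in coords})

assert group_years[0] == 1850
assert group_years[-1] == 1860   # December 1859 lands alone in "1860"
assert len(group_years) == 11    # 11 annual groups for 10 years of data
```

Whether this particular mechanism applies here would need to be checked against the time bounds in the actual dataset.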

Code block 3

coupled_global.py plot

Pre-519:

```python
# Pre-519
year = np.array(exp["annual"]["year"]) + exp["yoffset"]
var = param_dict["var"](exp)
print(f"year={year}, var={var}")
```

Output:

```
Plot 1: plot_net_toa_flux_restom
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859.], var=[-0.15330556 0.82864915 0.19361957 0.67896862 -0.49533645 0.02073803 0.19587277 -0.27610729 0.15671178 0.44575011]
Plot 2: plot_global_surface_air_temperature
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859.], var=[13.5650669 13.57953291 13.66500587 13.62622948 13.67629328 13.6280796 13.59520565 13.68947169 13.65041052 13.60248472]
Plot 3: plot_toa_radiation
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859.], var=[241.08985228 241.59593159 241.2935824 241.82194034 240.96828914 241.17779574 241.31223379 241.21881379 241.31487439 241.58544433]
Plot 4: plot_net_atm_energy_imbalance
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859.], var=[-0.09885803 0.15573906 -0.00288105 -0.06981712 -0.05468802 0.0194263 0.15833427 -0.1465856 -0.03884378 0.1433988]
Plot 5: plot_change_ohc
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859.], var=[0.00000000e+00 9.55389726e+21 1.76337733e+22 2.39908498e+22 2.47351191e+22 2.08621620e+22 2.54745872e+22 2.40151591e+22 2.48017864e+22 2.83005968e+22]
Plot 7: plot_change_sea_level
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859.], var=[0. 4.79920693 3.70354092 4.56404474 4.69164396 5.28640395 0.25941659 0.27101652 0.95330325 1.00708473] [-2.86904907e-01 5.34618717e+02]
Plot 8: plot_net_atm_water_imbalance
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859.], var=[-0.20374769 -0.00173622 0.14062127 0.12358697 -0.75707846 0.54629009 0.19301396 -0.77012511 0.19086513 0.08845465]
plot_generic for LAISHA
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859.], var=[0.86331606 1.01897241 1.01812857 1.01875269 1.01816816 1.01815451 1.01734648 1.01745611 1.01776301 1.01787529 1.01692394]
plot_generic failed. Invalid plot=LAISHA
plot_generic for LAISUN
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859.], var=[0.12966946 0.18009152 0.17987726 0.18013486 0.17994841 0.1799994 0.17990237 0.17985576 0.18004798 0.18004502 0.17992343]
plot_generic failed. Invalid plot=LAISUN
```

Post-519:

```python
# Post-519
year = np.array(exp["annual"]["year"]) + exp["yoffset"]
var = param_dict["var"](exp)
print(f"year={year}, var={var}")
```

Output:

```
Plot 1: plot_net_toa_flux_restom
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859. 1860.], var=[-0.8995651 0.93641144 0.26386664 0.56978856 -0.50036993 -0.03545963 0.31290889 -0.31860673 0.17798742 0.32307012 8.11392212]
Plot 2: plot_global_surface_air_temperature
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859. 1860.], var=[13.7323308 13.55100382 13.66933694 13.62639018 13.68959493 13.62501079 13.58026332 13.70493758 13.6338664 13.61124276 12.01098022]
Plot 3: plot_toa_radiation
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859. 1860.], var=[240.55406024 241.65763057 241.29078315 241.73472407 241.02632576 241.10297914 241.43819254 241.15032792 241.38243626 241.46167314 247.37867737]
Plot 4: plot_net_atm_energy_imbalance
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859. 1860.], var=[-0.14161452 0.11117189 -0.0310739 0.09711264 -0.00431181 -0.19357988 0.13905072 -0.03361273 -0.02872625 0.03339303 1.24085993]
Plot 5: plot_change_ohc
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859. 1860.], var=[0.00000000e+00 9.55389726e+21 1.76337733e+22 2.39908498e+22 2.47351191e+22 2.08621620e+22 2.54745872e+22 2.40151591e+22 2.48017864e+22 2.83005968e+22]
Plot 7: plot_change_sea_level
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859. 1860.], var=[0. 4.79920693 3.70354092 4.56404474 4.69164396 5.28640395 0.25941659 0.27101652 0.95330325 1.00708473] [-2.86904907e-01 5.34618717e+02]
Plot 8: plot_net_atm_water_imbalance
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859. 1860.], var=[-0.32339367 0.05627497 -0.27327 0.14541643 0.39032949 -0.7924559 0.84614093 -0.55852674 -0.01044494 0.22275085 -2.12104022]
plot_generic for LAISHA
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859. 1860.], var=[1.03231886 1.01795889 1.01880821 1.01825611 1.0180922 1.01731591 1.01738253 1.01799429 1.01779902 1.01698209 0.87482047]
plot_generic for LAISUN
year=[1850. 1851. 1852. 1853. 1854. 1855. 1856. 1857. 1858. 1859. 1860.], var=[0.1851019 0.17984825 0.18013722 0.17997803 0.17998169 0.17988021 0.17984899 0.18007913 0.18001958 0.17994643 0.12624773]
```

Comparison of variables

The diff of var for each plot is listed below:

years
Lengths differ
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Plot 1: plot_net_toa_flux_restom
Lengths differ
['-0.746', '0.108', '0.0702', '-0.109', '-0.00503', '-0.0562', '0.117', '-0.0425', '0.0213', '-0.123']

Plot 2: plot_global_surface_air_temperature
Lengths differ
['0.167', '-0.0285', '0.00433', '0.000161', '0.0133', '-0.00307', '-0.0149', '0.0155', '-0.0165', '0.00876']

Plot 3: plot_toa_radiation
Lengths differ
['-0.536', '0.0617', '-0.0028', '-0.0872', '0.058', '-0.0748', '0.126', '-0.0685', '0.0676', '-0.124']

Plot 4: plot_net_atm_energy_imbalance
Lengths differ
['-0.0428', '-0.0446', '-0.0282', '0.167', '0.0504', '-0.213', '-0.0193', '0.113', '0.0101', '-0.11']

Plot 5: plot_change_ohc
['0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0']

Plot 7: plot_change_sea_level
[0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

Plot 8: plot_net_atm_water_imbalance
Lengths differ
['-0.12', '0.058', '-0.414', '0.0218', '1.15', '-1.34', '0.653', '0.212', '-0.201', '0.134']

plot_generic for LAISHA
['0.169', '-0.00101', '0.00068', '-0.000497', '-7.6e-05', '-0.000839', '3.6e-05', '0.000538', '3.6e-05', '-0.000893', '-0.142']

plot_generic for LAISUN
['0.0554', '-0.000243', '0.00026', '-0.000157', '3.33e-05', '-0.000119', '-5.34e-05', '0.000223', '-2.84e-05', '-9.86e-05', '-0.0537']

We can see that the variables are not computed to the same values for the non-ocean plots.

These were computed using this script:

Diff script:

```python
def diff_arrays(name, xs, ys):
    print()
    print(name)
    if len(xs) != len(ys):
        print("Lengths differ")
    # map stops at the shorter list, so only the first len(xs) entries are compared
    diffs = list(map(lambda x, y: y - x, xs, ys))
    try:
        print(list(map(lambda d: f"{d:.3}", diffs)))
    except ValueError:
        # integer diffs reject the ".3" precision format; print them raw
        print(diffs)


def main():
    # years
    diff_arrays(
        "years",
        [1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859],
        [1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860],
    )
    diff_arrays(
        "Plot 1: plot_net_toa_flux_restom",
        [-0.15330556, 0.82864915, 0.19361957, 0.67896862, -0.49533645, 0.02073803, 0.19587277, -0.27610729, 0.15671178, 0.44575011],
        [-0.8995651, 0.93641144, 0.26386664, 0.56978856, -0.50036993, -0.03545963, 0.31290889, -0.31860673, 0.17798742, 0.32307012, 8.11392212],
    )
    diff_arrays(
        "Plot 2: plot_global_surface_air_temperature",
        [13.5650669, 13.57953291, 13.66500587, 13.62622948, 13.67629328, 13.6280796, 13.59520565, 13.68947169, 13.65041052, 13.60248472],
        [13.7323308, 13.55100382, 13.66933694, 13.62639018, 13.68959493, 13.62501079, 13.58026332, 13.70493758, 13.6338664, 13.61124276, 12.01098022],
    )
    diff_arrays(
        "Plot 3: plot_toa_radiation",
        [241.08985228, 241.59593159, 241.2935824, 241.82194034, 240.96828914, 241.17779574, 241.31223379, 241.21881379, 241.31487439, 241.58544433],
        [240.55406024, 241.65763057, 241.29078315, 241.73472407, 241.02632576, 241.10297914, 241.43819254, 241.15032792, 241.38243626, 241.46167314, 247.37867737],
    )
    diff_arrays(
        "Plot 4: plot_net_atm_energy_imbalance",
        [-0.09885803, 0.15573906, -0.00288105, -0.06981712, -0.05468802, 0.0194263, 0.15833427, -0.1465856, -0.03884378, 0.1433988],
        [-0.14161452, 0.11117189, -0.0310739, 0.09711264, -0.00431181, -0.19357988, 0.13905072, -0.03361273, -0.02872625, 0.03339303, 1.24085993],
    )
    diff_arrays(
        "Plot 5: plot_change_ohc",
        [0.00000000e+00, 9.55389726e+21, 1.76337733e+22, 2.39908498e+22, 2.47351191e+22, 2.08621620e+22, 2.54745872e+22, 2.40151591e+22, 2.48017864e+22, 2.83005968e+22],
        [0.00000000e+00, 9.55389726e+21, 1.76337733e+22, 2.39908498e+22, 2.47351191e+22, 2.08621620e+22, 2.54745872e+22, 2.40151591e+22, 2.48017864e+22, 2.83005968e+22],
    )
    diff_arrays(
        "Plot 7: plot_change_sea_level",
        [0, 4.79920693, 3.70354092, 4.56404474, 4.69164396, 5.28640395, 0.25941659, 0.27101652, 0.95330325, 1.00708473],
        [0, 4.79920693, 3.70354092, 4.56404474, 4.69164396, 5.28640395, 0.25941659, 0.27101652, 0.95330325, 1.00708473],
    )
    diff_arrays(
        "Plot 8: plot_net_atm_water_imbalance",
        [-0.20374769, -0.00173622, 0.14062127, 0.12358697, -0.75707846, 0.54629009, 0.19301396, -0.77012511, 0.19086513, 0.08845465],
        [-0.32339367, 0.05627497, -0.27327, 0.14541643, 0.39032949, -0.7924559, 0.84614093, -0.55852674, -0.01044494, 0.22275085, -2.12104022],
    )
    diff_arrays(
        "plot_generic for LAISHA",
        [0.86331606, 1.01897241, 1.01812857, 1.01875269, 1.01816816, 1.01815451, 1.01734648, 1.01745611, 1.01776301, 1.01787529, 1.01692394],
        [1.03231886, 1.01795889, 1.01880821, 1.01825611, 1.0180922, 1.01731591, 1.01738253, 1.01799429, 1.01779902, 1.01698209, 0.87482047],
    )
    diff_arrays(
        "plot_generic for LAISUN",
        [0.12966946, 0.18009152, 0.17987726, 0.18013486, 0.17994841, 0.1799994, 0.17990237, 0.17985576, 0.18004798, 0.18004502, 0.17992343],
        [0.1851019, 0.17984825, 0.18013722, 0.17997803, 0.17998169, 0.17988021, 0.17984899, 0.18007913, 0.18001958, 0.17994643, 0.12624773],
    )


if __name__ == "__main__":
    main()
```

Why weren't the land plots working pre-519?

> Also, I'm not sure why the land plots were not showing up pre-#519.

It looks like both of the land plots were running into dimensions issues:

x and y must have same first dimension, but have shapes (10,) and (11,)

That implies there were only 10 years but 11 data points. So, when the CDAT change made it 11 years instead of 10 years, there was suddenly a matching number of years to data points. This dimensions issue seems similar, if not identical, to the one mentioned in #571 (which is still open). This exact error is mentioned in https://github.com/E3SM-Project/zppy/pull/400#issuecomment-2048589364, which eventually led to #571 being opened.
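The mismatch can be reproduced in miniature with a guard equivalent to matplotlib's length check (hypothetical `check_plot_lengths` helper and data, not zppy code):

```python
import numpy as np

def check_plot_lengths(x: np.ndarray, y: np.ndarray) -> None:
    # Hypothetical guard mirroring the validation that matplotlib performs
    if x.shape[0] != y.shape[0]:
        raise ValueError(
            f"x and y must have same first dimension, but "
            f"have shapes {x.shape} and {y.shape}"
        )

year = np.arange(1850.0, 1860.0)  # 10 years (pre-519 time axis)
var = np.zeros(11)                # 11 land data points
try:
    check_plot_lengths(year, var)
except ValueError as e:
    print(e)  # x and y must have same first dimension, but have shapes (10,) and (11,)
```

So the pre-519 land data always had one more point than the year axis; the CDAT replacement changed the year axis, not the data length.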

From pre-519:

Traceback (most recent call last):
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 473, in plot
    ax.plot(
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 1721, in plot
    lines = [*self._get_lines(self, *args, data=data, **kwargs)]
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 303, in __call__
    yield from self._plot_args(
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 499, in _plot_args
    raise ValueError(f"x and y must have same first dimension, but "
ValueError: x and y must have same first dimension, but have shapes (10,) and (11,)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 646, in make_plot_pdfs
    plot_generic(ax, xlim, exps, plot_name, rgn)
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 439, in plot_generic
    plot(ax, xlim, exps, param_dict, rgn)
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 482, in plot
    raise RuntimeError(f"{param_dict['title']} could not be plotted.")
RuntimeError: LAISHA could not be plotted.
Traceback (most recent call last):
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 473, in plot
    ax.plot(
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 1721, in plot
    lines = [*self._get_lines(self, *args, data=data, **kwargs)]
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 303, in __call__
    yield from self._plot_args(
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 499, in _plot_args
    raise ValueError(f"x and y must have same first dimension, but "
ValueError: x and y must have same first dimension, but have shapes (10,) and (11,)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 646, in make_plot_pdfs
    plot_generic(ax, xlim, exps, plot_name, rgn)
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 439, in plot_generic
    plot(ax, xlim, exps, param_dict, rgn)
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 482, in plot
    raise RuntimeError(f"{param_dict['title']} could not be plotted.")
RuntimeError: LAISUN could not be plotted.
[... the same LAISHA/LAISUN traceback pair repeats verbatim for the remaining regions ...]
forsyth2 commented 5 months ago

@chengzhuzhang See error descriptions above. How should we proceed with this?

#519 changed the global time series plots so that they no longer match the expected results (at https://web.lcrc.anl.gov/public/e3sm/zppy_test_resources/expected_complete_run/global_time_series/global_time_series_1850-1860_results/). That means I now can't properly test other zppy changes (e.g., #599, #601, #602, #607), because I won't be able to tell whether global time series plot changes are due to #519 or to something in one of those PRs.

Options:

  1. Decide the post-519 code is actually the correct way. (I.e., we want 11 time points for 10 years, we want the values produced by that averaging method). Go ahead and update the expected results for testing.
  2. Decide to go back to the pre-519 code until we can get this fixed (i.e., revert #519). If we do this, then we'll have to figure out how to properly do the CDAT migration before the CDAT migration deadline in the fall.
  3. An in-between solution? Maybe we only use the first 10 years of the post-519 code? (But then that would make the land plots not work again -- I think #571 needs to be addressed first for the land plots to be fully functional, or we test not from the beginning of the simulation here)
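For concreteness, option 3 could be sketched as a trim of the post-519 arrays to the first 10 time points (a hypothetical `trim_to_complete_years` helper, not zppy code; as noted, this would reintroduce the land-plot dimension mismatch):

```python
import numpy as np

def trim_to_complete_years(year: np.ndarray, var: np.ndarray, n_years: int):
    # Hypothetical helper: keep only the first n_years points on both axes
    return year[:n_years], var[:n_years]

year = np.arange(1850.0, 1861.0)  # 11 post-519 labels
var = np.zeros(11)                # hypothetical matching data
year, var = trim_to_complete_years(year, var, 10)

assert year.shape == var.shape == (10,)
assert year[-1] == 1859.0  # the spurious 1860 point is dropped
```

Even with such a trim, the remaining 10 values still differ from pre-519 (per the diffs above), so it would not restore matching plots on its own.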
tomvothecoder commented 5 months ago

> That implies there were only 10 years but 11 data points. So, when the CDAT change made it 11 years instead of 10 years, there was suddenly a matching number of years to data points. This dimensions issue seems similar, if not identical, to the one mentioned in https://github.com/E3SM-Project/zppy/issues/571 (which is still open). This exact error is mentioned in https://github.com/E3SM-Project/zppy/pull/400#issuecomment-2048589364, which eventually led to https://github.com/E3SM-Project/zppy/issues/571 being opened.

Possible CDAT behavior

Just a clue for debugging: you might want to watch out for cdms2 and how it subsets variables on the time dimension during the initial read (line 65 in readTS.py here).

There is a "slice flag" feature that is really confusing, but the default value is "ccn" according to page 9 of this PDF. Here's the documentation on this feature and the related issue in e3sm_diags. This feature might be removing the last time coordinate point.

What to try

You can try running the Python debugger on the pre-519 readTS.py (here) to reach line 65 (`v = self.f(var)`), then inspect variable values around there. Check whether cdms2 is removing the last time coordinate under the default slice flag value of "ccn". If the dataset has 11 years and CDAT is removing the 11th year, you can then decide whether that behavior is desired.

forsyth2 commented 5 months ago

Properly configuring the debugger

For reference, to use the Unified environment in the VS Code debugger:

  1. In the top bar, type/choose `Python: Select Interpreter`.
  2. Choose the environment that matches the output of `which python` on the command line, when run in the Unified environment. E.g., /lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_login/bin/python
  3. Then run the debugger: `Python Debugger: Debug Python file`.

launch.json file configured as:

```
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}
```

The last line is important for debugging into other packages.

Debugging

readTS.py: self.f = cdms2.open(filename)

That leads to:

cdms2 dataset.py: def __init__
=>
for node in list(datasetNode.getIdDict().values())

From https://github.com/CDAT/cdms/blob/3f8c7baa359f428628a666652ecf361764dc7b7a/Lib/cdmsNode.py#L468, we know:

    # Get the ID table
    def getIdDict(self):
        return self.idtable

Here, we have the following in the variable explorer:

datasetNode: <cdms2.cdmsNode.DatasetNode object>
idtable: Dictionary with variables as keys. One such key is "TS"
"TS" maps to <<cdms2.cdmsNode.VariableNode object>
Then:
node: <cdms2.cdmsNode.VariableNode object> with id: "TS"
if node.tag == 'variable': # true
if node.id in coordsaux: # false
obj = DatasetVariable(self, node.id, node)
Then:
obj: <cdms2.variable.DatasetVariable object> with id: "TS"
obj has a dtype field:
dtype: dtype('float32')
dtype has a num field:
num: 11

I'm wondering if that's the 11 values for the time series variable.
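A possible explanation for that num: 11: in NumPy, dtype.num is the internal type-enumeration code rather than a data count, and float32 happens to be type number 11 -- so this field is likely unrelated to the 11 time points:

```python
import numpy as np

# `dtype.num` is NumPy's internal type-enum code (NPY_TYPES), not a count:
# float32 is enum value 11 and float64 is 12.
print(np.dtype("float32").num)  # 11
print(np.dtype("float64").num)  # 12
```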

For self > variables > 'TS' > shape, we have (120, 3), which implies 12 months x 10 years = 120 time points. variables is initialized with grid.initDomain(self.axes, self.variables)

datasetNode > idtable > time shows length: 120 and tag: 'axis'. So, it appears the variables have 11 years/time-points of data (possibly?) and then the initDomain is fitting them to 120 months.

The filename passed to cdms2 is from the exp[exp_key], e.g., /lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519-debug/v2.LR.historical_0201/post/atm/glb/ts/monthly/5yr/glb.xml.

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519-debug/v2.LR.historical_0201/post/atm/glb/ts/monthly/5yr
$ grep 1860 glb.xml
# No results
$ grep 1859 glb.xml 
[2024-6-21 23:15:35] /lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/bin/cdscan -x glb.xml CLDHGH_185001_185412.nc CLDHGH_185501_185912.nc CLDLOW_185001_185412.nc CLDLOW_185501_185912.nc CLDMED_185001_185412.nc CLDMED_185501_185912.nc CLDTOT_185001_1854 ..."
    cdms_filemap    ="[[[CLDHGH,time_bnds],[[0,60,-,-,-,CLDHGH_185001_185412.nc],[60,120,-,-,-,CLDHGH_185501_185912.nc]]],[[CLDLOW],[[0,60,-,-,-,CLDLOW_185001_185412.nc],[60,120,-,-,-,CLDLOW_185501_185912.nc]]],[[CLDMED],[[0,60,-,-,-,CLDMED_185001_185412.nc],[60,120,-,-,-,CLDMED_185501_185912.nc]]],[[CLDTOT],[[0,60,-,-,-,CLDTOT_185001_185412.nc],[60,120,-,-,-,CLDTOT_185501_185912.nc]]],[[FLNS],[[0,60,-,-,-,FLNS_185001_185412.nc],[60,120,-,-,-,FLNS_185501_185912.nc]]],[[FLNT],[[0,60,-,-,-,FLNT_185001_185412.nc],[60,120,-,-,-,FLNT_185501_185912.nc]]],[[FLUT],[[0,60,-,-,-,FLUT_185001_185412.nc],[60,120,-,-,-,FLUT_185501_185912.nc]]],[[FSNS],[[0,60,-,-,-,FSNS_185001_185412.nc],[60,120,-,-,-,FSNS_185501_185912.nc]]],[[FSNT],[[0,60,-,-,-,FSNT_185001_185412.nc],[60,120,-,-,-,FSNT_185501_185912.nc]]],[[FSNTOA],[[0,60,-,-,-,FSNTOA_185001_185412.nc],[60,120,-,-,-,FSNTOA_185501_185912.nc]]],[[PRECC],[[0,60,-,-,-,PRECC_185001_185412.nc],[60,120,-,-,-,PRECC_185501_185912.nc]]],[[PRECL],[[0,60,-,-,-,PRECL_185001_185412.nc],[60,120,-,-,-,PRECL_185501_185912.nc]]],[[PRECSC],[[0,60,-,-,-,PRECSC_185001_185412.nc],[60,120,-,-,-,PRECSC_185501_185912.nc]]],[[PRECSL],[[0,60,-,-,-,PRECSL_185001_185412.nc],[60,120,-,-,-,PRECSL_185501_185912.nc]]],[[QFLX],[[0,60,-,-,-,QFLX_185001_185412.nc],[60,120,-,-,-,QFLX_185501_185912.nc]]],[[SHFLX],[[0,60,-,-,-,SHFLX_185001_185412.nc],[60,120,-,-,-,SHFLX_185501_185912.nc]]],[[TAUX],[[0,60,-,-,-,TAUX_185001_185412.nc],[60,120,-,-,-,TAUX_185501_185912.nc]]],[[TAUY],[[0,60,-,-,-,TAUY_185001_185412.nc],[60,120,-,-,-,TAUY_185501_18512.nc]]],[[TREFHT],[[0,60,-,-,-,TREFHT_185001_185412.nc],[60,120,-,-,-,TREFHT_185501_185912.nc]]],[[TS],[[0,60,-,-,-,TS_185001_185412.nc],[60,120,-,-,-,TS_185501_185912.nc]]],[[allaxesdummy,region_name],[[-,-,-,-,-,CLDHGH_185001_185412.nc]]]]"
$ grep length glb.xml 
        length  ="2"
        length  ="3"
        length  ="19"
        length  ="120"
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="rgn" start="0" length="3"/>
            <domElem name="rgn_len" start="0" length="19"/>
            <domElem name="time" start="0" length="120"/>
            <domElem name="nbnd" start="0" length="2"/>

I don't see anything here that would suggest 11 years of data (e.g., 121 for one extra month on the end, or 132 for a full extra 12 months).

So, my conclusion is that the pre-519 code (with CDAT, using cdscan) was not secretly removing an 11th year; rather, the post-519 code (without CDAT) is adding an 11th year. (The only caveat is that I'm not sure where the num: 11 mentioned above is coming from.)

forsyth2 commented 4 months ago

In an effort to further isolate the buggy code I wrote the following script:

Pre/Post-519 comparison script:

```python
import cdms2
import cdutil
import xarray
import xcdat  # Required for `temporal`
from typing import List, Dict
import cftime
import numpy as np

# NOTE: Must have run the following before running this script:
"""
cd /lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519-debug/v2.LR.historical_0201/post/atm/glb/ts/monthly/5yr
cdscan -x glb.xml *.nc
"""

# https://docs.python.org/3/library/warnings.html
import sys

if not sys.warnoptions:
    import warnings

    warnings.simplefilter("ignore")


def diff_arrays(name, xs, ys):
    print()
    print(name)
    if len(xs) != len(ys):
        print(f"Lengths differ. {len(xs)} != {len(ys)}")
    try:
        diffs = list(map(lambda x, y: y - x, xs, ys))
        try:
            print(list(map(lambda d: f"{d:.3}", diffs)))
        except ValueError:
            print(diffs)
    except Exception:
        print(xs)
        print(ys)


# We're only going to look at the `net_atm_water_imbalance` plot here.
variables: List[str] = ["PRECC", "PRECL", "QFLX"]
# Use the same input directory for pre- and post-
input_post_519: str = "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519-debug/v2.LR.historical_0201/post/atm/glb/ts/monthly/5yr"
input_pre_519: str = input_post_519 + "/glb.xml"
f_pre: cdms2.dataset.Dataset = cdms2.open(input_pre_519)
# Lots of `FutureWarning`
f_post: xarray.core.dataset.Dataset = xarray.open_mfdataset(f"{input_post_519}/*.nc")
exp_pre: Dict = {"annual": {}}
exp_post: Dict = {"annual": {}}
for var in variables:
    print(f"Processing var={var}")
    v_pre: cdms2.tvariable.TransientVariable = f_pre(var)
    # `xcdat` required for `temporal`
    annual_average_dataset_for_var_post: xarray.core.dataset.Dataset = f_post.temporal.group_average(var, "year")
    # diff_arrays("Before annual averaging", v_pre, annual_average_dataset_for_var_post)
    # Pre -> 120x3 matrix (10 years of 12 months x glb/n/s)
    # Post -> rgn-3 x time-11 dataset
    # Pre-519 units come before averaging
    units_pre: str = v_pre.units
    v_annual_averages_pre: cdms2.tvariable.TransientVariable = cdutil.YEAR(v_pre)
    # Notice type change Dataset -> DataArray
    v_annual_averages_post: xarray.core.dataarray.DataArray = annual_average_dataset_for_var_post.data_vars[var]
    # Post-519 units come after averaging
    units_post: str = v_annual_averages_post.units
    # diff_arrays("After annual averaging", v_annual_averages_pre, v_annual_averages_post)
    # Pre -> 10x3 matrix
    # Post -> time-11 x rgn-3 dataset
    v_annual_averages_glb_pre: cdms2.tvariable.TransientVariable = v_annual_averages_pre[:, 0]
    v_annual_averages_glb_post: xarray.core.dataarray.DataArray = v_annual_averages_post.isel(rgn=0)
    exp_pre["annual"][var] = (v_annual_averages_glb_pre, units_pre)
    exp_post["annual"][var] = (v_annual_averages_glb_post, units_post)
    if "year" not in exp_pre["annual"]:
        time: cdms2.axis.TransientAxis = v_annual_averages_glb_pre.getTime()
        exp_pre["annual"]["year"]: List[int] = [x.year for x in time.asComponentTime()]
    if "year" not in exp_post["annual"]:
        years_post: np.ndarray[cftime.DatetimeNoLeap] = v_annual_averages_glb_post.coords["time"].values
        exp_post["annual"]["year"]: List[int] = [x.year for x in years_post]

# diff_arrays("Years", exp_pre["annual"]["year"], exp_post["annual"]["year"])
# Lengths differ. 10 != 11
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def find_y_values(exp: Dict) -> np.ndarray[np.float64]:
    return 365 * 86400 * (
        np.array(exp["annual"]["QFLX"][0])
        - 1e3 * (np.array(exp["annual"]["PRECC"][0]) + np.array(exp["annual"]["PRECL"][0]))
    )


plotted_y_values_pre: np.ndarray[np.float64] = find_y_values(exp_pre)
plotted_y_values_post: np.ndarray[np.float64] = find_y_values(exp_post)
diff_arrays("Plotted y values", plotted_y_values_pre, plotted_y_values_post)
# Lengths differ. 10 != 11
# ['-0.12', '0.058', '-0.414', '0.0218', '1.15', '-1.34', '0.653', '0.212', '-0.201', '0.134']
print("These are diffs from running all of coupled_global.py:")
print(['-0.12', '0.058', '-0.414', '0.0218', '1.15', '-1.34', '0.653', '0.212', '-0.201', '0.134'])
```

We can see that the diff of the y values in the plots matches what I listed above in https://github.com/E3SM-Project/zppy/issues/600#issuecomment-2187577055. This suggests this stand-alone script is written correctly.

Aside from being stand-alone, this script also uses the same input directory for both pre- and post-519 code.

We can see that from the very beginning, the pre-519 code is working with 120 months (10 years) while the post-519 code is working with 11 years.

@tomvothecoder @chengzhuzhang This confirms for sure that cdms2.open(<single xml>) and xarray.open_mfdataset(<directory of nc files>).temporal.group_average are handling the data differently. I will try to use the debugger to go into each of these packages.

forsyth2 commented 4 months ago

Results with debugger:

pre-519

f_pre: cdms2.dataset.Dataset in the comparison script in the above comment. f_pre("PRECC") brings us to cdms2 > cudsinterface.py > cuDataset > __call__:

v = self.variables.get(id)

v is a <cdms2.variable.DatasetVariable object> with shape (120, 3). So that confirms the pre-519 code starts off with 120 months and never changes them -- i.e., the data stays at 120 months (10 years / 10 annual time points).

post-519

f_post: xarray.core.dataset.Dataset in the comparison script in the above comment. f_post.temporal.group_average("PRECC", "year") brings us to xcdat > temporal.py > TemporalAccessor > _averager:

dv_avg = self._group_average(ds, data_var)

That brings us to xcdat > temporal.py > TemporalAccessor > _group_average. The line below is where dimensions change for dv (L1218):

                dv = self._group_data(dv).sum() / self._group_data(weights).sum()

Previously, dv was 120x3, but both the numerator and the denominator above are 11x3. Afterward, dv is an xarray.DataArray (year: 11, rgn: 3)

Before that line, we call the _get_weights function, which has this line:

weights: xr.DataArray = grouped_time_lengths / grouped_time_lengths.sum()

When we get to this function, grouped_time_lengths has sizes: Frozen({'time': 11}). Interestingly, the weights returned by _get_weights have shape (120,).
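That shape behavior can be mimicked with a small stdlib sketch (a hypothetical stand-in for the xcdat weighting, assuming a noleap calendar and end-of-interval monthly stamps): the grouped object has 11 year-groups, but dividing each month's length by its group total returns one weight per month, i.e., shape (120,):

```python
# Noleap-calendar month lengths, stamped at the *end* of each interval, so
# Jan 1850 lands in year 1850 ... Dec 1859 lands in year 1860.
days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
stamp_years = [1850 + (m + 1) // 12 for m in range(120)]
lengths = [days[m % 12] for m in range(120)]

# Group month lengths by the year of their time stamp (11 groups).
groups = {}
for year, length in zip(stamp_years, lengths):
    groups.setdefault(year, []).append(length)

# weight = month length / total length of its group: one weight per month,
# so the result keeps the input shape (120) despite the 11 groups.
weights = [length / sum(groups[year]) for year, length in zip(stamp_years, lengths)]
print(len(groups), len(weights))  # 11 120
```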

Following the grouped_time_lengths = self._group_data(time_lengths) call right before that line: in _group_data (L1325-1327):

            dv_gb = dv.groupby(self._labeled_time.name)

        return dv_gb

dv begins with size: 120, but dv_gb then has sizes: Frozen({'time': 11}). It's unclear where that comes from in groupby. I will look into this further.

forsyth2 commented 4 months ago

In GroupBy:

groupers[0].size: 11
groupers[0].unique_coord: <xarray.IndexVariable 'year' (year: 11)>

It appears that in xarray > core > groupby.py > ResolvedUniqueGrouper > _factorize_unique, the line:

        unique_values, group_indices, codes_ = unique_value_groups(
            self.group_as_index, sort=sort
        )

sets group_indices such that it divides the 120 months (0-119) into 11 slices -- one slice of 11 (0-10), nine slices of 12 (11-118), and one slice of 1 (119). This seems clearly wrong -- we want 10 slices of 12!
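The bad slicing is easy to reproduce with stdlib Python alone (a hedged sketch, assuming monthly stamps recorded at the end of each interval, as in the EAM output):

```python
from itertools import groupby

# 120 monthly points stamped at the *end* of each month-long interval:
# Jan 1850 -> 1850-02, ..., Dec 1859 -> 1860-01, as (year, month) tuples.
end_stamps = [(1850 + (m + 1) // 12, (m + 1) % 12 + 1) for m in range(120)]

# Grouping consecutive points by year reproduces the observed slicing.
slice_sizes = [len(list(g)) for _, g in groupby(end_stamps, key=lambda ym: ym[0])]
print(slice_sizes)  # [11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1]
```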

So, something wrong is happening in unique_value_groups.

It looks like this block in pandas > core > algorithms.py > factorize_array:

    uniques, codes = table.factorize(
        values,
        na_sentinel=-1,
        na_value=na_value,
        mask=mask,
        ignore_na=use_na_sentinel,
    )

is setting uniques up to have size: 11 -- values has size: 120 when that function is called, but uniques is returned with size: 11.
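Factorizing such end-stamped years can be emulated in plain Python (a sketch of the observed behavior, not pandas internals): the 120 input values really do contain 11 unique years, so factorize is doing its job on bad input:

```python
# Emulate factorize: map each value to a code, collecting uniques in order.
years = [1850 + (m + 1) // 12 for m in range(120)]  # end-of-interval stamps

uniques, codes, seen = [], [], {}
for y in years:
    if y not in seen:
        seen[y] = len(uniques)
        uniques.append(y)
    codes.append(seen[y])
print(len(codes), len(uniques))  # 120 11
```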

The debugger won't let me go further into table.factorize.

However, looking deeper into values, we can see that it is already organized as an array of cftime.DatetimeNoLeap objects -- 11 items for 1850, 12 items each for 1851-1859, and 1 item for 1860 -- before table.factorize is called.

So the issue lies much earlier than this -- when the values are constructed as cftime.DatetimeNoLeap objects. I will continue debugging.

forsyth2 commented 4 months ago

In xarray > core > groupby.py > ResolvedGrouper > __post_init__:

        self.group: T_Group = _resolve_group(self.obj, self.group)

Here, self.obj.year.values is the list of 120 cftime.DatetimeNoLeap objects, where self is a <xarray.core.groupby.ResolvedUniqueGrouper object>

Going further up: In xarray > core > dataarray.py > DataArray > groupby:

        rgrouper = ResolvedUniqueGrouper(UniqueGrouper(), group, self)

Here, self: <xarray.DataArray 'time_bnds' (time: 120)> and self.year.data is the list of 120 cftime.DatetimeNoLeap objects

Going further up: In xcdat > temporal.py > TemporalAccessor > _group_data, the data_var parameter gives: data_var.time.T.T.data as a list of the 120 cftime.DatetimeNoLeap objects

Going further up: In xcdat > temporal.py > TemporalAccessor > _get_weights:

        time_lengths = time_lengths.astype(np.float64)

The time_lengths variable is computed such that time_lengths.time.data is a list of the 120 cftime.DatetimeNoLeap objects

Going further up: In xcdat > temporal.py > TemporalAccessor > _group_average:

            time_bounds = ds.bounds.get_bounds("T", var_key=data_var)

time_bounds is apparently too large for the debugger's variable explorer to load (it hangs).

However, I'm now able to pull more and more code out of the xcdat/xarray packages, for easier debugging:

    bounds: xcdat.bounds.BoundsAccessor = f_post.bounds
    obj: xarray.DataArray = f_post.get("PRECC")
    # `vals` is just a list of 120 3-element arrays (glb, n, s). 
    # I.e., it has NOT been sliced up yet.
    vals = obj.values
    time_bounds: xarray.DataArray = bounds.get_bounds("T", var_key="PRECC")
    time_bounds_data = time_bounds.time.data
    time_lengths: xarray.DataArray = time_bounds[:, 1] - time_bounds[:, 0]
    # We know `time_lengths_data` is a list of the 120 `cftime.DatetimeNoLeap` objects 
    # I.e., it HAS been sliced up.
    time_lengths_data = time_lengths.time.data

I now see that the slicing happens in xcdat > bounds.py > BoundsAccessor > get_bounds:

                coord = get_dim_coords(obj, axis)

I now have:

    bounds: xcdat.bounds.BoundsAccessor = f_post.bounds
    obj: xarray.DataArray = f_post.get("PRECC")
    # `unsliced_values` is just a list of 120 3-element arrays (glb, n, s). 
    unsliced_values = obj.values
    dim_coords: xarray.DataArray = obj["time"]
    # `sliced_values` is a list of the 120 `cftime.DatetimeNoLeap` objects, 
    # with 11 values for 1850 and 1 value for 1860
    sliced_values = dim_coords.values
    t: xarray.DataArray = obj.time
    sliced_values_2 = t.data

So, getting the "time" attribute of "PRECC" seems to be slicing "PRECC" incorrectly...

I'm trying to debug why that would happen. My best guess so far is something about the seasons handling winter as DJF.

chengzhuzhang commented 4 months ago

@forsyth2 thanks for creating the standalone code for debugging, I took it and further simplified. Please see below:

import cdms2
import cdutil
import xarray
import xcdat # Required for `temporal`

filepath = '/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519-debug/v2.LR.historical_0201/post/atm/glb/ts/monthly/5yr'

#cdms2 code:
var1 = cdms2.open(filepath+'/glb.xml')['TREFHT'][:,0]
#print(var1.getTime().asComponentTime())
print(cdutil.YEAR(var1))

#use xarray
var2 = xcdat.open_mfdataset(filepath+'/TREFHT*.nc')['TREFHT']

#use xcdat with center_times on, to have the time value recorded at the center of each time interval
var3 = xcdat.open_mfdataset(filepath+'/TREFHT*.nc',center_times=True).isel(rgn=0)
var3_ann = var3.temporal.group_average("TREFHT", "year")
print(var3_ann['TREFHT'].values)

#Diff
diff = cdutil.YEAR(var1) - var3_ann['TREFHT'].values
print(diff)

You can play with this to see why there is a difference in your implementation. In brief, the EAM output records the time value at the end of each time interval. cdutil's averaging generates weights correctly based on time bounds, while reading data with the xarray/xcdat call (without the center_times flag) and calling group_average operates on the time values themselves. The center_times flag moves the time value to the center of the interval; doing this we get an exact match. I think this problem was brought up in https://github.com/E3SM-Project/zppy/pull/400.
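The effect of centering can be sketched with stdlib datetime alone (using the proleptic Gregorian calendar rather than the model's noleap calendar, which doesn't change the year counts): end-of-interval stamps span 11 calendar years, while the midpoints of the same intervals span exactly 10:

```python
from datetime import datetime

# Monthly intervals for 1850-1859; EAM-style stamps sit at each interval's end.
starts = [datetime(1850 + m // 12, m % 12 + 1, 1) for m in range(120)]
ends = starts[1:] + [datetime(1860, 1, 1)]

years_end = sorted({t.year for t in ends})  # stamps at interval end
years_centered = sorted({(s + (e - s) / 2).year for s, e in zip(starts, ends)})
print(len(years_end), len(years_centered))  # 11 10
```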

forsyth2 commented 4 months ago

In xarray > core > combine.py > _combine_all_along_first_dim (call stack: open_mfdataset > combine_by_coords >_combine_single_variable_hypercube > _combine_nd > _combine_all_along_first_dim):

    combined_ids = dict(sorted(combined_ids.items(), key=_new_tile_id))

Before this line is called, combined_ids > (1,) > time > data > [0:60] shows the single 1860 cftime.DatetimeNoLeap data point. (PRECC has two files of 60 data points each -> 120 total data points.)

So, we can work earlier into the code now:

By the time we get to this line in xarray > core > combine.py > _infer_concat_order_from_coords (call stack: open_mfdataset > combine_by_coords >_combine_single_variable_hypercube > _infer_concat_order_from_coords):

    combined_ids = dict(zip(tile_ids, datasets))

we have indexes > 1 > values > [0:60] showing the single 1860 cftime.DatetimeNoLeap data point

This appears even earlier in the code. By the time we get to the following line in xarray > backends > api.py > open_mfdataset:

            combined = combine_by_coords(

we have dataset > 1 > time > values > [0:60] showing the single 1860 cftime.DatetimeNoLeap data point. This is true as soon as datasets is realized: datasets = [open_(p, **open_kwargs) for p in paths].

forsyth2 commented 4 months ago

@chengzhuzhang Ah thank you, I will look into center_times.

forsyth2 commented 4 months ago

I think this problem was brought up in #400

@chengzhuzhang That issue was carried over to #571. Are you saying that using center_times will resolve #571?

chengzhuzhang commented 4 months ago

I think this problem was brought up in #400

@chengzhuzhang That issue was carried over to #571. Are you saying that using center_times will resolve #571?

I meant that #400 has a similar script for resolving a similar issue. It also explained the rationale behind time averaging.

It is possible that center_times can resolve #400 in the xcdat-based code, though that needs to be tested.

tomvothecoder commented 4 months ago

You can play with this to show why there is difference in your implement. In brief, the EAM output has time value recorded at the end of the time interval. While cdutil.average generates weights correctly based on time bounds. Reading data with the xarray/xcdat call (without center_times flag) , with group_average, operate on time values. The center_times flag brings the time value to the center of the time interval. Doing this we get exact match. I think this problem was brought up in #400.

By default, xCDAT behaves the same way as CDAT by using the existing time bounds to create the temporal weights. xCDAT will infer time bounds from the time values if the user specifies to add missing bounds (e.g., xc.open_dataset("path", add_bounds=["X", "Y", "T"], center_times=True)).

Basically, I think the only difference here is that CDAT centers the time coordinate automatically while xCDAT does not. Centering the time coordinate with xCDAT might resolve the differences, but I'm not exactly sure.

chengzhuzhang commented 4 months ago

@tomvothecoder thank you for chiming in! After some experiment (e.g. saving time weights from xcdat), I think we can confirm that centering time should be the right solution.

Without centering time, the xcdat code creates 11 annual-mean values, with the last time point (1860-01-01) having a time weight of 1.0. Since xCDAT won't center time automatically by default as CDAT does, center_times=True is needed to get the correct months and weights.
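The weight-of-1.0 observation can be illustrated with a stdlib sketch (hypothetical, assuming noleap month lengths and end-of-interval stamps): Dec 1859's value lands on 1860-01-01, so the "1860" group has a single member whose weight is necessarily 1.0:

```python
# Noleap month lengths; end-of-interval stamps push Dec 1859 into "1860".
days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
groups = {}
for m in range(120):
    year = 1850 + (m + 1) // 12  # year of the end-of-interval stamp
    groups.setdefault(year, []).append(days[m % 12])

last_year = max(groups)
weights_last = [d / sum(groups[last_year]) for d in groups[last_year]]
print(last_year, weights_last)  # 1860 [1.0]
```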

forsyth2 commented 4 months ago

I think we can confirm that centering time should be the right solution.

The plots also look correct when I implement center_times in zppy. I'm doing final testing now.