Looking at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_test_complete_run_www/zppy_pre_v2.5.0rc2/v2.LR.historical_0201/global_time_series/global_time_series_1850-1860_results/ (the results from before releasing v2.5.0, e.g. shortly after merging the land plot inclusion (#584)), it appears the land plots were always blank. That wasn't caught during the testing phase.
Interestingly, it appears `exp["annual"]["year"]` includes 1860 in the post-519 code, but not in the pre-519 code.
Pre-519:
```python
time = v.getTime()
exp["annual"]["year"] = [x.year for x in time.asComponentTime()]
```
Post-519:
```python
years: np.ndarray[cftime.DatetimeNoLeap] = v.coords["time"].values
exp["annual"]["year"] = [x.year for x in years]
```
Even applying `[:-1]` to `year` and `var` for plotting (thus removing the 1860 jumps), the plots don't match up with the original ones.
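Consistent with that, if the grouping shifts which months land in which year (rather than merely appending an extra point at the end), truncation cannot recover the old values. A toy sketch in plain Python (illustrative numbers, not the actual zppy data):

```python
# Toy monthly series: months 0..119 spanning model years 1850-1859.
monthly = list(range(120))

# Correct grouping: year y averages its own 12 months.
correct = [sum(monthly[12 * y:12 * y + 12]) / 12 for y in range(10)]

# End-labeled grouping: the first year only gets 11 months, each later year
# absorbs the previous December, and a final one-point "year" appears.
shifted = [sum(monthly[0:11]) / 11]
shifted += [sum(monthly[12 * y - 1:12 * y + 11]) / 12 for y in range(1, 10)]
shifted += [float(monthly[119])]

# Dropping the extra point with [:-1] still leaves every annual mean off by
# one month's worth of shift.
print(correct[1], shifted[1])  # -> 17.5 16.5
```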
@tomvothecoder Is there anything about my code changes in #519 that would cause these plot differences? In particular, does xCDAT/xarray handle end-year boundaries (e.g., the last year to be included) differently than `cdscan`?
@forsyth2 I think there are two things we can check at this point.
Also, I'm not sure why the land plots were not showing up pre-#519?
Comparison of print-statement outputs
Notes:
- `coupled_global.py` `run` calls `coupled_global.py` `set_var`, which calls `readTS.py` `globalAnnual` (see Code block 1). Code block 2 appears later in `set_var`.
- `coupled_global.py` `run` also calls `readTS.py` `globalAnnual` directly.
- `coupled_global.py` `run` later calls `make_plot_pdfs`, which calls a plotting function, which calls `plot` (see Code block 3).

> Second, check if the time averaging algorithm behaves consistently between `v = cdutil.YEAR(v)` and `temporal.group_average(var, "year")`.
The answer is no, it does not.
Notes:
- `plots_lnd = "LAISHA,LAISUN"` is set in the cfg.
- `plots_original` is as defined in `default.ini` (all 8 default plots).

In `run`, we choose variables based on:
```python
if "net_toa_flux_restom" or "net_atm_energy_imbalance" in plots_original:
    vars_original.append("RESTOM")  # Derived from `FSNTOA, FLUT`
if "net_atm_energy_imbalance" in plots_original:
    vars_original.append("RESSURF")  # Derived from `FSNS, FLNS, SHFLX, LHFLX` where `LHFLX` is derived from `QFLX, PRECC, PRECL, PRECSC, PRECSL`
if "global_surface_air_temperature" in plots_original:
    vars_original.append("TREFHT")
if "toa_radiation" in plots_original:
    vars_original.append("FSNTOA")
    vars_original.append("FLUT")
if "net_atm_water_imbalance" in plots_original:
    vars_original.append("PRECC")
    vars_original.append("PRECL")
    vars_original.append("QFLX")
```
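As an aside (separate from the time-axis question), the first condition quoted above is a common Python pitfall: `"net_toa_flux_restom" or "net_atm_energy_imbalance" in plots_original` parses as `("net_toa_flux_restom") or ("net_atm_energy_imbalance" in plots_original)`, and a non-empty string literal is always truthy, so that branch runs regardless of `plots_original`. A minimal illustration (the empty list here is hypothetical):

```python
# Hypothetical: no plots requested at all.
plots_original = []

# Parses as: "net_toa_flux_restom" or ("net_atm_energy_imbalance" in plots_original).
# A non-empty string is truthy, so the branch is taken even for an empty list.
taken = bool("net_toa_flux_restom" or "net_atm_energy_imbalance" in plots_original)
print(taken)  # -> True

# The presumably intended check:
intended = ("net_toa_flux_restom" in plots_original) or (
    "net_atm_energy_imbalance" in plots_original
)
print(intended)  # -> False
```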
`readTS.py` `globalAnnual`:
So, it looks like aside from `ohc` and `volume`, post-519 is creating 11x3 matrices (11 time points, 3 regions) rather than the 10x3 matrices (10 time points, 3 regions) of pre-519.
> First, check if the time points used are identical pre- and post-#519. For instance, print out `time = v.getTime()` and `v.coords["time"].values` to compare.
The answer is no, they are not. Post-519 adds 1860 (as mentioned in https://github.com/E3SM-Project/zppy/issues/600#issuecomment-2163852212 above).
`coupled_global.py` `set_var`: here, we see that 1860 has been added to the time list.
`coupled_global.py` `plot`: the diff of `var` for each plot is listed below:
years
Lengths differ
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Plot 1: plot_net_toa_flux_restom
Lengths differ
['-0.746', '0.108', '0.0702', '-0.109', '-0.00503', '-0.0562', '0.117', '-0.0425', '0.0213', '-0.123']
Plot 2: plot_global_surface_air_temperature
Lengths differ
['0.167', '-0.0285', '0.00433', '0.000161', '0.0133', '-0.00307', '-0.0149', '0.0155', '-0.0165', '0.00876']
Plot 3: plot_toa_radiation
Lengths differ
['-0.536', '0.0617', '-0.0028', '-0.0872', '0.058', '-0.0748', '0.126', '-0.0685', '0.0676', '-0.124']
Plot 4: plot_net_atm_energy_imbalance
Lengths differ
['-0.0428', '-0.0446', '-0.0282', '0.167', '0.0504', '-0.213', '-0.0193', '0.113', '0.0101', '-0.11']
Plot 5: plot_change_ohc
['0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0']
Plot 7: plot_change_sea_level
[0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Plot 8: plot_net_atm_water_imbalance
Lengths differ
['-0.12', '0.058', '-0.414', '0.0218', '1.15', '-1.34', '0.653', '0.212', '-0.201', '0.134']
plot_generic for LAISHA
['0.169', '-0.00101', '0.00068', '-0.000497', '-7.6e-05', '-0.000839', '3.6e-05', '0.000538', '3.6e-05', '-0.000893', '-0.142']
plot_generic for LAISUN
['0.0554', '-0.000243', '0.00026', '-0.000157', '3.33e-05', '-0.000119', '-5.34e-05', '0.000223', '-2.84e-05', '-9.86e-05', '-0.0537']
We can see that the variables are not being computed to the same values for the non-ocean plots.
These were computed using this script:
> Also, I'm not sure why the land plots were not showing up pre-#519?
It looks like both of the land plots were running into dimensions issues:
x and y must have same first dimension, but have shapes (10,) and (11,)
That implies there were only 10 years but 11 data points. So, when the CDAT change made it 11 years instead of 10 years, there was suddenly a matching number of years to data points. This dimensions issue seems similar, if not identical, to the one mentioned in #571 (which is still open). This exact error is mentioned in https://github.com/E3SM-Project/zppy/pull/400#issuecomment-2048589364, which eventually led to #571 being opened.
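A minimal illustration of the mismatch matplotlib rejects (pure Python, shapes only; the names are illustrative, not the actual zppy arrays):

```python
# 10 x-values (years) against 11 y-values, as with the blank land plots.
years = list(range(1850, 1860))   # 10 year labels
values = [0.0] * 11               # 11 annual data points

# matplotlib's ax.plot(years, values) would raise:
#   ValueError: x and y must have same first dimension, but have shapes (10,) and (11,)
print(len(years), len(values))  # -> 10 11
```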
From pre-519:
```
Traceback (most recent call last):
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 473, in plot
    ax.plot(
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 1721, in plot
    lines = [*self._get_lines(self, *args, data=data, **kwargs)]
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 303, in __call__
    yield from self._plot_args(
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 499, in _plot_args
    raise ValueError(f"x and y must have same first dimension, but "
ValueError: x and y must have same first dimension, but have shapes (10,) and (11,)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 646, in make_plot_pdfs
    plot_generic(ax, xlim, exps, plot_name, rgn)
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 439, in plot_generic
    plot(ax, xlim, exps, param_dict, rgn)
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 482, in plot
    raise RuntimeError(f"{param_dict['title']} could not be plotted.")
RuntimeError: LAISHA could not be plotted.

Traceback (most recent call last):
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 473, in plot
    ax.plot(
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 1721, in plot
    lines = [*self._get_lines(self, *args, data=data, **kwargs)]
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 303, in __call__
    yield from self._plot_args(
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 499, in _plot_args
    raise ValueError(f"x and y must have same first dimension, but "
ValueError: x and y must have same first dimension, but have shapes (10,) and (11,)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 646, in make_plot_pdfs
    plot_generic(ax, xlim, exps, plot_name, rgn)
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 439, in plot_generic
    plot(ax, xlim, exps, param_dict, rgn)
  File "/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519/v2.LR.historical_0201/post/scripts/global_time_series_1850-1860_dir/coupled_global.py", line 482, in plot
    raise RuntimeError(f"{param_dict['title']} could not be plotted.")
RuntimeError: LAISUN could not be plotted.
```
(The same LAISHA/LAISUN traceback pair repeats twice more in the log.)
@chengzhuzhang See the error descriptions above. How should we proceed with this?
This also holds up other zppy changes (e.g., #599, #601, #602, #607), because I won't be able to tell if the global time series plot changes are due to #519 or something in one of those PRs.

Options:
> That implies there were only 10 years but 11 data points. So, when the CDAT change made it 11 years instead of 10 years, there was suddenly a matching number of years to data points. This dimensions issue seems similar, if not identical, to the one mentioned in https://github.com/E3SM-Project/zppy/issues/571 (which is still open). This exact error is mentioned in https://github.com/E3SM-Project/zppy/pull/400#issuecomment-2048589364, which eventually led to https://github.com/E3SM-Project/zppy/issues/571 being opened.
Just a clue for debugging: you might want to watch out for `cdms2` and how it subsets variables on the time dimension during the initial read (line 65 in `readTS.py` here). There is a "slice flag" feature that is really confusing, but the default value is `"ccn"` according to page 9 of this PDF. Here's the documentation on this feature and the related issue in e3sm_diags. This feature might be removing the last time coordinate point.
You can try running the Python debugger on the pre-519 `readTS.py` (here) to reach line 65 (`v = self.f(var)`), then inspect variable values around there. You can check if `cdms2` is removing the last time coordinate using the default slice flag value of `"ccn"`. If the dataset has 11 years and CDAT is removing the 11th year, then you can decide if this is a desired behavior or not.
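For intuition only, here is a toy sketch of what closed vs. open endpoint handling can do to the last coordinate point when subsetting an axis over an interval. This is plain Python, not cdms2's actual selector logic; treating the first two characters as `'c'` = closed and `'o'` = open is an assumption here, and the third character of `"ccn"` is not modeled at all:

```python
# Toy endpoint handling when subsetting a coordinate axis over an interval.
# 'c' treats an endpoint as closed (included), 'o' as open (excluded).
def subset(coords, lo, hi, flags="cc"):
    lo_ok = (lambda c: c >= lo) if flags[0] == "c" else (lambda c: c > lo)
    hi_ok = (lambda c: c <= hi) if flags[1] == "c" else (lambda c: c < hi)
    return [c for c in coords if lo_ok(c) and hi_ok(c)]

years = list(range(1850, 1861))  # 11 coordinate points: 1850..1860
print(len(subset(years, 1850, 1860)))        # closed-closed keeps all 11
print(len(subset(years, 1850, 1860, "co")))  # an open upper end drops 1860
```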
For reference, to use the Unified environment in the VS Code debugger:
- Set `Python: Select Interpreter` to the path given by `which python` on the command line, when run in the unified environment. E.g., `/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_login/bin/python`
- Run `Python Debugger: Debug Python file` with a `launch.json` file configured as:
```json
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}
```
The `"justMyCode": false` line is important for debugging into other packages.
`readTS.py`: `self.f = cdms2.open(filename)`

That leads to `cdms2` `dataset.py`: `def __init__` => `for node in list(datasetNode.getIdDict().values())`
From https://github.com/CDAT/cdms/blob/3f8c7baa359f428628a666652ecf361764dc7b7a/Lib/cdmsNode.py#L468, we know:
```python
# Get the ID table
def getIdDict(self):
    return self.idtable
```
Here, we have the following in the variable explorer:
- `datasetNode`: `<cdms2.cdmsNode.DatasetNode object>`
- `idtable`: a dictionary with variables as keys. One such key is `"TS"`.
- `"TS"` maps to `<cdms2.cdmsNode.VariableNode object>`

Then:
- `node`: `<cdms2.cdmsNode.VariableNode object>` with id `"TS"`
- `if node.tag == 'variable':` # true
- `if node.id in coordsaux:` # false
- `obj = DatasetVariable(self, node.id, node)`
Then:
- `obj`: `<cdms2.variable.DatasetVariable object>` with id `"TS"`
- `obj` has a `dtype` field: `dtype('float32')`
- `dtype` has a `num` field: `num: 11`
I'm wondering if that's the 11 values for the time series variable.
For `self > variables > 'TS'`, `shape` is `(120, 3)`, which implies 12 months x 10 years/time-points = 120. `variables` is initiated with `grid.initDomain(self.axes, self.variables)`. `datasetNode > idtable > time` shows `length: 120` and `tag: 'axis'`. So, it appears the variables have 11 years/time-points of data (possibly?) and then `initDomain` is fitting them to 120 months.
The `filename` passed to `cdms2` comes from `exp[exp_key]`, e.g., `/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519-debug/v2.LR.historical_0201/post/atm/glb/ts/monthly/5yr/glb.xml`.
```shell
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519-debug/v2.LR.historical_0201/post/atm/glb/ts/monthly/5yr
$ grep 1860 glb.xml
# No results
$ grep 1859 glb.xml
[2024-6-21 23:15:35] /lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/bin/cdscan -x glb.xml CLDHGH_185001_185412.nc CLDHGH_185501_185912.nc CLDLOW_185001_185412.nc CLDLOW_185501_185912.nc CLDMED_185001_185412.nc CLDMED_185501_185912.nc CLDTOT_185001_1854 ..."
cdms_filemap ="[[[CLDHGH,time_bnds],[[0,60,-,-,-,CLDHGH_185001_185412.nc],[60,120,-,-,-,CLDHGH_185501_185912.nc]]],[[CLDLOW],[[0,60,-,-,-,CLDLOW_185001_185412.nc],[60,120,-,-,-,CLDLOW_185501_185912.nc]]],[[CLDMED],[[0,60,-,-,-,CLDMED_185001_185412.nc],[60,120,-,-,-,CLDMED_185501_185912.nc]]],[[CLDTOT],[[0,60,-,-,-,CLDTOT_185001_185412.nc],[60,120,-,-,-,CLDTOT_185501_185912.nc]]],[[FLNS],[[0,60,-,-,-,FLNS_185001_185412.nc],[60,120,-,-,-,FLNS_185501_185912.nc]]],[[FLNT],[[0,60,-,-,-,FLNT_185001_185412.nc],[60,120,-,-,-,FLNT_185501_185912.nc]]],[[FLUT],[[0,60,-,-,-,FLUT_185001_185412.nc],[60,120,-,-,-,FLUT_185501_185912.nc]]],[[FSNS],[[0,60,-,-,-,FSNS_185001_185412.nc],[60,120,-,-,-,FSNS_185501_185912.nc]]],[[FSNT],[[0,60,-,-,-,FSNT_185001_185412.nc],[60,120,-,-,-,FSNT_185501_185912.nc]]],[[FSNTOA],[[0,60,-,-,-,FSNTOA_185001_185412.nc],[60,120,-,-,-,FSNTOA_185501_185912.nc]]],[[PRECC],[[0,60,-,-,-,PRECC_185001_185412.nc],[60,120,-,-,-,PRECC_185501_185912.nc]]],[[PRECL],[[0,60,-,-,-,PRECL_185001_185412.nc],[60,120,-,-,-,PRECL_185501_185912.nc]]],[[PRECSC],[[0,60,-,-,-,PRECSC_185001_185412.nc],[60,120,-,-,-,PRECSC_185501_185912.nc]]],[[PRECSL],[[0,60,-,-,-,PRECSL_185001_185412.nc],[60,120,-,-,-,PRECSL_185501_185912.nc]]],[[QFLX],[[0,60,-,-,-,QFLX_185001_185412.nc],[60,120,-,-,-,QFLX_185501_185912.nc]]],[[SHFLX],[[0,60,-,-,-,SHFLX_185001_185412.nc],[60,120,-,-,-,SHFLX_185501_185912.nc]]],[[TAUX],[[0,60,-,-,-,TAUX_185001_185412.nc],[60,120,-,-,-,TAUX_185501_185912.nc]]],[[TAUY],[[0,60,-,-,-,TAUY_185001_185412.nc],[60,120,-,-,-,TAUY_185501_18512.nc]]],[[TREFHT],[[0,60,-,-,-,TREFHT_185001_185412.nc],[60,120,-,-,-,TREFHT_185501_185912.nc]]],[[TS],[[0,60,-,-,-,TS_185001_185412.nc],[60,120,-,-,-,TS_185501_185912.nc]]],[[allaxesdummy,region_name],[[-,-,-,-,-,CLDHGH_185001_185412.nc]]]]"
$ grep length glb.xml
length ="2"
length ="3"
length ="19"
length ="120"
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="time" start="0" length="120"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="rgn" start="0" length="3"/>
<domElem name="rgn_len" start="0" length="19"/>
<domElem name="time" start="0" length="120"/>
<domElem name="nbnd" start="0" length="2"/>
```
I don't see anything here that would suggest 11 years of data (e.g., 121 for 1 month on the end, or 132 for a full extra 12 months).
So, my conclusion is that the pre-519 code (with CDAT, using `cdscan`) was not secretly removing an 11th year; rather, the post-519 code (without CDAT) is adding an 11th year. (The only caveat is that I'm not sure where the `num: 11` mentioned above is coming from.)
In an effort to further isolate the buggy code, I wrote the following script:
We can see that the diff of the y values in the plots matches what I listed above in https://github.com/E3SM-Project/zppy/issues/600#issuecomment-2187577055. This suggests the stand-alone script is written correctly.
Aside from being stand-alone, this script also uses the same input directory for both pre- and post-519 code.
We can see that from the very beginning, the pre-519 code is working with 120 months (10 years) while the post-519 code is working with 11 years.
@tomvothecoder @chengzhuzhang This confirms for sure that `cdms2.open(<single xml>)` and `xarray.open_mfdataset(<directory of nc files>).temporal.group_average` are handling the data differently. I will try to use the debugger to go into each of these packages.
Results with debugger:
`f_pre` is a `cdms2.dataset.Dataset` in the comparison script in the above comment. `f_pre("PRECC")` brings us to `cdms2 > cudsinterface.py > cuDataset > __call__`:

`v = self.variables.get(id)`

`v` is a `<cdms2.variable.DatasetVariable object>` with shape `(120, 3)`. So that confirms that the pre-519 code immediately starts with 120 months and then never makes any changes -- i.e., the data stays at 120 months (10 years/time data-points).
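Structurally, this is what the pre-519 path amounts to: 120 monthly values divided into exactly 10 contiguous 12-month chunks before averaging (a toy sketch, not the actual cdutil implementation):

```python
# Toy data: 120 monthly values for 1850-1859.
monthly = [float(m % 12) for m in range(120)]

# Chunk into consecutive 12-month years, then average each chunk.
annual = [sum(monthly[i:i + 12]) / 12 for i in range(0, 120, 12)]
print(len(annual))  # -> 10 annual means, matching the 10x3 matrices above
```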
`f_post` is an `xarray.core.dataset.Dataset` in the comparison script in the above comment. `f_post.temporal.group_average("PRECC", "year")` brings us to `xcdat > temporal.py > TemporalAccessor > _averager`:

`dv_avg = self._group_average(ds, data_var)`

That brings us to `xcdat > temporal.py > TemporalAccessor > _group_average`. The line below is where dimensions change for `dv` (L1218):

`dv = self._group_data(dv).sum() / self._group_data(weights).sum()`

Before, `dv` was 120x3, but both the numerator and denominator above are 11x3. After, `dv` is an `xarray.DataArray` with `(year: 11, rgn: 3)`.
Before that line, we call the `_get_weights` function, which has this line:

`weights: xr.DataArray = grouped_time_lengths / grouped_time_lengths.sum()`

When we get to this function, `grouped_time_lengths` has sizes `Frozen({'time': 11})`. Interestingly, the `weights` returned by `_get_weights` has shape `(120,)`.
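For reference, the weighting idea behind that line (each interval's length over its group's total length) can be sketched for a single "noleap" year of monthly data. The month lengths below are the standard noleap calendar values; the rest is an illustrative assumption, not xcdat's code:

```python
# Standard month lengths for one "noleap" calendar year.
days_noleap = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# Each month's weight is its interval length divided by the total length of
# its year group, so the weights within one group sum to 1.
weights = [d / sum(days_noleap) for d in days_noleap]

print(abs(sum(weights) - 1.0) < 1e-9)  # -> True
```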
Following the `grouped_time_lengths = self._group_data(time_lengths)` call right before that line: in `_group_data` (L1325-1327):

```python
dv_gb = dv.groupby(self._labeled_time.name)
return dv_gb
```

`dv` begins with size 120, but then `dv_gb` has sizes `Frozen({'time': 11})`. It's unclear where that is coming from in `groupby`. I will look into this further.
In `GroupBy`:

- `groupers[0].size: 11`
- `groupers[0].unique_coord: <xarray.IndexVariable 'year' (year: 11)>`

It appears that in `xarray > core > groupby.py > ResolvedUniqueGrouper > _factorize_unique`, the line:

```python
unique_values, group_indices, codes_ = unique_value_groups(
    self.group_as_index, sort=sort
)
```

sets `group_indices` such that it divides 120 months (0-119) into 11 slices -- one slice of 11 (0-10), 9 slices of 12 (11-118), and 1 slice of 1 (119). This seems clearly wrong -- we just want 10 slices of 12!

So, something wrong is happening in `unique_value_groups`.
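That 11/12/.../1 split is exactly what grouping produces if each month's time coordinate is recorded at the end of its averaging interval and the groups are formed from the year of that coordinate. A small sketch in plain Python (standing in for the cftime values; not the xarray code itself):

```python
from datetime import date

# Build 120 monthly time points for 1850-1859 where each month's coordinate
# is recorded at the END of its averaging interval (the 1st of the next month).
def end_of_interval_times(start_year, n_years):
    times = []
    for year in range(start_year, start_year + n_years):
        for month in range(1, 13):
            if month == 12:
                times.append(date(year + 1, 1, 1))  # December spills into the next year
            else:
                times.append(date(year, month + 1, 1))
    return times

times = end_of_interval_times(1850, 10)  # 120 monthly time points

# Group by the year of the recorded coordinate, as a groupby on a "year" label does.
counts = {}
for t in times:
    counts[t.year] = counts.get(t.year, 0) + 1

# 1850 ends up with 11 months, 1851-1859 with 12 each, and a spurious "1860"
# group holds the single December-1859 point: 11 groups for 10 years of data.
print(sorted(counts.items())[0], sorted(counts.items())[-1])  # -> (1850, 11) (1860, 1)
```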
It looks like this block in `pandas > core > algorithms.py > factorize_array`:

```python
uniques, codes = table.factorize(
    values,
    na_sentinel=-1,
    na_value=na_value,
    mask=mask,
    ignore_na=use_na_sentinel,
)
```

is setting `uniques` up to have size 11. `values` has size 120 when that function is called, but `uniques` gets returned with size 11.
The debugger won't let me go further into `table.factorize`. However, if we look deeper into `values`, we can see that it is already organized as an array of `cftime.DatetimeNoLeap` objects with 11 items for 1850, 12 items each for 1851-1859, and 1 item for 1860, before `table.factorize` is called.

So the issue lies much earlier than this -- when the `values` are constructed as `cftime.DatetimeNoLeap` objects. I will continue debugging.
In `xarray > core > groupby.py > ResolvedGrouper > __post_init__`:

`self.group: T_Group = _resolve_group(self.obj, self.group)`

Here, `self.obj.year.values` is the list of 120 `cftime.DatetimeNoLeap` objects, where `self` is a `<xarray.core.groupby.ResolvedUniqueGrouper object>`.

Going further up, in `xarray > core > dataarray.py > DataArray > groupby`:

`rgrouper = ResolvedUniqueGrouper(UniqueGrouper(), group, self)`

Here, `self: <xarray.DataArray 'time_bnds' (time: 120)>` and `self.year.data` is the list of 120 `cftime.DatetimeNoLeap` objects.
Going further up, in `xcdat > temporal.py > TemporalAccessor > _group_data`, the `data_var` parameter gives `data_var.time.T.T.data` as a list of the 120 `cftime.DatetimeNoLeap` objects.

Going further up, in `xcdat > temporal.py > TemporalAccessor > _get_weights`:

`time_lengths = time_lengths.astype(np.float64)`

The `time_lengths` variable is computed such that `time_lengths.time.data` is a list of the 120 `cftime.DatetimeNoLeap` objects.

Going further up, in `xcdat > temporal.py > TemporalAccessor > _group_average`:

`time_bounds = ds.bounds.get_bounds("T", var_key=data_var)`

`time_bounds` is apparently too large for the debugger's variable explorer to load (it hangs).
However, I'm now able to pull more and more code outside the xcdat/xarray package, for easier debugging:

```python
bounds: xcdat.bounds.BoundsAccessor = f_post.bounds
obj: xarray.DataArray = f_post.get("PRECC")
# `vals` is just a list of 120 3-element arrays (glb, n, s).
# I.e., it has NOT been sliced up yet.
vals = obj.values
time_bounds: xarray.DataArray = bounds.get_bounds("T", var_key="PRECC")
time_bounds_data = time_bounds.time.data
time_lengths: xarray.DataArray = time_bounds[:, 1] - time_bounds[:, 0]
# We know `time_lengths_data` is a list of the 120 `cftime.DatetimeNoLeap` objects.
# I.e., it HAS been sliced up.
time_lengths_data = time_lengths.time.data
```
I now see that the slicing happens in `xcdat > bounds.py > BoundsAccessor > get_bounds`:

`coord = get_dim_coords(obj, axis)`
I now have:

```python
bounds: xcdat.bounds.BoundsAccessor = f_post.bounds
obj: xarray.DataArray = f_post.get("PRECC")
# `unsliced_values` is just a list of 120 3-element arrays (glb, n, s).
unsliced_values = obj.values
dim_coords: xarray.DataArray = obj["time"]
# `sliced_values` is a list of the 120 `cftime.DatetimeNoLeap` objects,
# with 11 values for 1850 and 1 value for 1860.
sliced_values = dim_coords.values
t: xarray.DataArray = obj.time
sliced_values_2 = t.data
```
So, getting the `"time"` attribute of `"PRECC"` seems to be slicing `"PRECC"` incorrectly. I'm trying to debug why that would happen. My best guess so far is something about the seasons handling, with winter as `DJF`.
@forsyth2 Thanks for creating the standalone code for debugging; I took it and simplified it further. Please see below:

```python
import cdms2
import cdutil
import xarray
import xcdat  # Required for `temporal`

filepath = '/lcrc/group/e3sm/ac.forsyth2/zppy_test_debug_output/issue-600-pre-519-debug/v2.LR.historical_0201/post/atm/glb/ts/monthly/5yr'

# cdms2 code:
var1 = cdms2.open(filepath + '/glb.xml')['TREFHT'][:, 0]
# print(var1.getTime().asComponentTime())
print(cdutil.YEAR(var1))

# Use xarray:
var2 = xcdat.open_mfdataset(filepath + '/TREFHT*.nc')['TREFHT']

# Use xcdat with center_times on to have the time value recorded at the
# center of a time interval:
var3 = xcdat.open_mfdataset(filepath + '/TREFHT*.nc', center_times=True).isel(rgn=0)
var3_ann = var3.temporal.group_average("TREFHT", "year")
print(var3_ann['TREFHT'].values)

# Diff:
diff = cdutil.YEAR(var1) - var3_ann['TREFHT'].values
print(diff)
```
You can play with this to show why there is a difference in your implementation. In brief, the EAM output has the time value recorded at the end of the time interval. `cdutil.average` generates weights correctly based on time bounds, while reading data with the xarray/xcdat call (without the `center_times` flag) and applying `group_average` operates on time values. The `center_times` flag brings the time value to the center of the time interval. Doing this, we get an exact match. I think this problem was brought up in https://github.com/E3SM-Project/zppy/pull/400.
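The end-of-interval stamping described above can be reproduced with a toy sketch (made-up numbers on an idealized 360-day calendar, not real EAM output): grouping by the raw end-of-interval stamps pushes each December into the following year, leaving 11 months in the first year and a spurious single-month "year" at the end, which is exactly the 11-values-for-1850 / 1-value-for-1860 pattern seen in this thread.

```python
import numpy as np

# Toy monthly record for two model years (24 months), idealized 30-day months.
# Month i spans days [30*i, 30*(i+1)); a 360-day "year".
bounds = np.array([[30 * i, 30 * (i + 1)] for i in range(24)])

end_stamps = bounds[:, 1]        # EAM convention: time stamped at interval end
centered = bounds.mean(axis=1)   # what center_times=True produces

def year_of(t):
    # Map a day-count time stamp to a model-year index.
    return (t // 360).astype(int)

# Grouping by end-of-interval stamps mis-bins each December into the next year:
print(np.bincount(year_of(end_stamps)))  # [11 12  1] -> 11 months in year 0, stray "year 2"
print(np.bincount(year_of(centered)))    # [12 12]   -> correct 12 months per year
```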
In xarray > `core/combine.py` > `_combine_all_along_first_dim` (call stack: `open_mfdataset` > `combine_by_coords` > `_combine_single_variable_hypercube` > `_combine_nd` > `_combine_all_along_first_dim`):

```python
combined_ids = dict(sorted(combined_ids.items(), key=_new_tile_id))
```

Before this line is called, `combined_ids > (1,) > time > data > [0:60]` shows the one 1860 `cftime.DatetimeNoLeap` data point. (PRECC has two files of 60 data points each -> 120 total data points.)
So, we can work earlier into the code now. By the time we get to this line in xarray > `core/combine.py` > `_infer_concat_order_from_coords` (call stack: `open_mfdataset` > `combine_by_coords` > `_combine_single_variable_hypercube` > `_infer_concat_order_from_coords`):

```python
combined_ids = dict(zip(tile_ids, datasets))
```

we have `indexes > 1 > values > [0:60]` showing the one 1860 `cftime.DatetimeNoLeap` data point.
This appears even earlier in the code. By the time we get to the following line in xarray > `backends/api.py` > `open_mfdataset`:

```python
combined = combine_by_coords(
```

we have `dataset > 1 > time > values > [0:60]` showing the one 1860 `cftime.DatetimeNoLeap` data point. This is true immediately upon `datasets`'s realization: `datasets = [open_(p, **open_kwargs) for p in paths]`.
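That the stray point is present from the moment the raw files are opened is predicted by the end-of-interval convention alone: with monthly output stamped at the end of each interval, the last stamp of a file covering 1855-1859 lands exactly on 1860-01-01. A small stdlib-only sketch (the `month_end_stamps` helper is hypothetical, and it uses the real calendar rather than noleap):

```python
from datetime import datetime

def month_end_stamps(start_year, n_months):
    """End-of-interval monthly stamps: month i is stamped with the
    first instant of month i+1 (hypothetical helper for illustration)."""
    stamps = []
    y, m = start_year, 1
    for _ in range(n_months):
        m += 1
        if m > 12:
            y, m = y + 1, 1
        stamps.append(datetime(y, m, 1))
    return stamps

# A 5-year (60-month) file covering 1855-1859:
stamps = month_end_stamps(1855, 60)
print(stamps[-1])  # 1860-01-01 00:00:00 -- the "extra" 1860 point
```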
@chengzhuzhang Ah thank you, I will look into `center_times`.
> I think this problem was brought up in #400

@chengzhuzhang That issue was carried over to #571. Are you saying that using `center_times` will resolve #571?
> I think this problem was brought up in #400
>
> @chengzhuzhang That issue was carried over to #571. Are you saying that using `center_times` will resolve #571?
I meant that #400 has a similar script for resolving a similar issue. It also explained the rationale behind time averaging. It is possible that `center_times` can resolve #400 in the xcdat-based code, though it needs to be tested.
> You can play with this to show why there is a difference in your implementation. In brief, the EAM output has the time value recorded at the end of the time interval. `cdutil.average` generates weights correctly based on time bounds, while reading data with the xarray/xcdat call (without the `center_times` flag) and applying `group_average` operates on time values. The `center_times` flag brings the time value to the center of the time interval. Doing this, we get an exact match. I think this problem was brought up in #400.
By default, xCDAT behaves the same way as CDAT by using the existing time bounds to create the temporal weights. xCDAT will infer time bounds from the time values if the user specifies adding missing time bounds (e.g., `xc.open_dataset("path", add_bounds=["X", "Y", "T"], center_times=True)`).

Basically, I think the only difference here is that CDAT centers the time coordinates automatically while xCDAT does not. Centering the time coordinates with xCDAT might resolve the differences, but I'm not exactly sure.
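For concreteness, here is a minimal sketch (illustrative only, not xCDAT's or CDAT's actual code) of what "using the existing time bounds to create the temporal weights" means for an annual mean on a noleap calendar: each month's weight is its bounds-derived length divided by the year's total length, and the annual mean is the weighted sum of the monthly means.

```python
import numpy as np

# Noleap-calendar month lengths, as would be derived from correct time bounds:
days_per_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
                          dtype=float)

# Temporal weights for a single year's annual mean (they sum to 1):
weights = days_per_month / days_per_month.sum()

# The annual mean is the weighted sum of the 12 monthly means:
monthly_values = np.arange(12, dtype=float)  # stand-in data
annual_mean = (weights * monthly_values).sum()

print(weights.round(4))
print(annual_mean)
```

If a time coordinate is mis-binned (e.g., December stamped as January 1 of the next year), that month's weight lands in the wrong year's sum, which is how the last point can end up with a weight of 1.0 in a one-month "year".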
@tomvothecoder thank you for chiming in! After some experimenting (e.g., saving the time weights from xcdat), I think we can confirm that centering time should be the right solution. Without centering time, the xcdat code would create 11 annual mean values, with the last time point, 1860-01-01, getting a time weight of 1.0. Since the xCDAT implementation won't center time automatically by default as CDAT does, `center_times=True` is needed to get the correct months and weights.
> I think we can confirm that centering time should be the right solution.

The plots also look correct when I implement `center_times` in zppy too. I'm doing final testing.
Issue description
It appears the merging of #519 altered the appearance of the global time series plots. I ran the complete-run test on `main` and it showed differences for global time series.

Before merging (these plots also match the expected plots in the complete-run test): https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_test_complete_run_www/test-2.5.0/v2.LR.historical_0201/global_time_series/global_time_series_1850-1860_results/

After merging: https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_test_complete_run_www/test-main-20240611/v2.LR.historical_0201/global_time_series/global_time_series_1850-1860_results/
Original plots: the plots seem similar but definitely not identical. The large jump at the end of the post-519 graphs makes me think #519 broke something rather than fixed something.

Land plots: there must have been some mistake in updating the expected land plots prior to releasing 2.5.0 -- the expected plots show nothing plotted. The post-519 plots actually have results. That said, these results are clearly incorrect, considering they're for the most part just horizontal lines.