Closed abby-baskind closed 2 years ago
Hey @gmacgilchrist I looked into this issue a bit more. I still don't know how to fix it but I know a bit more about what's going on.
So the issue was when we multiplied ds.fgco2
by A.areacello
and we get this area: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
.
I went through each of those models to see which was causing problems, and it's the GFDL models. Notably, I am using areacello
from a GFDL model (I think ESM4 but it doesn't really matter because only one GFDL models had areacello
. So I wrote this block of code to ignore GFDL models for now to make sure everything else works.
for name, ds in dd.items():
# maybe do mean over the entire circumference so just
# take out the where function and mean all x
if name != 'GFDL-CM4.gr.historical.Omon':
if name != 'GFDL-ESM4.gr.historical.Omon':
ar = -1*(ds.fgco2*A.areacello*conversion).mean(['x'], keep_attrs = True,skipna = True)
plt.plot(yy[10:60], ar[10:60], label = name, lw = 3)
temp[name] = ar
mn = np.nanmean(list(temp.values()), axis=0)
# sd = np.std(list(temp.values()), axis=0)
plt.plot(yy[10:60], mn[10:60], lw = 6, label = 'Ensemble mean', color = 'black')
This image does not include the GFDL models
So what's wrong with the GFDL models?
A few insights into the data
dd['GFDL-CM4.gr.historical.Omon'].fgco2.mean(['x'], keep_attrs = True,skipna = True)*conversion
has dimensions y: 180
. A.areacello.mean(['x'], keep_attrs = True,skipna = True)
also has dimensions y: 180
. However, the product of these two things has dimensions y: 162
. So if we take a peek inside the product at the y
values...
...-69.5, -68.5, -67.5, -66.5, -65.5, -64.5, -60.5, -59.5, -58.5, -57.5...
We see that's it's missing -63.5, -62.5, and -61.5. It's probably missing some other latitude values but those are the ones missing in the Southern Ocean region.
At first I thought maybe there were some NaNs that were screwing everything up, but I'm using nanmean
and tried fillna()
and they either throw the ValueError or give a result with the wrong dimensions. So maybe that's not the root issue???
To summarize, I have no idea how to fix this.
The issue is probably that some values between area and data. Can you try to assign the area as a coordinate (after dropping the actual coordinates), like this:
data = data.assign_coords(areacello=A.areacello.reset_coords(drop=True))
So I tried that, implementing it like this...
A = A.assign_coords(areacello=A.areacello.reset_coords(drop=True))
A
conversion = 3.1536e7 * 83.3
# 3.1536e7 seconds per year
# 83.3 mol C per kg -- 12 gC/mol
ax_idx = 0
for name, ds in dd.items():
ar = -1*(ds.fgco2*A.areacello*conversion).mean(['x'], keep_attrs = True,skipna = True)
ar
plt.plot(yy, ar, label = name, lw = 3)
temp[name] = ar
And got an error like this
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-36-c075a30e5a8c> in <module>
22 ar = -1*(ds.fgco2*A.areacello*conversion).mean(['x'], keep_attrs = True,skipna = True)
23 ar
---> 24 plt.plot(yy, ar, label = name, lw = 3)
25
26 temp[name] = ar
/srv/conda/envs/notebook/lib/python3.8/site-packages/matplotlib/pyplot.py in plot(scalex, scaley, data, *args, **kwargs)
2986 @_copy_docstring_and_deprecators(Axes.plot)
2987 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
-> 2988 return gca().plot(
2989 *args, scalex=scalex, scaley=scaley,
2990 **({"data": data} if data is not None else {}), **kwargs)
/srv/conda/envs/notebook/lib/python3.8/site-packages/matplotlib/axes/_axes.py in plot(self, scalex, scaley, data, *args, **kwargs)
1603 """
1604 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1605 lines = [*self._get_lines(*args, data=data, **kwargs)]
1606 for line in lines:
1607 self.add_line(line)
/srv/conda/envs/notebook/lib/python3.8/site-packages/matplotlib/axes/_base.py in __call__(self, data, *args, **kwargs)
313 this += args[0],
314 args = args[1:]
--> 315 yield from self._plot_args(this, kwargs)
316
317 def get_next_color(self):
/srv/conda/envs/notebook/lib/python3.8/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs, return_kwargs)
499
500 if x.shape[0] != y.shape[0]:
--> 501 raise ValueError(f"x and y must have same first dimension, but "
502 f"have shapes {x.shape} and {y.shape}")
503 if x.ndim > 2 or y.ndim > 2:
ValueError: x and y must have same first dimension, but have shapes (180,) and (162,)
The error this time is different but it still stems from the product of A.areacello
and fgco2
in the GFDL models having coordinates of y: 162
Ok so the problem must originate somewhere earlier. I would go back to where these areas are first loaded and then check at which point the length changes.
Yeah, the issue here is a misalignment of coordinates : xarray
will only allow variables that sit at the exact same coordinate positions to be operated on (added, multiplied, divided etc) together. So, the coordinates in the dataset A
must be slightly misaligned with that of the data for one or both of the GFDL models - this much you already discovered (great work by the way).
The GFDL models are on the gr
grid correct? Meaning that the interpolation was done "in house" before that data was uploaded to the cloud. So, it's quite possible that there are very slight differences between the y
coordinates on the lat-lon grid. For example, if the dataset from which you use areacello
has a y
coordinate of -63.5 at some point, but for some reason the GFDL model at the same point has a y
coordinate of -63.5000001, xarray
will not multiply the corresponding values together. This coordinate would then be missing from your output array - here, you're seeing that this is the case for 18 points (180-162).
A couple of questions for @jbusecke :
gr
datasets in cmip6_preprocessing
?xarray
allow you to define a "tolerance" for coordinate matching?A couple of questions / suggestions for @abby-baskind :
A
come from? You implied it was saved from a GFDL model... it would therefore be surprising that it seems to match all of the models except GFDL.assign_coords
to make sure that everything matches in the way that you want. This is usually ill-advised because xarray
s refusal to allow coordinate mismatching is one of its great strengths, but here it is pretty low risk since we know that all of the data are on a 1x1 lat-lon grid.@gmacgilchrist A
is from GFDL-ESM4.gr.historical.Ofx
. And fgco2
from both GFDL-CM4.gr.historical.Omon
and GFDL-ESM4.gr.historical.Omon
are gr
, so they came regridded. So yes, very wacky that A
does not work with exclusively the GFDL models.
One idea I just had is regridding either A
, or GFDL-CM4.gr.historical.Omon
& GFDL-ESM4.gr.historical.Omon
the same way I regridded the gn
models, since A
did work with the gn
models. As a reminder, here's how I did the regridding for gn
models...
targetgrid_ds = xe.util.grid_global(1.0, 1.0)
targetgrid_ds['lon'] = targetgrid_ds['lon']+180
targetgrid_ds['lon_b'] = targetgrid_ds['lon_b']+180
newcoords = {'x':targetgrid_ds['lon'][0,:],'y':targetgrid_ds['lat'][:,0]}
dd_regrid={}
for name,item in dd_gn.items():
regridder = xe.Regridder(item, targetgrid_ds, 'bilinear',
periodic=True, ignore_degenerate=True)
ds_regridded = regridder(item).assign_coords(**newcoords).chunk({'time':120,'lev':1})
dd_regrid[name]=ds_regridded
I haven't thought about this method too critically, so it might not be a great idea, but it also might not be a terrible idea.
I think it would be best to avoid any unnecessary regridding - both from a computational perspective and a "minimizing steps" perspective.
It is concerning and surprising that areacello
from a GFDL model seems to have different coordinates to that same GFDL model. Can you try to strip back your code to simply load fgco2
and areacello
for this model and multiply them together? Do you get the same error or coordinate reduction? If so, there is an error with the data. If not, then you are doing something to these data within your analysis that is misaligning them, and that would be worth tracking down.
It is concerning and surprising that areacello from a GFDL model seems to have different coordinates to that same GFDL model.
Concerning, yet totally not atypical 🤪
A = A.assign_coords(areacello=A.areacello.reset_coords(drop=True)) A
I think at this point the 'damage' to areacello has already been done (since it is already a coord of A
). Can you go back and check the earliest time you load areacello
?
Do you do anything to ensure alignment of coordinates for gr datasets in cmip6_preprocessing?
Yes, I have had many examples of such problems and that is why match_metrics
has some logic built in to handle these occasions. I am not sure how much hassle it is to rewrite the code to use it, but it might be worth considering (see here for examples; basically you want to load the variables and data into separate dicts at the beginning and then parse the area as an early step).
Alternatively you could use the hidden function _parse_metric, which handles all the logic under the hood, but it can be applied directly to a pair of dataset/metric.
Does xarray allow you to define a "tolerance" for coordinate matching?
I think there is something like that, but I have not used it. I usually opt to drop the coordinate labels (as suggested before).
According to @abby-baskind, this issue went away of its own accord, so we may never know the source on this occasion. My guess is some error in the IO, or at some point in the processing of A
.
@gmacgilchrist