ajdawson / eofs

EOF analysis in Python
http://ajdawson.github.io/eofs/
GNU General Public License v3.0

projectField not working on data with non-dimension coordinates in latest xarray version #123

Closed griverat closed 2 years ago

griverat commented 3 years ago

xarray released version 0.19.0 a few days ago. It comes with some deprecations (https://github.com/pydata/xarray/pull/5630) that seem to affect only data with non-dimension coordinates.

Sample code

import xarray as xr
from eofs.xarray import Eof

# Load example data from xarray
data = xr.tutorial.open_dataset("air_temperature").air

# Compute anomaly
anom = data.groupby("time.month") - data.groupby("time.month").mean()

# Create the Eof solver with a subset of the data
solver = Eof(anom.sel(time=slice("2013-01", "2013-12")))

# Project all the data
solver.projectField(anom, neofs=2)

This is the error raised

Traceback (most recent call last):
  File "/data/users/service/index/test.py", line 15, in <module>
    solver.projectField(anom, neofs=2)
  File "/home/service/miniconda3/envs/pangeo/lib/python3.9/site-packages/eofs/xarray.py", line 639, in projectField
    pcs.coords.update({coord.name: (coord.dims, coord)
  File "/home/service/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/coordinates.py", line 163, in update
    coords, indexes = merge_coords(
  File "/home/service/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/merge.py", line 472, in merge_coords
    collected = collect_variables_and_indexes(aligned)
  File "/home/service/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/merge.py", line 294, in collect_variables_and_indexes
    variable = as_variable(variable, name=name)
  File "/home/service/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/variable.py", line 121, in as_variable
    raise TypeError(
TypeError: Using a DataArray object to construct a variable is ambiguous, please extract the data using the .data property.

Changing the last line to

solver.projectField(anom.drop("month"), neofs=2)

fixes the issue. However, a non-dimension coordinate is lost: in this case the 'month' coordinate created by xarray's groupby operation.
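One possible interim workaround that keeps 'month' is to drop the non-dimension coordinates before projecting and reattach them to the result afterwards. The sketch below is only an illustration under stated assumptions: `project_keeping_coords` is a hypothetical helper (not part of eofs), and `fake_project` is a stand-in for `solver.projectField` so that the example runs without eofs installed.

```python
import numpy as np
import xarray as xr


def project_keeping_coords(project, field, time_dim="time", **kwargs):
    """Hypothetical helper, not part of eofs.

    Calls `project` on `field` with its non-dimension coordinates removed,
    then reattaches the ones that lie along `time_dim` to the result.
    """
    nd_names = [name for name in field.coords if name not in field.dims]
    result = project(field.drop_vars(nd_names), **kwargs)
    restored = {
        name: (field[name].dims, field[name].data)  # raw array, not a DataArray
        for name in nd_names
        if field[name].dims == (time_dim,)
    }
    return result.assign_coords(restored)


def fake_project(field, neofs=1):
    """Stand-in for solver.projectField: returns zeros shaped (time, mode)."""
    return xr.DataArray(
        np.zeros((field.sizes["time"], neofs)),
        dims=("time", "mode"),
        coords={"time": field["time"].data, "mode": np.arange(neofs)},
    )


# Synthetic field with a non-dimension coordinate "month" along "time".
anom = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("time", "x"),
    coords={"time": np.arange(4), "month": ("time", [1, 1, 2, 2])},
)

pseudo_pcs = project_keeping_coords(fake_project, anom, neofs=2)
```

With the real solver, the call would presumably be `project_keeping_coords(solver.projectField, anom, neofs=2)`. Note that this sketch only restores coordinates lying along the time dimension; scalar or spatial non-dimension coordinates are dropped and not reattached.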

Testing the same sample code with the previous xarray version (0.18.2) yields the expected result

<xarray.DataArray 'pseudo_pcs' (time: 2920, mode: 2)>
array([[  50.44886 ,  -78.26509 ],
       [  21.369547,  -98.04355 ],
       [   8.925724, -110.18372 ],
       ...,
       [ -47.0296  , -151.02394 ],
       [ -45.16002 , -128.5353  ],
       [ -27.55614 ,  -93.23076 ]], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
  * mode     (mode) int64 0 1
    month    (time) int64 1 1 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12 12 12 12
Attributes:
    long_name:  air_pseudo_pcs

Following the suggestion in the error message, changing this line

https://github.com/ajdawson/eofs/blob/603ed8ed86e606fcf8e69a9edc756f81544d4f93/lib/eofs/xarray.py#L638-L640

to

            # Add non-dimension coordinates.
            pcs.coords.update({coord.name: (coord.dims, coord.data)
                               for coord in time_ndcoords})

solves the issue with no apparent breaking change. I can send a simple PR if that seems okay.
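The difference between the two forms can be seen in isolation, without eofs. The following is a minimal sketch on synthetic data: `field` and `pcs` are made-up stand-ins for the input field and the projected PCs, and `time_ndcoords` is built by hand here rather than taken from the library.

```python
import numpy as np
import xarray as xr

# Source of the coordinate: "month" is a non-dimension coordinate along "time".
field = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("time", "x"),
    coords={"time": np.arange(4), "month": ("time", [1, 1, 2, 2])},
)
time_ndcoords = [field["month"]]

# Stand-in for the projected PCs, which start with dimension coordinates only.
pcs = xr.DataArray(
    np.zeros((4, 2)),
    dims=("time", "mode"),
    coords={"time": np.arange(4), "mode": [0, 1]},
)

# Old form: the (dims, data) tuple holds a DataArray as its data element.
# This is the pattern the TypeError in the traceback complains about.
try:
    pcs.coords.update({c.name: (c.dims, c) for c in time_ndcoords})
    old_form_raised = False
except TypeError:
    old_form_raised = True

# Proposed form: pass the underlying array via .data instead.
pcs.coords.update({c.name: (c.dims, c.data) for c in time_ndcoords})
```

After the second update, `pcs` carries 'month' as a non-dimension coordinate along 'time'. Whether the first update raises depends on the installed xarray version, which `old_form_raised` records.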

ajdawson commented 3 years ago

Hi @DangoMelon, thanks for your interest. I have not been able to maintain the xarray interface recently, but if you would like to fix this with a pull request, I'd be happy to review it.

griverat commented 3 years ago

Thanks for the reply! I will send a PR with said change now.

rabernat commented 2 years ago

I just encountered this same error. Thanks so much @DangoMelon for making a PR to fix it!

massonseb commented 2 years ago

I also encountered this error.

mshiv commented 2 years ago

Thanks for the fix, @DangoMelon! I faced the same issue with the solver.eofs() and solver.pcs() methods when working with xarray 0.21.1 (details on my use case here, which seemed similar to #127).