geoschem / gcpy

Python toolkit for GEOS-Chem. Contains basic plotting scripts, plus the suite of GEOS-Chem benchmarking utilities.
https://gcpy.readthedocs.io
Other
51 stars 24 forks source link

[BUG/ISSUE] xarray open_mfdataset cannot open multiple GCHP diagnostic/restart files created with MAPL v1.0.0 #26

Closed yantosca closed 5 years ago

yantosca commented 5 years ago

Describe the bug The xarray open_mfdataset function dies with an error when trying to read more than one file created by GCHP using the new MAPL v1.0.0 (i.e. in GCHP 12.5.0 and later versions).

To Reproduce

import xarray as xr
filelist = [ '/path/to/GCHP/file1.nc', '/path/to/GCHP/file2.nc']
ds = xr.open_mfdataset(filelist)

Expected behavior The returned dataset should contain the merged data from e.g. file1.nc and file2.nc.

Screenshots Instead, this error occurs:

  File "./run_1mo_benchmark.py", line 479, in <module>
    ds = xr.open_mfdataset([gchp_vs_gchp_refspc, gchp_vs_gchp_refaod])
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/backends/api.py", line 719, in open_mfdataset
    ids=ids)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 553, in _auto_combine
    data_vars=data_vars, coords=coords)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 475, in _combine_nd
    compat=compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 493, in _auto_combine_all_along_first_dim
    data_vars, coords)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 514, in _auto_combine_1d
    merged = merge(concatenated, compat=compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 532, in merge
    variables, coord_names, dims = merge_core(dict_like_objects, compat, join)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 451, in merge_core
    variables = merge_variables(expanded, priority_vars, compat=compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 170, in merge_variables
    merged[name] = unique_variable(name, var_list, compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 90, in unique_variable
    % (name, out, var))
xarray.core.merge.MergeError: conflicting values for variable 'anchor' on objects to be combined:
first value: <xarray.Variable (nf: 6, ncontact: 4)>
dask.array<shape=(6, 4, 4), dtype=int32, chunksize=(6, 4, 4)>
Attributes:
    long_name:  anchor point
second value: <xarray.Variable (nf: 6, ncontact: 4)>
dask.array<shape=(6, 4, 4), dtype=int32, chunksize=(6, 4, 4)>
Attributes:
    long_name:  anchor point

Required information:

Additional context The problem seems to be caused by a single variable called "anchor".

lizziel commented 5 years ago

@yantosca I think this warrants opening an issue with xarray.

yantosca commented 5 years ago

I just did; https://github.com/pydata/xarray/issues/3286

yantosca commented 5 years ago

According to the xarray support team, xarray cannot gracefully handle merging files that contain variables with repeated dimension names (cf. https://github.com/pydata/xarray/issues/3286 and https://github.com/pydata/xarray/issues/1378).

The easiest solution is to exclude the offending variable (which in this case is the "anchor" variable from GCHP output using MAPL v1.0.0) in all calls to xr.open_dataset and xr.open_mfdataset. This is done by passing the "drop_variables" keyword argument to these functions.

I have pushed commit https://github.com/geoschem/gcpy/commit/5411f3d71292c3a88442eb597813d0cd8f62b44a to master, which should resolve this issue.

yantosca commented 5 years ago

Also pushed commit https://github.com/geoschem/gcpy/commit/bfe65040455aa2125552d7610364831908fc4180 to fix a typo in commit https://github.com/geoschem/gcpy/commit/5411f3d71292c3a88442eb597813d0cd8f62b44a.