barronh / pseudonetcdf

PseudoNetCDF like NetCDF except for many scientific format backends
GNU Lesser General Public License v3.0
76 stars 35 forks source link

pseudonetcdf 3.1.0 breaks xarray tests #72

Closed mathause closed 4 years ago

mathause commented 4 years ago

Running py.test xarray/tests/test_backends.py with pseudonetcdf 3.1.0 fails with a KeyError on the uamiv test file (https://github.com/pydata/xarray/issues/3434; https://github.com/pydata/xarray/issues/3409). See details for the output of the py.test:

``` python _______________________________________________________________________ TestPseudoNetCDFFormat.test_uamiv_format_read _______________________________________________________________________ self = def test_uamiv_format_read(self): """ Open a CAMx file and test data variables """ camxfile = open_example_dataset( "example.uamiv", engine="pseudonetcdf", backend_kwargs={"format": "uamiv"} ) data = np.arange(20, dtype="f").reshape(1, 1, 4, 5) expected = xr.Variable( ("TSTEP", "LAY", "ROW", "COL"), data, dict(units="ppm", long_name="O3".ljust(16), var_desc="O3".ljust(80)), ) actual = camxfile.variables["O3"] assert_allclose(expected, actual) data = np.array(["2002-06-03"], "datetime64[ns]") expected = xr.Variable( ("TSTEP",), data, dict( bounds="time_bounds", long_name=( "synthesized time coordinate " + "from SDATE, STIME, STEP " + "global attributes" ), ), ) > actual = camxfile.variables["time"] xarray/tests/test_backends.py:3416: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = Frozen({'TFLAG': array([[[2002154, 0]]], dtype=int32) Attribu...ppm long_name: O3 var_desc: O3 ...}) key = 'time' def __getitem__(self, key: K) -> V: > return self.mapping[key] E KeyError: 'time' xarray/core/utils.py:385: KeyError ______________________________________________________________________ TestPseudoNetCDFFormat.test_uamiv_format_mfread ______________________________________________________________________ self = def test_uamiv_format_mfread(self): """ Open a CAMx file and test data variables """ camxfile = open_example_mfdataset( ["example.uamiv", "example.uamiv"], engine="pseudonetcdf", concat_dim="TSTEP", backend_kwargs={"format": "uamiv"}, ) data1 = np.arange(20, dtype="f").reshape(1, 1, 4, 5) data = np.concatenate([data1] * 2, axis=0) expected = xr.Variable( ("TSTEP", "LAY", "ROW", "COL"), data, dict(units="ppm", long_name="O3".ljust(16), var_desc="O3".ljust(80)), ) actual = camxfile.variables["O3"] assert_allclose(expected, actual) data1 = np.array(["2002-06-03"], "datetime64[ns]") data = np.concatenate([data1] * 2, axis=0) attrs = dict( bounds="time_bounds", long_name=( "synthesized time coordinate " + "from SDATE, STIME, STEP " + "global attributes" ), ) expected = xr.Variable(("TSTEP",), data, attrs) > actual = camxfile.variables["time"] xarray/tests/test_backends.py:3453: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = Frozen({'TFLAG': dask.array V: > return self.mapping[key] E KeyError: 'time' xarray/core/utils.py:385: KeyError ```

I am pretty sure these lines are responsible:

https://github.com/barronh/pseudonetcdf/blob/master/src/PseudoNetCDF/camxfiles/uamiv/Memmap.py#L187-L191

they were not commented in v3.0.2, see

git difftool v3.0.2 v3.1.0 -- src/PseudoNetCDF/camxfiles/uamiv/Memmap.py

When removing the comments, the xarray tests run again*. If this change was intentional, the xarray tests have to be adapted.

*note: there is another fail in this case, see next issue

barronh commented 4 years ago

This is a backward compatibility issue for sure.

My original look at the xarray logs suggested failures in Python v2.7 due to use of non-backward-compatible mixed named arguments with **kwds. That was my first issue that I was addressing, so I hadn't gotten to this problem.

A few file formats (uamiv and ioapi-netcdf) were being "decorated" with CF meta data to supplement the format-specific meta-data. This was largely to allow for plotting and subsetting methods. This caused some problems when subclassing netcdf4 files directly. The user got warnings related to attempts to write to read-only files. To address this, I moved to making methods (plotting, subsetting) IOAPI aware. Upside was addressing warnings and keeping a clear separation between original data and meta-data.

As you note, this breaks the downstream expectation that I put in the xarray test. While this is apparent in the uamiv, it also affects the IOAPI meta-data netcdf files. Most other formats were going directly to CF, so there is not problem.

The easy answer would be to keep add_cf_from_ioapi in uamiv and add fixes in the ioapi class to _add2Varlist. This would fix CAMx files, which is good. It wouldn't address other IOAPI files, which are currently an untested feature for xarray.