barronh / pseudonetcdf

PseudoNetCDF like NetCDF except for many scientific format backends
GNU Lesser General Public License v3.0
77 stars 35 forks source link

uamiv: relying on element of 'VAR-LIST' to be length 16 is brittle #73

Closed mathause closed 4 years ago

mathause commented 5 years ago

(Also on https://github.com/pydata/xarray/issues/3434; https://github.com/pydata/xarray/issues/3409)

In uamiv/Memmap.py#L139 'VAR-LIST' is padded to 16 characters. In uamiv/Write.py#L189-L190 this string is np.reshaped to (-1, 16).

This fails if one of the __var_names__ is longer than 16. This causes py.test xarray/tests/test_backends.pyto fail once I fix #72.

You might need to truncate them, or use another strategy to split them:

varlist = "".join([i.ljust(16)[:16] for i in self.__var_names__])
VAR_LIST = getattr(ncffile, 'VAR-LIST')
spc_names = list(filter(None, VAR_LIST.split(' ')))

Further comment: I assume the removal of + ['TFLAG'.ljust(16)]) is intentional, see

git difftool v3.0.2 v3.1.0 -- src/PseudoNetCDF/camxfiles/lateral_boundary/Memmap.py

barronh commented 5 years ago

@mathause

The TFLAG related edit is indeed intentional.

I think the critique is specific to the format, not the reader. The uamiv format is an old unformatted fortran binary file documented in the CAMx manual. The species names are stored as ten 4-byte integers. Each integer is an ordinal corresponding to a letter. Obviously, this is not an efficient storage. Thus, the format is limited to names 10 characters or less.

The 16 limit comes from the IOAPI meta data. IOAPI requires species names be 16 characters or less and uses the VAR-LIST property. Because the uamiv format is sufficiently described by the IOAPI restrictions, I have used IOAPI meta-data on top of the uamiv formatted file to prevent creating a uamiv specific meta-data interpreter.

Because UAMIV can only have 10 character names, I think there is no action here.

Can you confirm?

mathause commented 5 years ago

I don't really know about the format - sorry for jumping to conclusions. What I tried to do, however, is to make the xarray test pass with pseudonetcdf=3.1.0.

For this I first uncommented add_cf_from_ioapi(self) (#72). This resolves the original test breakage in xarray, but introduces another failure. What I now found out that VAR-LIST is different in v3.0.2 and v3.1.0:

In v3.0.2

ds.attrs['VAR-LIST']
>>> 'O3              '

In v3.10 (with add_cf_from_ioapi(self) uncommented)

ds.attrs['VAR-LIST']
>>> 'O3              layer           level           time            time_bounds     latitude_longitudex               y               latitude        longitude       latitude_bounds longitude_bounds'

Thus in v3.1.0 _add2Varlist gets called when adding the derived data (time, latitude_longitude, ...) - some of which have names longer than 16. In v3.0.2 this does not happen.

Thus, depending on how #72 is fixed (adding the derived data again in pseudonetcdf OR changing the xarray tests) this does need an action or not.

barronh commented 5 years ago

I really appreciate your helping me look into this. I was dealing with some family stuff out of town and just haven't had time to address the v3.02 to v3.1 compatibility.

I'm going to focus on #72.