MITgcm / xmitgcm

Read MITgcm mds binary files into xarray
http://xmitgcm.readthedocs.io
MIT License
57 stars 66 forks source link

error while loading llc4320 grid #102

Closed apatlpo closed 6 years ago

apatlpo commented 6 years ago

Here is the grid that I am trying to load:

mit4320/grid% ls
AngleCS.data  DYG.meta       RC.data        compressed_llc4320_to_c2880_indices.bin
AngleCS.meta  DYU.data       RC.meta        compressed_llc4320_to_c2880_indices.mat
AngleSN.data  Depth.data     RF.data        hFacC.data
AngleSN.meta  Depth.meta     RF.meta        hFacC.meta
DRC.data      IDX.data       RhoRef.data    hFacS.data
DRC.meta      LandMask.data  RhoRef.meta    hFacS.meta
DRF.data      PHrefC.data    U2zonDir.data  hFacW.data
DRF.meta      PHrefC.meta    U2zonDir.meta  hFacW.meta
DXC.data      PHrefF.data    V2zonDir.data  llc4320_c2880.m
DXC.meta      PHrefF.meta    V2zonDir.meta  llc4320_compressed_level_index.nc
DXF.data      RAC.data       XC.data        llc4320_compressed_level_index0.nc
DXG.data      RAC.meta       XC.meta        llc4320grid.ps.gz
DXG.meta      RAS.data       XG.data        mk_index.m
DXV.data      RAS.meta       XG.meta        mk_landmask.m
DYC.data      RAW.data       YC.data        read_tile_mitgrid.m
DYC.meta      RAW.meta       YC.meta        readme.txt
DYF.data      RAZ.data       YG.data        thk90.mat
DYG.data      RAZ.meta       YG.meta        with_blanks
mit4320/grid% cat XC.meta
 nDims = [   2 ];
 dimList = [
       4320,         1,      4320,
      56160,         1,     56160
 ];
 dataprec = [ 'float32' ];
 nrecords = [     1 ];

the following code (with xmitgcm 6147c27f1254042c51b7d190fcbe4e83c0ff380c):

ds = xm.open_mdsdataset('', grid_dir=grid_dir,
                            iters=None, geometry='llc', read_grid=True,
                            default_dtype=np.dtype('>f4'),
                            ignore_unknown_vars=True, )

leads to:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-d3b3200732d3> in <module>()
      2                             iters=None, geometry='llc', read_grid=True,
      3                             default_dtype=np.dtype('>f4'),
----> 4                             ignore_unknown_vars=True, )

~/xmitgcm/xmitgcm/mds_store.py in open_mdsdataset(data_dir, grid_dir, iters, prefix, read_grid, delta_t, ref_date, calendar, geometry, grid_vars_to_coords, swap_dims, endian, chunks, ignore_unknown_vars, default_dtype, nx, ny, nz, llc_method, extra_metadata)
    222                           nx=nx, ny=ny, nz=nz, llc_method=llc_method,
    223                           extra_metadata=extra_metadata)
--> 224     ds = xr.Dataset.load_store(store)
    225 
    226     if swap_dims:

~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/dataset.py in load_store(cls, store, decoder)
    395         if decoder:
    396             variables, attributes = decoder(variables, attributes)
--> 397         obj = cls(variables, attrs=attributes)
    398         obj._file_obj = store
    399         return obj

~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/dataset.py in __init__(self, data_vars, coords, attrs, compat)
    363             coords = {}
    364         if data_vars is not None or coords is not None:
--> 365             self._set_init_vars_and_dims(data_vars, coords, compat)
    366         if attrs is not None:
    367             self.attrs = attrs

~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/dataset.py in _set_init_vars_and_dims(self, data_vars, coords, compat)
    381 
    382         variables, coord_names, dims = merge_data_and_coords(
--> 383             data_vars, coords, compat=compat)
    384 
    385         self._variables = variables

~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/merge.py in merge_data_and_coords(data, coords, compat, join)
    368     indexes = dict(extract_indexes(coords))
    369     return merge_core(objs, compat, join, explicit_coords=explicit_coords,
--> 370                       indexes=indexes)
    371 
    372 

~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
    446     assert_unique_multiindex_level_names(variables)
    447 
--> 448     dims = calculate_dimensions(variables)
    449 
    450     if explicit_coords is not None:

~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/dataset.py in calculate_dimensions(variables)
    107                 raise ValueError('conflicting sizes for dimension %r: '
    108                                  'length %s on %r and length %s on %r' %
--> 109                                  (dim, size, k, dims[dim], last_used[dim]))
    110     return dims
    111 

ValueError: conflicting sizes for dimension 'j': length 90 on 'XC' and length 4320 on 'j'

I have the feeling the issue from the code not entering the following statement: https://github.com/xgcm/xmitgcm/blob/6147c27f1254042c51b7d190fcbe4e83c0ff380c/xmitgcm/mds_store.py#L401-L406 because the code went first through this statement first: https://github.com/xgcm/xmitgcm/blob/6147c27f1254042c51b7d190fcbe4e83c0ff380c/xmitgcm/mds_store.py#L374-L381

what do you think?

raphaeldussin commented 6 years ago

Hi Aurelien,

Try specifying nx=4320 in the the open_dataset command. We're in a major refactoring of the code while trying to keep the old behavior and tests working, which leads to some issues sometimes. Sorry for the inconvenience.

apatlpo commented 6 years ago

Same error with nx=4320

A hack that works it to write if True: instead of the following statement: https://github.com/xgcm/xmitgcm/blob/6147c27f1254042c51b7d190fcbe4e83c0ff380c/xmitgcm/mds_store.py#L403

raphaeldussin commented 6 years ago

what about trying to pass extra_metadata :

llc4320 = xmitgcm.utils.get_extra_metadata(domain='llc', nx=4320) ds = open_mdsdataset(......, extra_metadata=llc4320)

apatlpo commented 6 years ago

This does work, thanks. Should I consider this as a workaround or as a definite solution?

raphaeldussin commented 6 years ago

ideally xmitgcm should be able to figure it out on its own without passing extra_metadata (which I had to add to support my regional llc domain). Please leave the issue open. We'll work on it but in the meantime others can use the workaround.

apatlpo commented 6 years ago

ok, thx for the quick help !

raphaeldussin commented 6 years ago

of course!

rabernat commented 6 years ago

neither nx=4320 nor extra_metadata should be needed here. Since XC.meta and RF.meta are present, xmitgcm should automatically infer the correct dimensions (see docs.) I believe this would have worked before the refactor.

rabernat commented 6 years ago

I have reproduced this bug with my llc1080 runs.

ds = xmitgcm.open_mdsdataset(ddir,
                             iters=iters, prefix=prefixes,
                             delta_t=240, ref_date='2011-01-1',
                             ignore_unknown_vars=True,
                             geometry='llc', llc_method='smallchunks')
/home/rpa/xmitgcm/xmitgcm/utils.py:423: UserWarning: Not sure what to do with rlev = L
  warnings.warn("Not sure what to do with rlev = " + rlev)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/xmitgcm/xmitgcm/mds_store.py in open_mdsdataset(data_dir, grid_dir, iters, prefix, read_grid, delta_t, ref_date, calendar, geometry, grid_vars_to_coords, swap_dims, endian, chunks, ignore_unknown_vars, default_dtype, nx, ny, nz, llc_method, extra_metadata)
    160         try:
--> 161             iternum = int(iters)
    162         # if not we probably have some kind of list

TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-3-9e59134338e5> in <module>()
      5                              delta_t=240, ref_date='2011-01-1',
      6                              ignore_unknown_vars=True,
----> 7                              geometry='llc', llc_method='smallchunks')
      8 ds

~/xmitgcm/xmitgcm/mds_store.py in open_mdsdataset(data_dir, grid_dir, iters, prefix, read_grid, delta_t, ref_date, calendar, geometry, grid_vars_to_coords, swap_dims, endian, chunks, ignore_unknown_vars, default_dtype, nx, ny, nz, llc_method, extra_metadata)
    197                 datasets = [open_mdsdataset(
    198                         data_dir, iters=iternum, read_grid=False, **kwargs)
--> 199                     for iternum in iters]
    200                 # now add the grid
    201                 if read_grid:

~/xmitgcm/xmitgcm/mds_store.py in <listcomp>(.0)
    197                 datasets = [open_mdsdataset(
    198                         data_dir, iters=iternum, read_grid=False, **kwargs)
--> 199                     for iternum in iters]
    200                 # now add the grid
    201                 if read_grid:

~/xmitgcm/xmitgcm/mds_store.py in open_mdsdataset(data_dir, grid_dir, iters, prefix, read_grid, delta_t, ref_date, calendar, geometry, grid_vars_to_coords, swap_dims, endian, chunks, ignore_unknown_vars, default_dtype, nx, ny, nz, llc_method, extra_metadata)
    222                           nx=nx, ny=ny, nz=nz, llc_method=llc_method,
    223                           extra_metadata=extra_metadata)
--> 224     ds = xr.Dataset.load_store(store)
    225 
    226     if swap_dims:

~/xarray/xarray/core/dataset.py in load_store(cls, store, decoder)
    395         if decoder:
    396             variables, attributes = decoder(variables, attributes)
--> 397         obj = cls(variables, attrs=attributes)
    398         obj._file_obj = store
    399         return obj

~/xarray/xarray/core/dataset.py in __init__(self, data_vars, coords, attrs, compat)
    363             coords = {}
    364         if data_vars is not None or coords is not None:
--> 365             self._set_init_vars_and_dims(data_vars, coords, compat)
    366         if attrs is not None:
    367             self.attrs = attrs

~/xarray/xarray/core/dataset.py in _set_init_vars_and_dims(self, data_vars, coords, compat)
    381 
    382         variables, coord_names, dims = merge_data_and_coords(
--> 383             data_vars, coords, compat=compat)
    384 
    385         self._variables = variables

~/xarray/xarray/core/merge.py in merge_data_and_coords(data, coords, compat, join)
    363     indexes = dict(extract_indexes(coords))
    364     return merge_core(objs, compat, join, explicit_coords=explicit_coords,
--> 365                       indexes=indexes)
    366 
    367 

~/xarray/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
    441     assert_unique_multiindex_level_names(variables)
    442 
--> 443     dims = calculate_dimensions(variables)
    444 
    445     if explicit_coords is not None:

~/xarray/xarray/core/dataset.py in calculate_dimensions(variables)
    107                 raise ValueError('conflicting sizes for dimension %r: '
    108                                  'length %s on %r and length %s on %r' %
--> 109                                  (dim, size, k, dims[dim], last_used[dim]))
    110     return dims
    111 

ValueError: conflicting sizes for dimension 'j': length 90 on 'FUtave' and length 1080 on 'j'

It seems we are now assuming that all llc grids have nx=90. Our test suite did not catch this because we only test with llc90 data!

🤦‍♂️

raphaeldussin commented 6 years ago

I thought I had found a way of getting that to work for any llc but I guess not. gonna look into that. We need more test configurations (including cube sphere)

rabernat commented 6 years ago

So I have identified two distinct places where the new reading logic is flawed.

The first is in read_mds: https://github.com/xgcm/xmitgcm/blob/6147c27f1254042c51b7d190fcbe4e83c0ff380c/xmitgcm/utils.py#L234-L247

Line 247 updates metadata with the values from extra_metadata. This is overwriting the values of nx and ny that are inferred from parsing each .meta file. This is one reason why I was opposed to using extra_metadata to pass information around. We have redundant copies of the same information in multiple places, and it can become inconsistent, as is the case here. One way to resolve this would be to add a check that the dimensions inferred from the .meta files are consistent with the dimensions in extra_metadata, raising an error if this is not the case. But I consider that at best a band-aid solution. A better way forward would be to keep refactoring to eliminate the duplication of information.

Farther upstream, there is a problem in MDSStore. Here is the step that assumes that nx=90: https://github.com/xgcm/xmitgcm/blob/6147c27f1254042c51b7d190fcbe4e83c0ff380c/xmitgcm/mds_store.py#L374-L382

I can't say I understand why that is necessary.

Then, we see very similar code repeated below after nx and ny have been "discovered" from XC.meta and RF.meta.

https://github.com/xgcm/xmitgcm/blob/6147c27f1254042c51b7d190fcbe4e83c0ff380c/xmitgcm/mds_store.py#L390-L407

The problem here is that the second "/LEGACY" block will never execute, because extra_metadata already got defined in the previous "/LEGACY" block. Beyond that, I really can't follow the logic of what is happening here.

I'm really sorry that I didn't catch these during the review of #97 and #93. @apatlpo: for now I recommend that you roll back to v0.2.2, which is the release we made pre-refactoring.

rabernat commented 6 years ago

Update: I just tried deleting the first "LEGACY" block (347-382) and everything worked.

This does not change the need for a deeper fix for the issues identified above.

raphaeldussin commented 6 years ago

I put that block so that self.nface gets a value when nx is not None and llc is defined. I had to define the extra_metadata with nx=90 as nx is not known at this point. The extra_metadata is then overwritten with correct values for nx. I am surprised it works but good to know

rabernat commented 6 years ago

@raphaeldussin - is this something you might be able to look at? I think we should try to get a fix in asap, as this is a serious bug. We shouldn't wait to refactor the whole test infrastructure.

raphaeldussin commented 6 years ago

@rpa I can take a look at it tomorrow or monday.