JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
269 stars 49 forks source link

Proposed new regridder save-load API in 0.3.0 (not backward compatible) #75

Open JiaweiZhuang opened 4 years ago

JiaweiZhuang commented 4 years ago

I'd like to implement a new regridder save-load API to resolve #11. ESMPy 8.0.0 is the prerequisite for this, and is now available on Conda conda-forge/esmpy-feedstock#26

Before implementing this, I'd like users of xESMF to review and comment on the new API, because there will be backward-incompatible changes.

Description of the changes

The current & old API (until 0.2.x) is:

regridder = xesmf.Regridder(
    grid_in, grid_out, method, filename=filename
    )
regridder_reload = xesmf.Regridder(
    grid_in, grid_out, method, filename=filename, reuse_weights=True
    )

The first call writes out a NetCDF file containing regridding weights, and reads the file back to memory to construct a SciPy sparse matrix (representing the regridding operation). Such extra I/O exists because ESMPy < 8.0.0 can only dump weights to disk. ESMPy 8.0.0 allows in-memory retrival of weights as numpy arrays, so such disk I/O is no longer needed.

The second call, with reuse_weights=True, detects that the weight file exists and simply reads it back without re-computation.

filename is an optional kwarg. If not specified, the regridder file will be given a default name like bilinear_25x53_59x87.nc, indicating (regrid method, input grid shape, output grid shape).

The new API (starting 0.3.0) will be:

regridder = xesmf.Regridder(grid_in, grid_out, method)
regridder.save(filename)
regridder_reload = xesmf.load_regridder(filename)

The first call only constructs the regridder in-memory, and will not write any files to disk.

Instead of implicitly handing the save/load as part of xesmf.Regridder() call, the new save() and load_regridder() do so explicitly. They are similar to save()/load_model() in Keras or save/load() in PyTorch.

The original reuse_weights and filename kwargs will be removed in xesmf.Regridder().

Major question on backward-compatibility

Should the reuse_weights=True kwarg be kept for backward-compatibility? My inclination is no, as there should only be one unambiguous way to load the regridder.

However, one convenient thing about the old xesmf.Regridder(..., reuse_weights=True) API is that this single line of code works in any cases. If the weight file doesn't exist on disk, xesmf builds it and writes to disk; if exists, xesmf reads it back without re-computation.

The same logic with the new API would be:

filename = 'your-regridder-name.nc'
if os.path.exists(filename):
    regridder = xesmf.load_regridder(filename)
else:
    regridder = xesmf.Regridder(grid_in, grid_out, method)
    regridder.save(filename)

This is more verbose, but also more explicit, and explicit is better than implicit

Minor questions

zhonghua-zheng commented 4 years ago

Major question: I think it should be kept. Minor questions: 1.I prefer xesmf.load(), is there any other thing that we can load? 2.I think we have to define the name before saving... But if you want set a default filename, you have to remind the users if the filename already exists.

JiaweiZhuang commented 4 years ago

is there any other thing that we can load?

Good point... I haven't thought of a case. One concern is that load() can be confused with the frequently used xarray.Dataset.load() which reads lazy data into memory. xesmf.load_regridder() looks more similar to the naming of xarray.load_dataset() which also reads a file from disk.

Major question: I think it should be kept.

I am thinking about catching the use_weights and filename kwargs, throwing an instructive error message, and maybe directing people to this issue. This seems good enough for not breaking the current user workflow?

spencerahill commented 4 years ago

Should the reuse_weights=True kwarg be kept for backward-compatibility?

This new API seems like a nice step forward, and the package is young enough that you shouldn't worry too much about backwards compatibility IMO. It's still 0.X after all 😄 Thanks for xESMF! It's an excellent tool.

ahuang11 commented 4 years ago

Would it be helpful to document whether the longitudes range from -180:180 vs 0:360 in the filename like: bilinear_25x53_59x87_180.nc If I initially have a file that ranges from -180 to 180, but then have another file with the same dimensions from 0 to 360 and I reuse weights, I think the result is inaccurate.

Also, I think it'd be nice to have an output_dir kwarg under the save method so that users can keep the default format, but be able to output the save in a scratch directory.

What about xesmf.open_regridder() like xr.open_dataset?

I prefer having a default filename since naming is hard.

huard commented 4 years ago

regridder.save would have to save not only the weights, but also the parameters passed to the original Regridder call, including the lat and lon coordinates, right ? This could create headaches if you want to maintain backward compatibility to files created with earlier versions.

Maybe a simpler option would be to have a weights argument in Regridder :

weights : None, str, Path, xarray.Dataset, scipy.sparse.coo_matrix, dict
  Regridding weights, stored as a sparse coo matrix, a dictionary with keys `row_dst`, `col_src` and 
  `weights`, an xarray.Dataset with data variables `col`, `row` and `S`, or a path to a netCDF file 
  created by ESMF. If None, compute the weights. 

An alternative to save could be to_netcdf, which would save the weights only as a file using the same naming as ESMF.

FeiYao-Edinburgh commented 4 years ago

The new API (starting 0.3.0) will be:

regridder = xesmf.Regridder(grid_in, grid_out, method)
regridder.save(filename)
regridder_reload = xesmf.load_regridder(filename)

I encountered the same MemoryError when doing loops. I have updated xESMF to 0.3.0 but have not found these methods. Does this mean the backward-compatibility has been kept? With the latest version and the same calling, I still see the memory cannot be released timely, see my following screenshot. Did I miss something?

image

huard commented 4 years ago

The API changes proposed above have not been released, nor merged into master. If you want to take a look at the branch #91 and provide feedback, that would be useful.