Open JiaweiZhuang opened 4 years ago
Major question: I think it should be kept.
Minor questions:
1.I prefer xesmf.load()
, is there any other thing that we can load?
2.I think we have to define the name before saving... But if you want set a default filename, you have to remind the users if the filename already exists.
is there any other thing that we can load?
Good point... I haven't thought of a case. One concern is that load()
can be confused with the frequently used xarray.Dataset.load()
which reads lazy data into memory. xesmf.load_regridder()
looks more similar to the naming of xarray.load_dataset()
which also reads a file from disk.
Major question: I think it should be kept.
I am thinking about catching the use_weights
and filename
kwargs, throwing an instructive error message, and maybe directing people to this issue. This seems good enough for not breaking the current user workflow?
Should the reuse_weights=True kwarg be kept for backward-compatibility?
This new API seems like a nice step forward, and the package is young enough that you shouldn't worry too much about backwards compatibility IMO. It's still 0.X after all 😄 Thanks for xESMF! It's an excellent tool.
Would it be helpful to document whether the longitudes range from -180:180 vs 0:360 in the filename like: bilinear_25x53_59x87_180.nc
If I initially have a file that ranges from -180 to 180, but then have another file with the same dimensions from 0 to 360 and I reuse weights, I think the result is inaccurate.
Also, I think it'd be nice to have an output_dir
kwarg under the save
method so that users can keep the default format, but be able to output the save in a scratch directory.
What about xesmf.open_regridder()
like xr.open_dataset
?
I prefer having a default filename since naming is hard.
regridder.save
would have to save not only the weights, but also the parameters passed to the original Regridder call, including the lat and lon coordinates, right ? This could create headaches if you want to maintain backward compatibility to files created with earlier versions.
Maybe a simpler option would be to have a weights
argument in Regridder
:
weights : None, str, Path, xarray.Dataset, scipy.sparse.coo_matrix, dict
Regridding weights, stored as a sparse coo matrix, a dictionary with keys `row_dst`, `col_src` and
`weights`, an xarray.Dataset with data variables `col`, `row` and `S`, or a path to a netCDF file
created by ESMF. If None, compute the weights.
An alternative to save
could be to_netcdf
, which would save the weights only as a file using the same naming as ESMF.
The new API (starting 0.3.0) will be:
regridder = xesmf.Regridder(grid_in, grid_out, method) regridder.save(filename) regridder_reload = xesmf.load_regridder(filename)
I encountered the same MemoryError when doing loops. I have updated xESMF to 0.3.0 but have not found these methods. Does this mean the backward-compatibility has been kept? With the latest version and the same calling, I still see the memory cannot be released timely, see my following screenshot. Did I miss something?
The API changes proposed above have not been released, nor merged into master. If you want to take a look at the branch #91 and provide feedback, that would be useful.
I'd like to implement a new regridder save-load API to resolve #11. ESMPy 8.0.0 is the prerequisite for this, and is now available on Conda conda-forge/esmpy-feedstock#26
Before implementing this, I'd like users of xESMF to review and comment on the new API, because there will be backward-incompatible changes.
Description of the changes
The current & old API (until 0.2.x) is:
The first call writes out a NetCDF file containing regridding weights, and reads the file back to memory to construct a SciPy sparse matrix (representing the regridding operation). Such extra I/O exists because ESMPy < 8.0.0 can only dump weights to disk. ESMPy 8.0.0 allows in-memory retrival of weights as numpy arrays, so such disk I/O is no longer needed.
The second call, with
reuse_weights=True
, detects that the weight file exists and simply reads it back without re-computation.filename
is an optional kwarg. If not specified, the regridder file will be given a default name likebilinear_25x53_59x87.nc
, indicating(regrid method, input grid shape, output grid shape)
.The new API (starting 0.3.0) will be:
The first call only constructs the regridder in-memory, and will not write any files to disk.
Instead of implicitly handing the save/load as part of
xesmf.Regridder()
call, the newsave()
andload_regridder()
do so explicitly. They are similar tosave()
/load_model()
in Keras orsave
/load()
in PyTorch.The original
reuse_weights
andfilename
kwargs will be removed inxesmf.Regridder()
.Major question on backward-compatibility
Should the
reuse_weights=True
kwarg be kept for backward-compatibility? My inclination is no, as there should only be one unambiguous way to load the regridder.However, one convenient thing about the old
xesmf.Regridder(..., reuse_weights=True)
API is that this single line of code works in any cases. If the weight file doesn't exist on disk, xesmf builds it and writes to disk; if exists, xesmf reads it back without re-computation.The same logic with the new API would be:
This is more verbose, but also more explicit, and explicit is better than implicit
Minor questions
xesmf.load_regridder()
or justxesmf.load()
? I prefer the former as it's more explicit.regridder.save()
has a default filename? I slightly prefer no, but some people might have a hard time choosing a name :) Can still use the current default naming scheme likebilinear_25x53_59x87.nc
. Welcome other suggestions.