fmaussion / salem

Add geolocalised subsetting, masking, and plotting operations to xarray
http://salem.readthedocs.io
Other
184 stars 43 forks source link

[CANCELLED] Deprecate WRF diagnostic variables #140

Open fmaussion opened 5 years ago

fmaussion commented 5 years ago

It hurts me to do this because I spend a lot of time in designing an OOP API that allows to compute diagnostic variables out of WRF files which is quite elegant.

However, it's unlikely that we will ever have the time to support all possible diagnostic variables that wrf-python supports. Furthermore, wrf-python is faster because it relies on underlying fortran routines.

Therefore, we will start to deprecate WRF's diagnostic and vertical interpolation tools starting with salem v0.3.0 and focus our efforts on the salem - xarray bindings and the plotting capabilities.

mullenkamp commented 3 years ago

Hi,

I just discovered this package and was really glad to see that salem uses xarray/dask and assocaited syntax as close as reasonably possible. I was also glad to initially see that you were implementing diagnostic variables. I've checked out wrf-python as well, and although they use a lot of xarray under the hood, they don't use dask with xarray. You state that wrf-python should be faster due to the underlying fortran routines, I've found that not only is wrf-python substantially slower when extracting single variables, it uses a rediculous about of RAM. Unless I'm missing something obvious.

Does your above statement mean that you will be getting rid of those diagnostic variables you already have or you will not be adding anymore?

Thanks!

fmaussion commented 3 years ago

@mullenkamp thanks.

No, I don't plan to delete that code anymore. Salem has been on a low-key "maintenance mode" since a couple of years now and is still actively used and updated (for bugs, rarely for new features).

I am however unable to add more diagnostic variables because of lack of time. To be honest, I think that there is a great potential in some of the ideas I developed back in the days (it's a bit hacky but also elegant in a way).

At core, salem "tricks" xarray into believing that there are more variables in the netcdf files than there really is, by patching the netcdf objects before passing them to xarray: https://github.com/fmaussion/salem/blob/master/salem/sio.py#L969-L984 - it's really hacky but has proven quite stable until now. Adding new variables is relatively trivial. It's testing + maintenance that costs time.

Another idea worth exploring is to use dask delayed and add the lazy diagnostic variables to the xarray dataset after creation from the netcdf file, but I'm still unsure if that really works and is performant enough.

mullenkamp commented 3 years ago

Yeah, I totally understand with a lack of time to further develop it. I very much appreciate what you've already created and has given me ideas on how to handle these "diagnostic variables". It may seem "hacky" to add in these virtual variables, but it seems to work quite well with xarray and dask. As I mentioned, since I've tested the two options by salem and wrf-python, the salem solution seems optimal without having to reinvent anything substantial. Thanks again for all the ideas! I'll put them to good use in New Zealand ;)

sunt05 commented 3 years ago

Please don't simply deprecate this feature: I find it very useful and salem does provide a much more elegant approach than wrf-python.

Also, I can see the related code are very nicely structured, e.g., T2C here: https://github.com/fmaussion/salem/blob/d3f2e5e340c2af36c84c82a9de6099c90fba12e8/salem/wrftools.py#L180-L193)

So it should not be very hard to implement others.

Maybe better to set up a template for others to contribute procedures for these diagnostics? Also, we may label this issue as help wanted to attract folks' attention and contributions.