joelfiddes opened this issue 1 month ago
argfff, so annoying... I hope it will improve performance on their side though. But sure, use the path of least effort in implementation. It will add one step on our side, but right now it is better that it works than not. Eventually we may fall back on downloading netcdf.
Have they changed not only the variable names but also the units and variable types? Would you have a sample file? It is super easy with xarray to change variable and dimension names on the fly.
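For reference, a rename of this kind can be done in one call when the file is opened — a minimal sketch on a toy dataset (the names `valid_time` and `t2m_new` are made-up placeholders, not the actual CDS-Beta names):

```python
import numpy as np
import xarray as xr

# Toy dataset standing in for a downloaded ERA5 file; variable and
# dimension names here are hypothetical stand-ins for the new CDS names.
ds = xr.Dataset(
    {"t2m_new": (("valid_time", "latitude", "longitude"),
                 np.zeros((2, 3, 4), dtype="float32"))},
    coords={"valid_time": [0, 1],
            "latitude": [0.0, 1.0, 2.0],
            "longitude": [0.0, 1.0, 2.0, 3.0]},
)

# Rename a dimension/coordinate and a data variable in one call
ds = ds.rename({"valid_time": "time", "t2m_new": "t2m"})
print(list(ds.dims), list(ds.data_vars))
```

`Dataset.rename` accepts a single mapping covering both dimensions and data variables, so the downstream module never sees the new names.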
I'm actually still using the netcdf, as in the end the grib-to-netcdf conversion required loads of extra dependencies. Variable names and order of dimensions are easily handled: I just did a simple renaming and rewriting so as not to mess with the topo_scale module. This is a preprocessing step in fetch_era5.
But now I have a dtype issue - all variables are float32 - and I have an einsum problem; will post here.
I've pinned it down to l.100 of topo_scale:
```python
plev_interp = dw.sum(['longitude', 'latitude'], keep_attrs=True)  # compute inverse-distance-weighted horizontal interpolation
```
```
Traceback (most recent call last):
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-16-f1e94b7789da>", line 2, in <module>
    plev_interp = dw.sum(['longitude', 'latitude'],
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/weighted.py", line 476, in sum
    return self._implementation(
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/weighted.py", line 543, in _implementation
    return self.obj.map(func, dim=dim, **kwargs)
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/dataset.py", line 6026, in map
    variables = {
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/dataset.py", line 6027, in <dictcomp>
    k: maybe_wrap_array(v, func(v, *args, **kwargs))
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/weighted.py", line 274, in _weighted_sum
    return self._reduce(da, self.weights, dim=dim, skipna=skipna)
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/weighted.py", line 229, in _reduce
    return dot(da, weights, dims=dim)
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/computation.py", line 1762, in dot
    result = apply_ufunc(
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/computation.py", line 1197, in apply_ufunc
    return apply_dataarray_vfunc(
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/computation.py", line 304, in apply_dataarray_vfunc
    result_var = func(*data_vars)
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/computation.py", line 761, in apply_variable_ufunc
    result_data = func(*input_data)
  File "<__array_function__ internals>", line 200, in einsum
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/numpy/core/einsumfunc.py", line 1371, in einsum
    return c_einsum(*operands, **kwargs)
TypeError: invalid data type for einsum
```

Some poking around in the IPython session:

```
dw = xr.Dataset.weighted(ds_plev_pt, da_idw)

dw.sum
Out[18]: <bound method Weighted.sum of DatasetWeighted with weights along dimensions: latitude, longitude>

dw
Out[19]: DatasetWeighted with weights along dimensions: latitude, longitude

ds_plev_pt.dtype
Traceback (most recent call last):
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-20-0543fae50593>", line 1, in <module>
    ds_plev_pt.dtype
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/common.py", line 278, in __getattr__
    raise AttributeError(
AttributeError: 'Dataset' object has no attribute 'dtype'

print(ds_plev_pt.dtype, da_idw.dtype)
Traceback (most recent call last):
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-21-b2073b75da5d>", line 1, in <module>
    print(ds_plev_pt.dtype, da_idw.dtype)
  File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/common.py", line 278, in __getattr__
    raise AttributeError(
AttributeError: 'Dataset' object has no attribute 'dtype'

for var_name, var_data in ds_plev_pt.data_vars.items():
    print(f"{var_name}: {var_data.dtype}")
number: int64
expver: object
z: float32
t: float32
u: float32
v: float32
r: float32
q: float32

print(da_idw.dtype)
float64
```
fixed with conversion to float64:
```python
ds_plev_pt = ds_plev_pt.astype('float64')
da_idw = da_idw.astype('float64')
```
l.100 now runs:
```python
dw = xr.Dataset.weighted(ds_plev_pt, da_idw)
```
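For completeness, the whole step can be reproduced on a toy grid — a minimal sketch with made-up data and weights (not the real fetch_era5 output), casting to float64 before the weighted reduction as in the fix above:

```python
import numpy as np
import xarray as xr

# Toy pressure-level dataset; dtype mimics the CDS-Beta files (float32)
ds_plev_pt = xr.Dataset(
    {"t": (("time", "latitude", "longitude"),
           np.ones((2, 3, 3), dtype="float32"))},
    coords={"time": [0, 1],
            "latitude": [0.0, 1.0, 2.0],
            "longitude": [0.0, 1.0, 2.0]},
)

# Made-up inverse-distance weights over the horizontal grid, normalised to 1
w = np.random.rand(3, 3)
da_idw = xr.DataArray(w / w.sum(), dims=("latitude", "longitude"))

# Cast both operands to float64 before the weighted reduction
ds_plev_pt = ds_plev_pt.astype("float64")
da_idw = da_idw.astype("float64")

# Equivalent to xr.Dataset.weighted(ds_plev_pt, da_idw)
dw = ds_plev_pt.weighted(da_idw)
plev_interp = dw.sum(["longitude", "latitude"], keep_attrs=True)
print(plev_interp["t"].values)  # weights sum to 1 and data are all ones, so ~1.0 per timestep
```

`ds_plev_pt.weighted(da_idw)` is the usual method spelling of the unbound call `xr.Dataset.weighted(ds_plev_pt, da_idw)`; they do the same thing.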
Does it sound ok to you?
ChatGPT said (prompt: "float32 and einsum"):

It looks like the data type float32 is being used in your xarray.Dataset (ds_plev_pt), and it's causing issues when calling numpy.einsum. The einsum function often works better with float64, as it expects higher precision and can sometimes raise errors when dealing with lower precision types like float32.
To resolve this, you can try converting your data to float64 before passing it to einsum. Here's how you can do that:
Convert xarray.Dataset or xarray.DataArray to float64:
```python
ds_plev_pt = ds_plev_pt.astype('float64')
da_idw = da_idw.astype('float64')
```

Proceed with the operation: after converting both the Dataset and the DataArray to float64, try running the code again to see if the error persists.
This should prevent the TypeError in einsum and allow the computation to proceed with higher precision.
Will pin a full description of the adaptation to CDS-Beta here once finished...
No problem converting to float64. It uses more memory, but if it is required by numpy, it is required... ChatGPT proposes what I would have, too. ;)
Or it may be written as `ds_plev_pt.astype(np.float64)`, I think.
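Indeed, both spellings are equivalent — numpy normalises the dtype argument, whether it is given as a string name or as a numpy scalar type:

```python
import numpy as np

a = np.arange(4, dtype=np.float32)

b = a.astype('float64')    # dtype given as a string name
c = a.astype(np.float64)   # dtype given as a numpy scalar type

print(b.dtype == c.dtype == np.float64)  # True
```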
The CDS-beta code is still in a branch; I will merge when fully tested. Currently the code will not work, as CDS-legacy has been switched off.
The old CDS will be fully decommissioned on 26 Sept, and the new netcdf files have a whole bunch of format/variable-name changes - sheeeeet.
Grib seems more stable and is also consistent with the ECMWF open-data forecasts I am working with.
I propose:
I have implemented a new keyword in the config for output format (defaulting to netcdf if none is present). For now I will work in the branch "cdsbeta".
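A minimal sketch of how such a keyword could behave — the key name `output_format` and the plain-dict config are hypothetical illustrations, not the actual config schema:

```python
# Hypothetical config handling: fall back to netcdf when the keyword is absent
def get_output_format(config: dict) -> str:
    fmt = config.get("output_format", "netcdf")  # default if key missing
    if fmt not in ("netcdf", "grib"):
        raise ValueError(f"unsupported output format: {fmt}")
    return fmt

print(get_output_format({}))                         # netcdf (default)
print(get_output_format({"output_format": "grib"}))  # grib
```

Defaulting inside the accessor keeps old config files working unchanged, which matches the stated goal of not breaking existing setups.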