Closed: CapeHobbit closed this issue 1 year ago
Hi Ben, I've never seen this issue before. Could you confirm the following, so I can be sure I'm testing with the same packages: your xarray and pandas versions, and whether the exact same workflow was working with 0.8.0 and an older version of xarray?
My working example has the following versions under Python 3.9:
```
xarray   2022.3.0  pyhd8ed1ab_0    conda-forge
pandas   1.5.3     py39hecff1ad_0  conda-forge
xmhw     0.8.0     py_0            coecms
```
The xarray and xmhw versions are "pinned" in my yaml file. If I let them float freely, then my versions are:
```
xarray   2023.2.0  pyhd8ed1ab_0    conda-forge
pandas   1.5.3     py39hecff1ad_0  conda-forge
xmhw     0.9.0     py_0            coecms
```
I haven't made any changes to the routine.
Hey, could you check if you have a singleton coordinate left that has no use? For example, I iterate over depth levels and need to drop the depth information:
```python
# Drop the leftover singleton 'depth' coordinate and pull out the TEMP variable
sst = temperatureData.drop("depth").TEMP
# Rechunk: a single chunk along 'time', automatic chunking in space
sst = sst.chunk({'time': -1, 'lat': 'auto', 'lon': 'auto'})
sst.data
```
With that, I can avoid this error.
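For reference, here is a quick way to spot any scalar (singleton) coordinates still attached to your array; a minimal sketch, assuming your DataArray is called `sst` as above:
```python
# List coordinates that have no dimension, e.g. a scalar 'depth'
# left over after selecting a single level
scalars = [name for name, coord in sst.coords.items() if coord.ndim == 0]
print(scalars)  # e.g. ['depth'] -> drop them with sst.drop_vars(scalars)
```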
Will do and report back tomorrow. Thanks!
Unfortunately, I can confirm that I have no singleton dimensions, and yet the error persists. Below is the xarray dump of the array I submit to threshold:
```
<xarray.DataArray 'analysed_sst' (time: 3804)>
dask.array<sub, shape=(3804,), dtype=float32, chunksize=(3804,), chunktype=numpy.ndarray>
Coordinates:
```
It has dims of `('time',)`. I admit that the final comma in the dims makes me nervous, but it seems to be how xarray displays the dimension data, and not indicative of a singleton dimension.
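(For what it's worth, that trailing comma is just Python's syntax for a one-element tuple, as a quick check confirms:)
```python
dims = ('time',)
print(len(dims))   # 1 -- a single dimension
print(type(dims))  # <class 'tuple'> -- the comma only marks a one-element tuple
```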
Maybe it is related to the `point` dimension that is added in the function `land_check`:
```python
dims = list(temp.dims)
# Add an extra fake dimension if the array is 1-dimensional,
# so a 'cell' dimension can still be created
dims.remove(tdim)
if len(dims) == 0:
    temp = temp.expand_dims({"point": [0.0]})
    dims = ["point"]
```
Your error also points towards a variable called `point`. But I cannot really offer more help, to be honest.
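To illustrate what that `expand_dims` call does, here is a minimal standalone sketch (synthetic data, not the xmhw internals):
```python
import numpy as np
import xarray as xr

# A 1D time series, like the one produced by spatial averaging
temp = xr.DataArray(np.arange(5.0), dims=["time"])

# land_check-style expansion: prepend a fake 'point' dimension of length 1
expanded = temp.expand_dims({"point": [0.0]})
print(expanded.dims)   # ('point', 'time')
print(expanded.shape)  # (1, 5)
```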
Thanks @florianboergel - I'll dig a little deeper. I don't suppose you could add a small, known-working netCDF file here, could you? It would be good to separate the toolkit from the data source before I start debugging.
It looks like the workaround that fixes the coordinates issue with the new xarray version doesn't work for this extra dimension. `point` is added when you pass a "point" (1D) time series, so that stacking the dimensions still works. I'll try to fix this today.
I made a new release (0.9.2) that should fix this. As it was getting annoying to fix the weird coordinates behaviour, I just introduced a check at the start of each function. If a 1-dimensional series is passed, it sets a boolean variable `point` to True and uses this to skip all the functions that handle multiple dimensions. Before, it was trying to add the "point" fake dimension, so there was only one workflow regardless of the number of dimensions.
Yup, works perfectly! Thank you so much.
I think the new `point` variable needs to be initialized as `False`, to avoid errors if the time series is not 1D. Referring to xmhw.py line 133 @paolap.
Something like:
```python
if len(dims) == 1:
    point = True
else:
    point = False
```
since later on `point` is evaluated:
```python
if point:
    ts = temp
else:
    ts = land_check(temp, tdim=tdim, anynans=anynans)
```
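As an aside, the same initialization can be written as a single expression, which avoids an uninitialized branch entirely:
```python
# Equivalent one-liner: True only for a 1D (point) time series
point = len(dims) == 1
```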
No, you're right! I forgot, as I first introduced it as an extra argument and then decided it was easier to just work it out internally. I should add a test for this, as none of them failed. I'm pushing the changes now, thanks for pointing that out!
I've made a new release (0.9.3) with the correction above, thanks!
@CapeHobbit I guess the issue can be closed then ;-)
@florianboergel @paolap - yes! Sorry about that :)
Hi. Firstly, thanks for this excellent package; I am using it extensively in my Copernicus marine training.
Historically, I was using version 0.8.0 without issue. To do this, though, I had to "pin" to an older version of xarray, which is getting harder to maintain as part of a wider environment. I wanted to move up to 0.9.0 and let xarray float to a newer version, but I run into the following problem when I run the threshold routine:
This may be related to how I treat the initial data, as I average it into a 1-dimensional time series before I try to run the thresholding. However, I have tried this without the spatial averaging and get different, dask-related errors.
My processing is as follows:
```python
import glob
import os

import xarray as xr
from xmhw.xmhw import threshold

# Open all OSTIA files in the working directory as a single dataset
SST_data = xr.open_mfdataset(glob.glob(os.path.join(os.getcwd(), 'OSTIA*.nc')))
# Average over the spatial dimensions to get a 1D time series
SST_spatial_average = SST_data.mean(dim='latitude').mean(dim='longitude')
# Load into memory and convert from Kelvin to degrees Celsius
SST_time_series = SST_spatial_average["analysed_sst"].load() - 273.15
clim = threshold(SST_time_series)
```
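(Incidentally, the two sequential means could be collapsed into one call; I doubt it matters here, but for reference:)
```python
# Equivalent single call, averaging over both spatial dimensions at once
SST_spatial_average = SST_data.mean(dim=['latitude', 'longitude'])
```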
Are you able to offer any insight?
Cheers,
Ben
You can find my test data set here (needs to be unzipped):
OSTIA_SST.nc.zip